Hash, Bang, Wallop.
Tedious disclaimer because I-work-for-Twitter-but-don't-represent-the-views-of-anyone-but-myself-nor-and-especially-not-of-our-magnificent-webclient-team-who-are-actually-responsible-for-this-stuff-on-twitter-dot-com-and-may-or-may-not-have-other-priorities-at-this-time-of-night-let-alone-during-the-day.
Let's get something very clear: Hash-bang URLs are shit. They're ugly, brittle and a furious hack in the absence of anything else. This week, friend and former Yahoo co-worker of mine Mike Davies wrote up many of the problems with
There are better placed people than me to comment on the Twitter aspect of this, principally anyone who actually worked on the new Twitter site. I did not (in fact, I joined the company after.) In general, I agree with the sentiments and wish for the pattern to go away. (It turns out that it was me who wrote “if site content doesn’t load through curl it’s broken”, and I'll stand by that.) However, it's not quite that clear cut, and what I want to document now are faults in the criticisms which have been published recently. Misleading errors and tangents in vitriolic argument really don't help anything, and distract us from making a clearly robust case and documenting the facts of a risky methodology.
#! in a URL is a client-side content routing pattern codified by Google. Before this, a small number of sites enhanced with Ajax were either using a solitary # to differentiate content, or nothing at all. Google extended the original # pattern with another character, !, so they could differentiate it and handle these sites.
The reason #! is in the URL at all is twofold. Firstly, it's because the site is doing client-side routing: rather than content being resolved on the server as usual, code in the browser interprets the path after the #!, builds a custom data query for an API, and then renders the returned data into an existing page.
Secondly, it's there because it has to be. At the time of writing, there is no universal browser support for altering the actual path of the browser's displayed URL without causing the browser to also reload the page. When your client is resolving the content instead, that's not what you want to happen. So instead you listen to the hashchange event in the browser, and work entirely within the URL fragment, which is never sent to the server.
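To make the two points above concrete, here's a minimal sketch of client-side routing on the fragment. The function names and the /api/... paths are made up for illustration; they are not how twitter.com or anyone else actually does it.

```javascript
// Hypothetical helper: extract the client-side route from a URL fragment.
function routeFromHash(hash) {
  // Only fragments that start with "#!" are treated as routes.
  if (typeof hash !== "string" || !hash.startsWith("#!")) {
    return null;
  }
  const path = hash.slice(2) || "/";
  // Split "/benward/lists" into ["benward", "lists"].
  const segments = path.split("/").filter(Boolean);
  return { path, segments };
}

// Hypothetical mapping from a route to an API request path (assumed URLs).
function apiQueryFor(route) {
  if (route.segments.length === 0) return "/api/timeline";
  return "/api/users/" + encodeURIComponent(route.segments[0]);
}

// In a browser, re-resolve content whenever the fragment changes.
// The guard lets the same file load outside a browser without errors.
if (typeof window !== "undefined") {
  window.addEventListener("hashchange", () => {
    const route = routeFromHash(window.location.hash);
    if (route) {
      // A real client would fetch this and render the result into the page.
      console.log("would fetch", apiQueryFor(route));
    }
  });
}
```

None of this ever touches the server's view of the URL: everything after the # stays in the browser.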
Twitter and Gawker redirect requests for old-style paths (such as twitter.com/benward/) to new-style #! URLs (twitter.com/#!/benward) on the server side, because the redirect can be done very quickly, and it maintains a single set of active URLs for all content on the site, always based in the root of the domain.
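As a rough sketch of that server-side step, the mapping from an old path to its #! equivalent is tiny. The helper name and the trailing-slash handling are my assumptions, not a description of either site's actual code.

```javascript
// Hypothetical helper: map a legacy server-side path to its hash-bang form.
// Returns null when no redirect is needed.
function hashBangLocation(path) {
  // The root already hosts the client-side application; nothing to redirect.
  if (path === "/" || path === "") return null;
  // Drop a trailing slash so "/benward/" and "/benward" map to one URL.
  const trimmed = path.endsWith("/") ? path.slice(0, -1) : path;
  return "/#!" + trimmed;
}
```

A server would send the returned value in the Location header of a redirect response. Note that this only works in one direction: the server never sees the fragment, so it can't redirect a #! URL back to a plain path.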
In the not-too-distant future, it may become feasible to use new, HTML5-era APIs such as pushState, which do allow the entire URL path to be rewritten without a #, and doing so will allow for client-side routing without the ugly mark. # will be the fallback for older browsers. GitHub are doing this already in their repository browser. A different Ben at Twitter wrote about this last July: for a quick write-up and demo, see Sane HTML5 history management by Ben Cherry.
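The pushState-with-fallback approach can be sketched in a few lines. The navigate function and its env parameter are hypothetical; env stands in for the browser's window so the logic can run anywhere.

```javascript
// Hypothetical helper: change the visible URL without a page reload.
// Uses history.pushState where the browser has it, and "#!" otherwise.
function navigate(path, env) {
  if (env.history && typeof env.history.pushState === "function") {
    // Rewrites the whole path; the browser does not reload the page.
    env.history.pushState({ path: path }, "", path);
    return "pushState";
  }
  // Older browsers: stay inside the fragment, which triggers hashchange.
  env.location.hash = "#!" + path;
  return "hash";
}
```

In a browser you'd call navigate("/benward", window). The return value is only there to show which branch was taken.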
Something that Mike wasn't able to pontificate on in his piece was why a site chooses to do this. Actually, he did suggest that developers were doing this “because it’s cool”. That's quite an unhelpful line of argument.
The reason sites are using client-side routing is performance: at its simplest, it's about not reloading an entire page when you only need to reload a small piece of content within it. Twitter in particular is loading lots and lots of small pieces of content, changing views rapidly and continuously. Twitter users navigate between posts, pulling extra content related to each Tweet, user profiles, searches and so forth. Routing on the client allows all of these small requests to happen without wiping out the UI, all whilst simultaneously pulling in new content to the main Twitter timeline. In the vast majority of cases, this makes the UI more responsive.
Google have studied and documented that even 200 millisecond differences in performance affect a user's long-term satisfaction and engagement with a site.
In Google's ‘spec’, there's something very strange. They document that when they crawl the content, rather than just, say, removing the #! and requesting the resource directly using the same path, they take the path and throw that back as an _escaped_fragment_ parameter. That is pretty ugly. Mike goes off on a tangent and talks about equivalence with pre-URL-rewriting formats of index.php?content_id=1234 query strings and the like. This is a red herring.
That uglier URL actually returns the content of the article. So this is the canonical reference to this piece of content. This is the content that Google indexes. (This is also the same with Twitter’s hash-bang URLs.)
The implication is that this is something regular people will see, and that this is a big deal. It isn't. They won't. This is used only by crawlers to pull the cached, static version of a piece of content, nothing more. In fact, despite the claim that Gawker's articles have disappeared from Google, you can search for the Lifehacker announcement post on Google and see that it renders the #! form URL under the result.
It's not ‘good’, but it is documented, and it doesn't affect regular users. It also doesn't affect the art of URL design: you still need to create logical paths with which to query content and hint to users about content. The presence of a #! in the middle is not a substantial difference in that resultant user experience. It is very important not to overstate the _escaped_fragment_ parameter as an important issue. It's shit, that's all.
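For the curious, the crawler's rewrite of a #! URL looks roughly like this. It's a simplified sketch: the function name is mine, and the escaping only covers a handful of characters that would otherwise break a query string, which is narrower than what Google's document specifies.

```javascript
// Hypothetical helper: rewrite a "#!" URL into the form a crawler requests.
function toCrawlerUrl(url) {
  const bang = url.indexOf("#!");
  // URLs without a hash-bang are requested as they are.
  if (bang === -1) return url;
  const base = url.slice(0, bang);
  const fragment = url.slice(bang + 2);
  // Simplified escaping (an assumption): percent-encode %, #, &, + and space.
  const escaped = fragment.replace(/[%#&+ ]/g, (c) =>
    "%" + c.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0")
  );
  // Append to an existing query string if there is one.
  const separator = base.includes("?") ? "&" : "?";
  return base + separator + "_escaped_fragment_=" + escaped;
}

console.log(toCrawlerUrl("http://twitter.com/#!/benward"));
// prints "http://twitter.com/?_escaped_fragment_=/benward"
```

The point stands: this form exists for the crawler's request only. The URL a person sees and shares is still the #! one.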
I've lost a reference here, sorry. Elsewhere, I read another criticism of #! URLs: That they result in you browsing to confusing locations like
To end: The ‘hash-bang’ problem is a few separate problems with different solutions:
- The Ugly URLs: This is a temporary problem caused by lack of widespread support for pushState (and a bug in WebKit that meant you can't update the URL whilst there's also data transfer going on.) Until a majority of a site's users have a capable browser, it will be necessary to do the # redirect server-side, else the initial page load (loading a full page, redirecting on the client, loading another page) will be far too slow and hurt perceived performance (which gets us right back to Google's performance/engagement study, and the reason for doing any of this in the first place.) And although ugly, Google handles them without further disruption to end users (although Bing doesn't yet.)
- The robustness faults caused by doing routing on the client side, and not providing server-side fallbacks.
The counter arguments to either of these are still numerous, and very well documented all over the web at this point. But you are not arguing against some group of web developers trying to be ‘cool’. You're arguing against putting content in front of users faster. For some sites that matters a lot (Twitter.) For some sites, it's fair to argue that their implementation is less appropriate (Gawker.) There are also very fair arguments about granular implementation of this sort of thing (perhaps parts of a site being routed client side, but maintaining crawlable URLs for the permalinks of content.)
I'll reiterate: I do hope that #! is a short-lived pattern on the web. I hope to see the bugs and browser-share limitations of pushState overcome quickly. I hope to see efforts in frameworks that make it easier to share rendering code between server side and client side. I hope that all the talk over the past few days results in a definitive, facts-and-measures-based reference to consider carefully before adopting client-side routing in other sites, and I hope that this little piece of documentation helps to focus only on problems that are actually exhibited in sites, rather than strict URL theory.
For what it's worth, my personal overriding argument against
We're all building tools around information and real time communication, and it matters that it works as often as possible.