Hash, Bang, Wallop.

Feb 11 2011, 8:42 (+0000).

This week in web architecture, JavaScript routing hit the fan. Or at least, it did in the vast majority of situations where the fan and all of its dependencies loaded over a reliable internet connection. A few were staring at a blank screen, or still trying to figure out how to navigate the new Gizmodo.

Tedious disclaimer because I-work-for-Twitter-but-don’t-represent-the-views-of-anyone-but-myself-nor-and-especially-not-of-our-magnificent-webclient-team-who-are-actually-responsible-for-this-stuff-on-twitter-dot-com-and-may-or-may-not-have-other-priorities-at-this-time-of-night-let-alone-during-the-day.

Let’s get something very clear: Hash-bang URLs are shit. They’re ugly, brittle and a furious hack in the absence of anything else. This week, friend and former Yahoo co-worker of mine, Mike Davies, wrote up many of the problems with #! URLs, and off the back of Gawker’s massive (and already controversial) redesign it caught on. Attention has been turned on Gawker, who have created a JavaScript application to render each of their sites, with persistent, updating side navigation and a separate content pane, and on Twitter, which adopted the practice on New Twitter, launched back in October. Facebook do it too.

There are better placed people than me to comment on the Twitter aspect of this, principally anyone who actually worked on the new Twitter site. I did not (in fact, I joined the company after.) In general, I agree with the sentiments and wish for the pattern to go away. (It turns out that it was me who wrote “if site content doesn’t load through curl it’s broken”, and I’ll stand by that.) However, it’s not quite that clear cut, and what I want to document now are faults in the criticisms which have been published recently. Misleading errors and tangents in vitriolic argument really don’t help anything, and distract us from making a clearly robust case and documenting the facts of a risky methodology.

Update: And, lo, better-placed co-worker and Other Ben, Ben Cherry, has written up #! from a Twitter perspective.

## The what

The #! in a URL is a client-side content routing pattern codified by Google. Before this, a small number of sites enhanced with Ajax were either using a solitary # to differentiate content, or nothing at all. The original # hack was developed so that the back button would work in browsers, and so that media playback would not be interrupted whilst moving between articles (e.g. on Hype Machine.) It also allows separate content to be bookmarked. Since Google doesn’t read content rendered with JavaScript, they enhanced the # pattern with another character, !, so they could differentiate it and handle these sites.

The reason # or #! is in the URL at all is twofold. Firstly, it’s because the site is doing client-side routing: Rather than content being resolved on the server like usual, code in the browser is interpreting the path after the #!, building a custom data query for an API and then rendering the returned data into an existing page.
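
As a rough illustration of that flow, here is a minimal sketch of the first two steps: pulling the path out of the fragment and turning it into a data query. The `Route` shape, the function names and the `/api/...` endpoints are all made up for this example; no real site's router or API is being described.

```typescript
// Hypothetical route shape; the names here are illustrative only.
interface Route {
  segments: string[];
}

// Extract the client-side path from a URL fragment such as "#!/benward/lists".
// Returns null when the fragment is not a hash-bang route.
function parseHashBang(fragment: string): Route | null {
  if (fragment.indexOf("#!") !== 0) return null;
  const path = fragment.slice(2);
  const segments = path.split("/").filter((s) => s.length > 0);
  return { segments };
}

// Turn a route into a data query for a made-up JSON API (an assumption).
function apiQueryFor(route: Route): string {
  if (route.segments.length === 0) return "/api/timeline.json";
  return "/api/users/" + encodeURIComponent(route.segments[0]) + ".json";
}
```

The rendering step is left out on purpose: it is whatever takes the API response and writes it into the page that is already on screen.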

Secondly, it’s there because it has to be. At the time of writing, there is no universal browser support for altering the actual path of the browser’s displayed URL without causing the browser to also reload the page. When your client is resolving the content instead, that’s not what you want to happen. So instead you listen to the hashchange event in the browser, and work entirely within the URL fragment that is never sent to the server.
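
A minimal sketch of that listening step might look like the following. The `HashSource` interface and `watchFragment` are names invented for this example; the interface only describes the two parts of a browser window that the sketch touches, so that the code does not depend on DOM typings.

```typescript
// Minimal structural type standing in for the browser window (an assumption
// made so the sketch is self-contained).
interface HashSource {
  location: { hash: string };
  addEventListener(type: "hashchange", listener: () => void): void;
}

// Call `render` with the current fragment now, and again on every hashchange.
function watchFragment(source: HashSource, render: (fragment: string) => void): void {
  const update = () => render(source.location.hash);
  source.addEventListener("hashchange", update);
  update(); // render whatever fragment the page was loaded with
}
```

In a browser, the `window` object has this shape, and `render` would be the router that interprets the fragment. Nothing here ever reaches the server, which is the whole point and the whole problem.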

Twitter and Gawker redirect requests for old-style paths (twitter.com/benward/, for example) to new-style twitter.com/#!/benward URLs on the server side, because you can do the redirect very quickly, and maintain a single set of active URLs for all content on the site, always based in the root of the domain.
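
I don't know how either site implements this, so treat the following as a sketch of the mapping only. `hashBangRedirect` is a hypothetical helper that works out where an old-style path should be sent; a server would return its result as the target of an HTTP redirect.

```typescript
// Map a legacy path such as "/benward/" to its hash-bang equivalent "/#!/benward".
// Returns null for the root path, which needs no redirect.
function hashBangRedirect(path: string): string | null {
  // Strip leading and trailing slashes so "/benward" and "/benward/" agree.
  const trimmed = path.replace(/^\/+|\/+$/g, "");
  if (trimmed === "") return null;
  return "/#!/" + trimmed;
}
```

Because every old path collapses onto the root of the domain, the JavaScript router only ever has to start from one place.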

In the not-too-distant future, it may become feasible to use new, HTML5-era APIs such as pushState, which do allow the entire URL path to be rewritten without a #, and doing so will allow for client-side routing without the ugly mark. # will be the fallback for older browsers. GitHub are doing this already in their repository browser. A different Ben at Twitter wrote about this last July: For a quick write up and demo, see Sane HTML5 history management by Ben Cherry.
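
The pushState-first, #-as-fallback approach can be sketched roughly like this. `HistoryLike`, `LocationLike` and `navigate` are names invented for the example; they stand in for the browser's `history` and `location` objects so the sketch is self-contained, and this is not how GitHub or anyone else necessarily does it.

```typescript
// Minimal structural types standing in for the browser's history and
// location objects (assumptions made for this sketch).
interface HistoryLike {
  pushState?: (state: unknown, title: string, url: string) => void;
}
interface LocationLike {
  hash: string;
}

// Navigate to an application path, preferring pushState and falling back to #!.
function navigate(history: HistoryLike, location: LocationLike, path: string): void {
  if (typeof history.pushState === "function") {
    history.pushState(null, "", path); // rewrites the full URL path without a reload
  } else {
    location.hash = "#!" + path; // older browsers: keep routing in the fragment
  }
}
```

A real router would also need to listen for popstate in capable browsers and hashchange in older ones, which is left out here.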

## The ‘why’

Something that Mike wasn’t able to pontificate on in his piece was why a site chooses to do this. Actually, he did suggest that developers were doing this “because it’s cool”. That’s quite an unhelpful line of argument.

The reason sites are using client-side routing is performance: At the simplest, it’s about not reloading an entire page when you only need to reload a small piece of content within it. Twitter in particular is loading lots and lots of small pieces of content, changing views rapidly and continuously. Twitter users navigate between posts, pulling extra content related to each Tweet, user profiles, searches and so forth. Routing on the client allows all of these small requests to happen without wiping out the UI, all whilst simultaneously pulling in new content to the main Twitter timeline. In the vast majority of cases, this makes the UI more responsive.

Google have studied and documented that even 200 millisecond differences in performance affect a user’s long-term satisfaction and engagement with a site.

## Crawling

In Google’s ‘spec’, there’s something very strange. They document that when they crawl the content, rather than just—say—removing the #! and requesting the resource directly using the same path, they take the path and throw that back as an _escaped_fragment_ parameter. That is pretty ugly. Mike goes off on a tangent and talks about equivalence with pre-URL-rewriting formats of index.php?content_id=1234 query strings and the like. This is a red herring.

That uglier URL actually returns the content of the article. So this is the canonical reference to this piece of content. This is the content that Google indexes. (This is also the same with Twitter’s hash-bang URLs.)
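
For the curious, the rewrite a crawler performs can be sketched as below. `toEscapedFragmentUrl` is a hypothetical helper, and the escaping is deliberately simplified: it only percent-encodes a handful of characters that would otherwise change the meaning of the query string, which is an assumption rather than a faithful copy of Google's rules.

```typescript
// Rewrite a hash-bang URL into the form a crawler following Google's scheme
// would request. Returns null if the URL has no "#!" fragment.
function toEscapedFragmentUrl(url: string): string | null {
  const marker = url.indexOf("#!");
  if (marker === -1) return null;
  const base = url.slice(0, marker);
  const fragment = url.slice(marker + 2);
  // Simplified escaping (an assumption): encode only %, #, &, + and space.
  const escaped = fragment.replace(
    /[%#&+ ]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
  // Append to an existing query string if there is one.
  const separator = base.indexOf("?") === -1 ? "?" : "&";
  return base + separator + "_escaped_fragment_=" + escaped;
}
```

Under these assumptions, `http://twitter.com/#!/benward` becomes `http://twitter.com/?_escaped_fragment_=/benward`, which is the URL the server has to answer with real content.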

The implication is that this is something regular people will see, and that this is a big deal. It isn’t. They won’t. This is used only by crawlers to pull the cached, static version of a piece of content, nothing more. In fact, despite the claim that Gawker’s articles have disappeared from Google, you can search for the Lifehacker announcement post on Google and see that it renders the #! form URL under the result.

It’s not ‘good’, but it is documented, and it doesn’t affect regular users. It also doesn’t affect the art of URL design: You still need to create logical paths with which to query content and hint to users about content. The presence of a #! in the middle is not a substantial difference in that resultant user experience. It is very important not to overstate the _escaped_fragment_ parameter as an important issue. It’s shit, that’s all.

## Navigation

I’ve lost a reference here, sorry. Elsewhere, I read another criticism of #! URLs: That they result in you browsing to confusing locations like http://example.com/dog#!/cat. This is another troublesome line of argument because whilst a badly implemented client might fuck that up, none of the examples cited do. Twitter and Gawker both handle an initial path rewrite on the server side, ensuring that the JavaScript routers always operate from the root of the domain. Let’s not waste paragraphs making hot air of problems that don’t actually happen.

## Summary

To end: The ‘hash-bang’ problem is a few separate problems with different solutions:

- The Ugly URLs: This is a temporary problem caused by lack of widespread support for pushState (and a bug in WebKit that meant you can’t update the URL whilst there’s also data transfer going on.) Until a majority of a site’s users have a capable browser, it will be necessary to do the # redirect server-side, else the initial page load (loading a full page, redirecting on the client, loading another page) will be far too slow and hurt perceived performance (which gets us right back to Google’s performance/engagement study, and the reason for doing any of this in the first place.) And although ugly, Google handles them without further disruption to end users (although Bing doesn’t yet.)
- The robustness faults caused by doing routing on the client side, and not providing server-side fallbacks.

The counter arguments to either of these are still numerous, and very well documented all over the web at this point. But you are not arguing against some group of web developers trying to be ‘cool’. You’re arguing against putting content in front of users faster. For some sites that matters a lot (Twitter.) For some sites, it’s fair to argue that their implementation is less appropriate (Gawker.) There are also very fair arguments about granular implementation of this sort of thing (perhaps parts of a site being routed client side, but maintaining crawlable URLs for the permalinks of content.)

I’ll reiterate: I do hope that #! is a short-lived pattern on the web. I hope to see the bugs and browser-share limitations of pushState overcome quickly. I hope to see efforts in frameworks that make it easier to share rendering code between server side and client side. I hope that all the talk over the past few days results in a definitive, facts-and-measures-based reference to consider carefully before adopting client-side routing in other sites, and I hope that this little piece of documentation helps to focus only on problems that are actually exhibited in sites, rather than strict URL theory.

For what it’s worth, my personal overriding argument against #! remains this, based on my experience visiting Vancouver last year: The number of users who choose to disable JavaScript is small and of questionable relevancy; JavaScript errors breaking execution can be almost entirely avoided with good development practices. However, there are people who find themselves on shitty hotel wifi, or tethering over a 3G network, or just daring to live somewhere in the world where packet loss is a problem. It will always be different people, and it can affect us all, even briefly.

We’re all building tools around information and real time communication, and it matters that it works as often as possible.

Changelog:

2016-11-06: Link to Archive.org for dead YDN link: developer.yahoo.com/blogs/ydn/posts/2010/10/how-many-users-have-javascript-disabled/#comment-17071
2011-02-11: Added a link to Ben Cherry’s Thoughts on the Hashbang.

Ben Ward
