Ben Ward

FAO: RDF

In my BarCamp round-up I commented on RDF getting a strong showing in presentations. It caught my eye and I came out feeling optimistic about the future of rich data on the web. Don’t misinterpret that sentence as RDF enthusiasm yet, though.

The thing about RDF is that no-one has yet demonstrated any real-world reason to care about it. It fascinates academics who would love, just for the sake of it, to model the entire universe in triples, but in the real world of web browsers its value has never really been promoted.

You may wish to watch Ian Forrester’s recording of the RDF vs. microformats discussion. It’s all very tongue-in-cheek, of course, but it still exposes some big misconceptions among the RDF folks that are going to hold them back.

In a general sense, I think some in the world of RDF are bitter about microformats. RDF has been in development for so many years and no-one of importance in the consumer world has cared. Then an upstart technology, hacky, rough around the edges and full of self-proclaimed imperfection, bursts onto the scene and in 18 months makes data sexy. Oh, and of course microformats don’t require XML.

Here’s the thing: if you’re passionate about RDF for whatever reason and think microformats has stolen some thunder that you deserved, you need to get over this. Right now. Thank you. Now let’s continue.

As someone who does enjoy considering academic problems from time to time, I’m not going to dismiss the potential usefulness of RDF. Of course, its open-ended nature could result in some ingenious mash-ups, given enough real-world data. But RDF does have big problems that need to be overcome.

The biggest technical barrier for RDF has actually been solved: publishing no longer requires either XML or hidden duplication of the data that’s already in your page. The microformats mantra of ‘publish once’ could work, thanks to a technique called eRDF (Embedded RDF). With the technical hurdles cleared, what remains is primarily attitude.
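
To show what ‘publish once’ means in practice, here is a rough Python sketch that reads RDF triples straight out of the visible markup, with nothing hidden. The conventions it handles (`<link rel="schema.foaf">` to declare a prefix, an `id` to name a subject, `-foaf-Person` for a type, `foaf-name` for a property) follow the eRDF profile as far as I understand it, heavily simplified; `ERDFSketch` and the example.org URIs are made up for illustration.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VOID_TAGS = {"area", "base", "br", "hr", "img", "input", "link", "meta"}


class ERDFSketch(HTMLParser):
    """Simplified eRDF reader (hypothetical, not a complete implementation).

    - <link rel="schema.foaf" href="..."> declares the prefix "foaf"
    - an element's id makes it a subject: <base>#<id>
    - class="-foaf-Person" states the subject's rdf:type
    - class="foaf-name" makes the element's visible text a literal value
    """

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.prefixes = {}   # prefix -> namespace URI
        self.stack = []      # open elements: [tag, subject, predicates, text]
        self.triples = []

    def expand(self, name):
        # "foaf-name" -> "http://xmlns.com/foaf/0.1/name", if "foaf" is declared
        prefix, _, local = name.partition("-")
        ns = self.prefixes.get(prefix)
        return ns + local if ns and local else None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = attrs.get("rel") or ""
        if tag == "link" and rel.startswith("schema."):
            self.prefixes[rel[len("schema."):]] = attrs.get("href") or ""
        if tag in VOID_TAGS:
            return  # void elements never get an end tag, so don't stack them
        subject = self.stack[-1][1] if self.stack else self.base
        if attrs.get("id"):
            subject = self.base + "#" + attrs["id"]
        predicates = []
        for cls in (attrs.get("class") or "").split():
            if cls.startswith("-"):
                type_uri = self.expand(cls[1:])
                if type_uri:
                    self.triples.append((subject, RDF_TYPE, type_uri))
            else:
                predicate = self.expand(cls)
                if predicate:
                    predicates.append(predicate)
        self.stack.append([tag, subject, predicates, []])

    def handle_data(self, data):
        for entry in self.stack:
            entry[3].append(data)

    def handle_endtag(self, tag):
        if not any(entry[0] == tag for entry in self.stack):
            return  # stray end tag: ignore it
        while self.stack:  # closing a parent also closes unclosed children
            open_tag, subject, predicates, text = self.stack.pop()
            value = " ".join("".join(text).split())
            for predicate in predicates:
                self.triples.append((subject, predicate, value))
            if open_tag == tag:
                break


page = """<html><head profile="http://purl.org/NET/erdf/profile">
<link rel="schema.foaf" href="http://xmlns.com/foaf/0.1/">
</head><body>
<div id="ben" class="-foaf-Person">
  Hello, I'm <span class="foaf-name">Ben Ward</span>.
</div>
</body></html>"""

reader = ERDFSketch("http://example.org/about")
reader.feed(page)
reader.close()
for triple in reader.triples:
    print(triple)
```

The name is read from the same text a human reader sees, so there is no second copy of the data to keep in sync.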

First up, XML. While eRDF makes it optional and allows the same HTMLTidy-based parsing techniques that microformats use, the fact remains that advocates of RDF tend also to be advocates of XML, XHTML and (worst of all) XHTML2. The XML web hasn’t happened. It may never happen. Ian Forrester’s comment in the linked presentation that ‘we need XHTML’ is both discouraging to publishers and misleading. You don’t need it at all. Publishing will always be more important than markup purity. Publishers find HTML easier to publish than valid, well-formed XHTML. If you try to pull XHTML evangelism into RDF evangelism, publishers are going to dismiss both in the same breath. HTML is alive and kicking, and if you want to use our data on the web you’re going to have to parse HTML to get it.
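
A quick illustration of why that matters: a strict XML parser rejects perfectly ordinary HTML outright, while an HTML parser gets on with it. This is a minimal sketch using Python’s standard library; the markup is invented, and `is_well_formed_xml` and `TextCollector` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Ordinary hand-written HTML: a bare <br> and unclosed <p> elements are legal
# HTML, but not well-formed XML.
tag_soup = '<p>Contact: <span class="fn">Ben Ward</span><br><p>More text'


def is_well_formed_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring("<root>" + markup + "</root>")
        return True
    except ET.ParseError:
        return False


class TextCollector(HTMLParser):
    """Collects text content; html.parser does not demand well-formedness."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


collector = TextCollector()
collector.feed(tag_soup)
collector.close()

print(is_well_formed_xml(tag_soup))  # False
print("".join(collector.chunks))     # the text, "Ben Ward" included, is still recovered
```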

Then there’s the ‘relationship’ with microformats. Standing up and pushing an agenda of ‘microformats aren’t powerful enough, RDF is better’ is strikingly parallel to ‘HTML isn’t powerful enough, use XML’. You know how that argument ends, and it has ended. Microformats do their job very well. They are designed around known limitations, they’re publisher-centric and the tools that have been written to harness them are user-centric. That’s what the web is all about, and microformats have solved the problem of publishing contact and event information.

Yes, microformats are harder to parse than XML, but that’s because the people who write parsers are more capable of handling the complexity than publishers are. The philosophy is that parsers only have to be written once, while the format will be published over and over. Any extra hours taken to write a more liberal parser are saved many times over by the thousands of people who are going to publish the data.
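
To give a flavour of the slack a parser absorbs on publishers’ behalf, here is a deliberately tiny Python sketch that pulls `fn` values out of hCards while tolerating class names in any order, stray whitespace, nested formatting and unclosed tags. `HCardNames` is hypothetical and handles only `fn`; a real hCard parser follows many more rules.

```python
from html.parser import HTMLParser


class HCardNames(HTMLParser):
    """Tiny, liberal hCard reader that only extracts 'fn' values.

    Hypothetical and deliberately incomplete: real hCard parsing covers many
    more properties and rules.
    """

    VOID_TAGS = {"area", "base", "br", "hr", "img", "input", "link", "meta"}

    def __init__(self):
        super().__init__()
        self.stack = []   # open elements: [tag, set of class names, text parts]
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        # class names may come in any order, with any amount of whitespace
        classes = set((dict(attrs).get("class") or "").split())
        self.stack.append([tag, classes, []])

    def handle_data(self, data):
        for entry in self.stack:
            entry[2].append(data)

    def handle_endtag(self, tag):
        if not any(entry[0] == tag for entry in self.stack):
            return  # stray end tag: ignore it
        while self.stack:  # closing a parent also closes unclosed children
            open_tag, classes, parts = self.stack.pop()
            in_vcard = "vcard" in classes or any("vcard" in e[1] for e in self.stack)
            if "fn" in classes and in_vcard:
                # collapse whitespace and nested markup into plain text
                self.names.append(" ".join("".join(parts).split()))
            if open_tag == tag:
                break


markup = """
<div class="  author   vcard ">
  <a class="url fn" href="http://example.org/"><b>Ben</b>
     Ward</a>
  <p>An unclosed paragraph
</div>
<span class="fn">Not inside a vcard, so ignored</span>
"""

parser = HCardNames()
parser.feed(markup)
parser.close()
print(parser.names)  # ['Ben Ward']
```

All of that tolerance lives in code written once, rather than in rules every publisher has to remember.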

Additionally, can we please nip this ‘microformats have better branding; it’s all marketing’ attitude in the bud right now? It’s true that microformats have a fantastic website thanks to some very talented people. But the suggestion that a hip logo and high-profile evangelists are enough to push a new technology to real people, without it also being genuinely useful, is utterly laughable, verging on outright offensive to those who worked on developing it.

If microformats didn’t have the balance of usefulness and ease of use right, then no bugger would use them. Simple. The evangelists would be dismissed as nerds, the logo stolen for some other purpose and the web would carry on regardless. Summoning Jon Hicks to our green-squared, Dan Cederholm-produced altar for the ritual sacrifice in fire of all the fluffy animals on earth would not rescue a broken technology. Microformats are not broken; their limits are by design.

So let’s look ahead. The RDF community is trying to get established in the same world as microformats with eRDF, and frankly, good luck to them. The more rich data published on the web, the better. The microformats process is not for every application. The process depends on existing real-world publishing; it is designed to rule out the ‘invention’ of new formats. RDF, instead, positively embraces people creating their own custom formats on a whim, with an attitude of ‘sink or swim’.

But there are two things. If you’re working with RDF, show me something new. Don’t just rehash vCard and iCalendar into your own format; I’m not interested. I know it makes for a clever demo, but it’s a solved problem. People have learned how to do it with microformats and no-one is going to switch to a different syntax just for the sake of it, especially not fellow geeks who know only too well that you could just transform the microformat into RDF when you need to.
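
To show how short that transformation can be, here is a sketch that takes an already-parsed hCard (just a Python dict here) and emits FOAF triples in N-Triples syntax. The choice of FOAF terms (`foaf:name`, `foaf:homepage`, `foaf:mbox`) is an illustrative mapping I’ve assumed, not an official one, and the function names are hypothetical. GRDDL is the W3C mechanism intended for attaching transformations like this to a page.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text):
    """Quote a string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def hcard_to_ntriples(subject_uri, hcard):
    """Map an already-parsed hCard (a plain dict) onto FOAF terms.

    The property mapping is an illustrative choice, not an official one.
    """
    me = "<" + subject_uri + ">"
    lines = [f"{me} <{RDF_TYPE}> <{FOAF}Person> ."]
    if "fn" in hcard:
        lines.append(f"{me} <{FOAF}name> {literal(hcard['fn'])} .")
    if "url" in hcard:
        lines.append(f"{me} <{FOAF}homepage> <{hcard['url']}> .")
    if "email" in hcard:
        lines.append(f"{me} <{FOAF}mbox> <mailto:{hcard['email']}> .")
    return "\n".join(lines)


card = {"fn": "Ben Ward", "url": "http://example.org/", "email": "ben@example.org"}
print(hcard_to_ntriples("http://example.org/#me", card))
```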

New formats that solve new and different problems will attract attention and support, though. At the end of the day, no-one cares what the technology that solves their problems is called. If it works, is easy enough to publish and provides a useful enhancement to people’s experience of the web, then people will use it.

Solving the same problems twice won’t get that response. In fact, I suspect that if someone visits ‘eRDF.org’ and just sees ‘how to write microformats with a different syntax’, they’ll move along quickly. The RDF versions of vCard and iCalendar need to be at the very bottom of your list of examples, if you even publish them at all.

And finally, you’ve got to make the real-world, end-user benefits available at the same time. Standing up and saying ‘we can do some really vague, open ended stuff with this data’ won’t attract anyone.

At the very least you need practical demos that people can download, install and carry with them. Also, web pipes à la Technorati’s contact and event converters have been a huge success. Remember, there’s no native microformats UI in any web browser, but people can and do use them right now. With nothing more than a hyperlink in a page, microformats become a technology of ‘now’ and not ‘the future’. It has proved a fantastic interim means of bringing microformat functionality to all users while the browser manufacturers build full-scale native UI into upcoming releases.
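
The converter idea itself is simple enough to sketch: take the hCard properties found in a page and hand back a vCard file that any address book can import. This assumes the hCard has already been parsed into a dict, the function names are hypothetical, and it makes no claim about how Technorati’s service actually works. It applies hCard’s implied-‘n’ rule only naively, taking the last word of `fn` as the family name.

```python
def vcard_escape(value):
    """Escape a vCard 3.0 text value (backslash, comma, semicolon, newline)."""
    return (value.replace("\\", "\\\\").replace(",", "\\,")
                 .replace(";", "\\;").replace("\n", "\\n"))


def hcard_to_vcf(hcard):
    """Render an already-parsed hCard (a plain dict) as a vCard 3.0 file."""
    fn = hcard["fn"]
    # Naive take on hCard's implied-"n" rule: "Given Family" -> N:Family;Given
    given, _, family = fn.rpartition(" ")
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        "FN:" + vcard_escape(fn),
        "N:" + vcard_escape(family) + ";" + vcard_escape(given) + ";;;",
    ]
    if "email" in hcard:
        lines.append("EMAIL;TYPE=INTERNET:" + hcard["email"])
    if "url" in hcard:
        lines.append("URL:" + hcard["url"])
    lines.append("END:VCARD")
    return "\r\n".join(lines) + "\r\n"  # vCard lines end in CRLF


print(hcard_to_vcf({"fn": "Ben Ward", "email": "ben@example.org"}))
```

A converter service would fetch the page, parse out the hCard, and return this text as a download, which is exactly what makes a plain hyperlink enough.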

Tom Morris ‘gets it’, I think. He’s launching a new effort called Get Semantic which aims to organise the public face of RDF and could well provide a vehicle for much of what I’ve written above to happen. I think it will be worth keeping an eye on.

Penultimately, I want to pre-empt a possible comment. I haven’t mentioned RDFa at all in this lengthy piece. This is because RDFa is a rubbish idea.

RDF has gotten a lot of deserved criticism over the years for being too conceptual and focusing only on solving imaginary problems that didn’t concern anyone outside academic communities. As mash-up culture promotes the publishing of interoperable data, the number of real-world problems that interoperable data can solve has increased. Microformats is capable of solving some and I think RDF is capable of solving some others. I just hope that the right kind of people get involved to try and make it happen.

Comments

Previously, I hosted responses and commentary from readers directly on this site, but have decided not to any more. All previous comments and pingbacks are included here, but to post further responses, please refer me to a post on your own blog or other network. See instructions and recommendations of ways to do this.

  1. There aren’t any real-world RDF applications that I am aware of, or that anyone I’ve spoken to who does RDF/Semantic Web related research is aware of either.

    That’s the problem at the moment. The Computer Science department at my uni publishes loads of RDF data about the students, research projects etc. – there just isn’t an application that you can tell “use this source, use this source too, use this one as well – now tell me some good stuff”.

    I believe this is being worked on ;)

    But to be honest, having worked over the summer for the AKT project and dealt with RDF/Semantic Web technologies first hand, I’m still quite skeptical whether they’ll ever get popular in the real world.

  2. One basic thing that’s still wrong in discussions is comparing RDF to MFs solely at the syntax level. You are spot on in saying that cloning MF encodings in RDF is not very helpful (apart from a few exceptions, e.g. where eRDF makes it easier to generate key-value-like views from a database); MFs will always be more compact.

    There is a long-tail utility coming with (e)RDF, but the real benefits (IMO) will be storage, querying, and remixing, which are a lot more straightforward to implement on the model/triple level. Luckily, RDFers are starting to bring together MFs and RDF infrastructure (GRDDL+SPARQL, paggr, …), so we will hopefully get over the “versus”-debate soon and move on to discuss synergy effects.
    (sorry for the shameless plugs)

  3. You mention that you think Tom Morris “gets it”, but he also comes up with a comment even more laughable and/or offensive than the “it’s all marketing” one.

    Of course, he may well be joking: however, it’s still pretty offensive, in my opinion, towards both male and female developers who support and use Microformats.

  4. [PS. I’ve pulled most of the links from this post as the anti-spam system complained – a warning ahead of time would have been appreciated – anyhow I’ve left markers]

    Many thanks for this well-argued post. I am a fan of RDF, but agree with many of your points. Some however could use a little more background. Ok, for starters:

    “The thing about RDF is that no-one has yet demonstrated any real-world reason to care about it.”

    Speaking generally, I’d say that wasn’t actually true. For example, one particular area in which RDF and Semantic Web technologies are finding significant application is Health Care and the Life Sciences [GoogleMe]. But in the context of “the real world of web browsers” you have a point. Even so, I’d suggest the problem isn’t really a lack of real-world reasons to care about it, more that it hasn’t been effectively demonstrated.

    “RDF has been in development for so many years and no-one of importance in the consumer world has cared.”

    Again, that’s not strictly true. An example there is the XMP format for embedded metadata, which is a profile of RDF (many photo tools make use of the stuff, including all of Adobe’s products). RDF and associated technologies have been in development for many years, but what they address is the general problem of data on the web, not something solvable overnight. But now there are solid core specifications and tool implementations for all popular programming languages.

    One more point on the general case. Most of the large enterprise-oriented companies are interested in the technologies: for example, IBM [GoogleMe] and HP [GoogleMe] have ongoing initiatives, and Oracle’s flagship database has RDF support [GoogleMe].

    But OK, your main arguments are in the context of microformats. I’ve no doubt there is at least skepticism, maybe even bitterness, in some corners of the RDF community about the growth of microformats. But I’m afraid that to some extent in this post you yourself have fallen into a similar trap, seeing similarities between RDF and microformats and assuming it’s one versus the other. While being able to publish data is one of the core prerequisites of the Semantic Web, the format(s) used are secondary to the data model.

    You say:
    “The biggest technical barrier for RDF has actually been solved; publishing no-longer requires XML nor hidden duplication of the data that’s already in your page.”

    It’s debatable whether this has been the biggest technical barrier, but you’re absolutely right in suggesting the problem has been solved. Microformats are a huge win for RDF.

    While I don’t disagree with your points about XML, I would suggest that until we see a solid revision of non-XML HTML, it is a little easier to publish data unambiguously using XML formats (including XHTML), thanks to the XML tool set. But all RDF needs is something that can be parsed deterministically; the specific format doesn’t matter. (The Turtle [GoogleMe] RDF syntax, which is closer to JSON than to XML, is popular for hand-coding and debugging.) For microformats tools to be useful, they also need to avoid ambiguity. Right now having HTMLTidy or similar in the chain tends to be a necessary evil.

    One slightly tangential issue here is that a lot of data that could potentially be published on the web is tied up in databases. One of the big problems there is that every database has its own schema; without some kind of common language(s) there’s no way the material could be published in an interoperable form. There are now a bunch of tools that allow RDBMSs to expose their data in an RDF-friendly fashion (with the help of one-off mappings). Where microformats exist for a particular domain, there’s no reason they should be used for the published data.

    I agree with you about it being mistaken to push an agenda of ‘microformats aren’t powerful enough, RDF is better’. But while microformats have solved the problems of, say, publishing contact and event information, they are (intentionally) limited in their coverage. More significantly, RDF is concerned with the model, not the format; the problem space is different.

    Microformats neatly solve the problem of simultaneously publishing information in a human-readable and machine-processable fashion. But they don’t address the general case, how to publish arbitrary data – there, as you suggest eRDF can augment them.

    What’s more, microformats don’t attempt to address the core purpose of RDF, which is to be able to say anything about anything in a language that is machine-processable in a fashion consistent with the web. The solution RDF offers is a very simple entity-relationship model with both the entities and relationships being identified using URIs. When data is expressed in this model, there’s a common language through which data from different sources and expressed in different formats can be integrated.

    When it comes to domain-specific data, anyone can make up their own vocabularies/schemas/ontologies and use them on the web without worrying about conflicts with existing vocabularies because the terms are identified with URIs. But to maximise the value of their data, it makes sense to reuse existing vocabularies. This is entirely consistent with the ‘paving of the cowpaths’ around microformats. If I want to talk about an event, it makes sense to use the concept as defined by iCalendar. If I want to express this in HTML I can use hCalendar. If I want to merge this information with, say, information about government bodies I can use the RDF model of this information, and query across the combination (using SPARQL). I’d probably want to see the results in HTML, so again it would make sense to use hCalendar to express the event information contained in the results.

    One final attempt at highlighting the non-conflict between RDF and microformats:
    Would you say microformats have succeeded where Java has failed? After all, you can express business card data in each, class="vcard" or public class VCard {}. Java has the facility for serialisation, and its objects could be presented on the web. Ok, I doubt the serialisations are particularly human-readable, even as XML. So if you wanted to put the data contained in such objects on the web, your best bet would probably be to use microformats. RDF is designed for information modelling and data manipulation (i.e. machine-oriented knowledge representation) rather than general programming, but if you wanted to present business card material on the web, similarly you’d probably be best off using hCard.

    Going back to your earlier statement, that no-one has yet demonstrated any real-world reason to care about RDF: this is changing. The W3C now has a Semantic Web Education and Outreach group [GoogleMe], and part of its remit is work on exposing real-world demonstrations. But if you look around a bit, you will see that there are already quite a few things around.

    Off the top of my head I’d suggest dbpedia.org, which allows Wikipedia data to be queried online much in the same fashion as a relational DB. There’s Tim Berners-Lee’s Tabulator [GoogleMe], a “generic data browser”, which on the surface is an Ajax-based mashup tool, the difference being that it’s capable of mashing up fairly arbitrary data. There’s Revyu [GoogleMe], an RDF-based Web 2.0-style review site. The forthcoming IPTV service Joost [GoogleMe] (aka The Venice Project) is creating quite a buzz in Web 2.0 circles, and that makes extensive use of RDF. (A good place to watch this stuff is Planet RDF).

  5. Oops, at least one typo:
    “Where microformats exist for a particular domain, there’s no reason they should be used for the published data.”
    =>
    “Where microformats exist for a particular domain, there’s no reason they shouldn’t be used for the published data.”

  6. Ben

    @ FatBusinessman: I can assure you Tom is definitely joking there. This ‘gets it’ compliment is based on having met him at BarCamp.

    Also, the whole ‘girlsofmicroformats’ thing is tongue-in-cheek in itself; it doesn’t strike me as particularly offensive to comment on it likewise. That said, lots of comments like that, alongside ‘ooh they’ve got a pretty logo’, ‘Jeremy Keith is using his mind-trick to spread them’ and so on, all add up to the same negative impression: a weak dismissal of microformats.

    @ Danny: I’m really sorry about the links getting blocked. I’ve immediately disabled that plug-in and will just deal with the extra spam by hand until I come up with a more robust implementation.

    Thank you very much for the thorough response. I shall digest it over the next few hours!

  7. Note for the humour-impaired: I was indeed joking with that post. If it’s offensive, then I should be offended too since I support, use and develop tools to work with microformats.

    The whole “versus” thing is a joke that a few of us at BarCamp thought up. If you had watched the video, you would have seen that there is no “versus” at all – I think microformats are great, and I hope that the GetSemantic project will push microformats heavily for most applications, but also provide “microformats for the long tail” (tailformats?).

    May I point out that “girlsofmicroformats” was an invention of the microformats community? I’m sure that if any of the ladies featured in the photos had any problems with it, they would have made a big fuss about it themselves. I was linking to it because it appeared on the #microformats channel and it seemed like a funny, cool and interesting thing to post. The fact that there are people making and selling t-shirts in order to promote a semantic data process is cool.

    When we say “pretty logo” and so on, we really mean “good documentation, tools, implementations and user community”.

    I also agree with you, Ben, regarding XHTML and XML. I’m an XML geek, so I’d love to see more people using XHTML, but if there’s the possibility that advocacy of XHTML is going to prevent people from putting semantic data out there, then I’ll stop doing it. XHTML and XML are personal preferences because they make my life easier, not technical requirements for adoption of eRDF etc.

  8. Ben, I think you’ve hit the nail on the head. What I’m waiting to hear is a succinct, clear explanation of what RDF can do for me in the real world that Microformats (current or potential) can’t.

    Given the work I did at university on ontologies and agents, I think there is an answer, but no-one is talking about it. What RDF needs is a Simon Willison to give OpenID-style clarity and enthusiasm to the debate.

  9. It’s an interesting post. I seem to be by far the most loud-mouthed person who has been arguing for Semantic Web technology, so for once I’m going to try and tread carefully.

    The problem with RDF in the eyes of the web development community is that it isn’t useful. I would like to challenge that. RDF is very useful, but think of it a bit like SGML. Some really smart people love raw SGML and all the power it gives them; however, most of us mere mortals prefer the rather constrained subset of HTML.

    There are a lot of implementations that use the triple formats to deliver information in the real world; Danny Ayers mentioned some, and Adobe are definitely a good example. Triples are being used effectively, but not so much on the ‘web’. It’s the ‘web’ bit that seems to throw people.

    Timbl’s original vision was for two webs side by side, one for machines and one for people. The human web we have; the other one we don’t really have. The problem in creating the second one is finding common formats machines can understand and actually producing the data. Since so much of the (human) web is produced manually, the machine web has become a problem.

    In stepped Microformats. Microformats was a truly brilliant solution; it fit the problem perfectly. It was a great way to combine the web and the need to provide a data source.

    My argument is this: Microformats does the simple cases really well, but the extra “20%”, as some people put it, is achievable with a very small step using eRDF. My worry is not that Microformats isn’t good, it’s that something better is so close.

    Web 2.0 wouldn’t have happened if so many people hadn’t opened up their APIs not knowing what would happen. Google Maps, Flickr, Delicious: all these products got more interesting because their data was as available as possible. RDF has all the egghead scientists doing crazy stuff, but they can only do crazy stuff with the data they have. One small step and Microformats would be part of that.

  10. >It fascinates academics who would love — just for the sake of it — to model the entire universe in triples but in the real world of web browsers the value has never really been promoted.

    >There aren’t any realworld RDF applications that I am aware of…

    Danny gave a very substantial and good answer as usual. Just to add a bit more to the useful information he has given: all Adobe suite products have put RDF in files for more than 4 years, so I guess that must be a few million files by now. Some companies implement the grabbing of this RDF data. For example, Flickr sucks out the RDF data contained in images and puts it into Flickr.
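
On the XMP point raised in the comments: the metadata Adobe’s tools embed in files is RDF/XML, so ordinary XML tooling can read it. A minimal Python sketch; the packet is trimmed down and its values invented, and real packets are wrapped in `<?xpacket?>` markers and may use attribute shorthand on `rdf:Description`, which this ignores. `read_xmp` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A trimmed XMP packet of the kind embedded in image files (values invented).
xmp_packet = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:creator><rdf:Seq><rdf:li>Jane Photographer</rdf:li></rdf:Seq></dc:creator>
   <dc:subject><rdf:Bag><rdf:li>barcamp</rdf:li><rdf:li>london</rdf:li></rdf:Bag></dc:subject>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def read_xmp(packet):
    """Return (creators, keywords) from an XMP packet written as RDF/XML."""
    root = ET.fromstring(packet)
    desc = root.find("rdf:RDF/rdf:Description", NS)
    # dc:creator is an ordered rdf:Seq; dc:subject (keywords) is an rdf:Bag
    creators = [li.text for li in desc.findall("dc:creator/rdf:Seq/rdf:li", NS)]
    keywords = [li.text for li in desc.findall("dc:subject/rdf:Bag/rdf:li", NS)]
    return creators, keywords


print(read_xmp(xmp_packet))  # (['Jane Photographer'], ['barcamp', 'london'])
```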
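
Several of the comments place RDF’s real payoff at the model level: once data is expressed as triples with URI-named terms, data from different sources can be merged without negotiating a schema, then queried as one graph. A toy Python sketch of that idea, using sets of triples, a set union to merge them, and a naive matcher for SPARQL-style basic graph patterns. It is not SPARQL, it does not distinguish URIs from literals, and `match`, `query` and the example.org vocabulary and data are hypothetical.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/vocab#"  # hypothetical home-made vocabulary


def match(pattern, triples, binding):
    """Yield each extension of `binding` under which `pattern` matches a triple.

    Terms starting with '?' are variables; anything else must match exactly.
    """
    for triple in triples:
        extended = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if extended.setdefault(term, value) != value:
                    break  # variable already bound to something else
            elif term != value:
                break
        else:
            yield extended


def query(patterns, triples):
    """Join patterns one after another, like a SPARQL basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b for old in bindings for b in match(pattern, triples, old)]
    return bindings


# Two independent sources (invented data) that happen to share URIs.
department = {
    ("http://example.org/people/alice", FOAF + "name", "Alice"),
    ("http://example.org/people/alice", EX + "worksOn", "http://example.org/projects/p1"),
}
project_site = {
    ("http://example.org/projects/p1", DC + "title", "Example project"),
}

merged = department | project_site  # merging triples is just a set union

results = query([
    ("?person", FOAF + "name", "?name"),
    ("?person", EX + "worksOn", "?project"),
    ("?project", DC + "title", "?title"),
], merged)
for row in results:
    print(row["?name"], "works on", row["?title"])
```

The join only works because both sources use the same URI for the project; that shared naming is the whole point of the model.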

You can file issues or provide corrections: View Source on Github. Contributor credits.