Choosing XML or RDF

When I created DOAP, I chose to use RDF/XML over plain XML. Why? I've tried to explain this in the talks I've given about DOAP, but, owing to time constraints, I haven't always done a good job of it.

Here are some of the reasons.

I chose RDF for DOAP because I wanted to create a decentralized system based on semantic web technologies.

I wanted to be able to interoperate and mix terms with existing web vocabularies, such as Dublin Core, FOAF and RSS 1.0, and to give others a defined way to mix my vocabulary with theirs.

This gives people a reasonable way to extend DOAP for their own purposes without waiting for me to alter the schema to accommodate their new terms. All they do is use their own namespace, and they're off.
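To make that concrete, here's a small sketch using Python's standard-library ElementTree. The RDF and DOAP namespace URIs are the real ones; the project, the "ext:" namespace and its buildStatus term are hypothetical, made up for illustration. The extension term sits alongside the DOAP terms with no change to the DOAP schema, and a consumer that only knows DOAP can simply skip it.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOAP = "http://usefulinc.com/ns/doap#"
# Hypothetical third-party namespace, used only for illustration.
EXT = "http://example.org/my-extension#"

DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:doap="{DOAP}" xmlns:ext="{EXT}">
  <doap:Project>
    <doap:name>Example Project</doap:name>
    <ext:buildStatus>passing</ext:buildStatus>
  </doap:Project>
</rdf:RDF>"""


def properties(doc):
    """Return (namespace, local name, text) for each property of the first node element."""
    project = ET.fromstring(doc)[0]
    result = []
    for prop in project:
        # ElementTree spells namespaced tags as "{namespace}local".
        ns, local = prop.tag[1:].split("}")
        result.append((ns, local, prop.text))
    return result


props = properties(DOC)
# A DOAP-only consumer keeps the terms it knows and ignores the rest.
known = [(local, text) for ns, local, text in props if ns == DOAP]
print(known)  # prints [('name', 'Example Project')]
```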

DOAP is at heart a social project, as much about the wikis and mailing lists and participants as about the software itself. This is why decentralization was important to me: I wanted people to be able to express their relation to a project without having to control the One True File that describes it.
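Decentralization works because combining RDF descriptions is mechanical. Here's a minimal sketch, assuming each file has already been parsed into a set of (subject, predicate, object) tuples. The project and people URIs are hypothetical, and plain strings stand in for both URIs and literals, which a real RDF library would keep distinct. Because both files name the project with the same URI, merging is just set union. That holds only while the files share no blank nodes: a real merge has to rename blank nodes apart first.

```python
# Each description is a set of (subject, predicate, object) tuples.
# All example.org URIs are hypothetical; plain strings stand in for both
# URIs and literals, which a real RDF library would keep distinct.
DOAP = "http://usefulinc.com/ns/doap#"
FOAF = "http://xmlns.com/foaf/0.1/"
PROJECT = "http://example.org/projects/foo"
ALICE = "http://example.org/people/alice#me"

# Published by the project's maintainer.
maintainer_file = {
    (PROJECT, DOAP + "name", "foo"),
    (PROJECT, DOAP + "homepage", "http://example.org/foo/"),
}

# Published separately, on her own site, by a contributor.
contributor_file = {
    (PROJECT, DOAP + "developer", ALICE),
    (ALICE, FOAF + "name", "Alice"),
}

# With no blank nodes involved, merging is plain set union: both files
# use the same URI for the project, so their statements join up.
merged = maintainer_file | contributor_file

for s, p, o in sorted(merged):
    print(s, p, o)
```

Nobody had to edit the maintainer's file for the contributor's relationship to the project to become part of the combined description.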

There's a certain cost to pay in the baggage the RDF model brings. Though you can make RDF/XML look reasonably XML-ish, there are intrusions, such as the type system that means you use rdf:resource to denote URIs. You also bring in the problem of choosing whether to name things via URIs or indirectly with unique property-value pairs. Finally, anyone adding to or mixing DOAP files must follow a certain set of rules and understand the basics of the RDF model to do so.
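Here's what the rdf:resource intrusion means for code reading a DOAP file. In this sketch (Python's standard-library ElementTree; the project and its homepage URL are invented), a property's value is either a literal, carried as element text, or a URI, carried in the rdf:resource attribute, and the consumer has to check which.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOAP = "http://usefulinc.com/ns/doap#"

# Hypothetical project description; example.org is a placeholder.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:doap="{DOAP}">
  <doap:Project>
    <doap:name>Example Project</doap:name>
    <doap:homepage rdf:resource="http://example.org/project/"/>
  </doap:Project>
</rdf:RDF>"""


def property_values(doc):
    """Map each property's local name to ('uri', value) or ('literal', value)."""
    project = ET.fromstring(doc)[0]
    values = {}
    for prop in project:
        local = prop.tag.split("}")[1]
        # Namespaced attributes are also keyed as "{namespace}local".
        resource = prop.get("{" + RDF + "}resource")
        if resource is not None:
            values[local] = ("uri", resource)
        else:
            values[local] = ("literal", prop.text)
    return values


values = property_values(DOC)
print(values)
```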

But the benefit of RDF having a model and rules in the first place is that data expressed in RDF is unambiguous about the model you want: there's a well-defined mapping from the syntax to the resulting relational graph. XML doesn't have that, and a schema doesn't define it either, since a schema constrains which documents are valid rather than what they mean. In effect you must communicate the desired model in prose, through assumptions of a shared world view, or by other means. And you waste time designing your XML vocabulary.

For example, consider:

 <person name="Edd Dumbill">
   <address>
      <street>64 Easy Street</street>
      <city>York</city>
   </address>
 </person>

Because we have a shared world view, you can infer that this is probably a description of me and where I live. But it might be that I'm a travelling salesman and this is my day's assignment. The relationship of person to address is undefined.

Here's an idiomatic RDF view:

 <Person>
    <name>Edd Dumbill</name>
    <livesAt>
       <Address>
         <street>64 Easy Street</street>
         <city>York</city>
       </Address>
    </livesAt>
 </Person>

OK, it's more verbose. On the other hand, because you must describe a graph, you can't get away without making the relationship between the person and the address explicit. The livesAt property can then, if we wish, be formally linked up into a web of meaning via an RDF/OWL schema.
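To show what a well-defined mapping from syntax to graph buys you, here's a deliberately tiny sketch of RDF/XML's "striped" reading, in which elements alternate between nodes and properties. It's a toy written for the namespace-free example above, not a real RDF/XML parser: it ignores namespaces, rdf:about, rdf:resource, rdf:parseType and the rest of the grammar, and gives each node a fresh blank-node label. A proper RDF library handles the full syntax.

```python
import itertools
import xml.etree.ElementTree as ET

# The namespace-free "idiomatic RDF" example from above.
DOC = """<Person>
   <name>Edd Dumbill</name>
   <livesAt>
      <Address>
        <street>64 Easy Street</street>
        <city>York</city>
      </Address>
   </livesAt>
</Person>"""


def read_node(node, triples, ids):
    """Read `node` as a node element: mint a blank-node label, record its
    type, then read each child as a property element. Returns the label."""
    subject = f"_:b{next(ids)}"
    # "type" stands in for rdf:type in this toy.
    triples.append((subject, "type", node.tag))
    for prop in node:
        children = list(prop)
        if children:
            # The property's value is itself a node element: recurse.
            obj = read_node(children[0], triples, ids)
        else:
            # Otherwise the property's value is a plain literal.
            obj = (prop.text or "").strip()
        triples.append((subject, prop.tag, obj))
    return subject


def to_triples(doc):
    triples = []
    read_node(ET.fromstring(doc), triples, itertools.count(1))
    return triples


triples = to_triples(DOC)
for t in triples:
    print(t)
```

The triple linking the person to the address via livesAt comes out explicitly. Nothing comparable can be derived from the plain XML version without outside knowledge of what an address inside a person is meant to signify.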

Given my leaning towards decentralization, my expectation that most people would use templating systems to generate DOAP, and my efforts to minimise RDF's intrusions, I considered the cost of using RDF worth bearing.

For RDF

  • Unambiguous expression of model, no need to invent a syntax
  • Decentralisation of extension
  • Good tool support in Python, Java and other major programming languages.
  • Vocabulary evolution is easy due to tolerant nature of RDF processors.

For XML

  • You must use RDF tools. XML tools aren't sufficient.
  • Awareness of the RDF model is not as widespread as understanding of XML
  • Moderate increase in the clunkiness of your output syntax
  • It's hard to lock an RDF vocabulary down, should you want to.

Red Herrings

  • The notion that RDF/XML is any more difficult to write than straight XML soon becomes irrelevant when you construct anything over and above a trivial document type.
  • Another false notion is that RDF/XML documents become a namespace mess, with dc: this and rss: that mixed in with your own namespaces. DOAP defines all its terms in its own namespace, using an RDF schema to indicate equivalences with existing vocabularies.
  • Although it's still, in my opinion, best for interchange, we're not stuck with RDF/XML as a syntax. I've found Turtle to be great for "hacking" work on RDF.

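For those who've not come across Turtle before, here's the earlier example. The ":" prefix is bound to a placeholder namespace, since the example's terms don't belong to any real vocabulary:

 @prefix : <http://example.org/vocab#> .   # hypothetical placeholder namespace

 [ a :Person;
     :name "Edd Dumbill";
     :livesAt [ a :Address;
                  :street "64 Easy Street";
                  :city "York" ]
 ] .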

Summary

These are the basic reasons I favoured RDF for making DOAP. Other people's use-cases will differ. I also ought to add in the interests of honesty that I do have an affection for RDF as a technology, and that makes me want to make it work!


You are reading the weblog of Edd Dumbill, writer, programmer, entrepreneur and free software advocate.
Copyright © 2000-2012 Edd Dumbill