Choosing XML or RDF

When I created DOAP, I chose to use RDF/XML over plain XML. Why? I've tried to explain this in the talks I've given about DOAP, but I haven't always done a good job of it, owing to time constraints.

Here are some of the reasons.

I chose RDF for DOAP because I wanted to create a decentralized system based on semantic web technologies.

I wanted to be able to interoperate and mix terms with existing web vocabularies, and give a defined way for others to mix my vocabulary with theirs. Dublin Core, FOAF and RSS 1.0 are examples of such vocabularies.

This gives people a reasonable way of extending DOAP for their own purposes without having to wait for me to alter the schema to accommodate their new terms, etc. All they do is use their own namespace, and they're off.
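To make that concrete, here is a minimal sketch in Python of a consumer that reads the DOAP terms it knows about and sets aside everything else. The build: namespace, its system term and the project details are all invented for illustration. The code walks the XML tree with the standard library rather than using a real RDF toolkit, so treat it as a picture of the idea, not as how DOAP processors actually work.

```python
import xml.etree.ElementTree as ET

DOAP = "http://usefulinc.com/ns/doap#"
EXT = "http://example.org/ns/build#"  # hypothetical extension namespace

# A made-up project description: two DOAP terms plus one extension term.
DOC = """
<Project xmlns="http://usefulinc.com/ns/doap#"
         xmlns:build="http://example.org/ns/build#">
  <name>Example Project</name>
  <shortdesc>A made-up project used for illustration.</shortdesc>
  <build:system>make</build:system>
</Project>
"""


def split_terms(xml_text):
    """Separate properties in the DOAP namespace from everyone else's."""
    known, extra = {}, {}
    for child in ET.fromstring(xml_text):
        # ElementTree spells qualified names as "{namespace}localname".
        ns, _, local = child.tag[1:].partition("}")
        text = (child.text or "").strip()
        if ns == DOAP:
            known[local] = text
        else:
            extra[(ns, local)] = text
    return known, extra


known, extra = split_terms(DOC)
print(known)
print(extra)
```

The extension term travels alongside the DOAP terms without any change to the DOAP schema, and a consumer that has never heard of it can simply pass over it.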

DOAP is at heart a social project, as much about the wikis and mailing lists and participants as about the software itself. This is why decentralization was important to me: I wanted people to be able to express their relation to a project without having to control the One True File that describes it.

There's a certain cost to pay in terms of the baggage the RDF model brings. Though you can make the RDF/XML look reasonably XMLish, there are intrusions -- the type system means you use rdf:resource to denote URIs, for instance. You also bring in the problem of choosing whether to name things via URIs or indirectly with unique property-value pairs. And anyone adding to or mixing DOAP files must follow a certain set of rules and understand the basics of the RDF model to do that.

But the benefit of RDF having a model and rules in the first place is that data expressed in RDF is unambiguous about the model you want: there's a well-defined mapping from the syntax to the resulting relational graph. XML doesn't have that, and a schema doesn't define it. In effect you must communicate the desired model through prose, assumptions of a shared world view, or other means. And you waste time designing your XML vocabulary.

For example, consider:

 <person name="Edd Dumbill">
   <address>
      <street>64 Easy Street</street>
      <city>York</city>
   </address>
 </person>

Because we have a shared world view, you can infer that this is probably a description of me and where I live. But it might be that I'm a travelling salesman and this is my day's assignment. The relationship of person to address is undefined.

Here's an RDF idiomatic view:

 <Person>
    <name>Edd Dumbill</name>
    <livesAt>
       <Address>
         <street>64 Easy Street</street>
         <city>York</city>
       </Address>
    </livesAt>
 </Person>

OK, it's more verbose. On the other hand, because you must describe a graph, you can't get away without making the relationship between the person and the address explicit. The livesAt property can then, if we wish, be formally linked up into a web of meaning via an RDF/OWL schema.
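The mapping from syntax to graph can be sketched in a few lines of Python. This is a deliberately simplified reading of the "striped" pattern in the example above, where node elements alternate with property elements. It ignores namespaces, rdf:about, rdf:resource and everything else a conforming RDF/XML parser handles, and the blank node labels (_:b1, _:b2) are generated names, not part of the data.

```python
import itertools
import xml.etree.ElementTree as ET

DOC = """
<Person>
   <name>Edd Dumbill</name>
   <livesAt>
      <Address>
        <street>64 Easy Street</street>
        <city>York</city>
      </Address>
   </livesAt>
</Person>
"""


def striped_to_triples(xml_text):
    """Turn node/property stripes into (subject, predicate, object) triples."""
    counter = itertools.count(1)
    triples = []

    def visit_node(elem):
        # Each node element becomes a blank node typed by its element name.
        node_id = "_:b%d" % next(counter)
        triples.append((node_id, "a", elem.tag))
        for prop in elem:
            children = list(prop)
            if children:
                # The property's value is another node: recurse into it.
                triples.append((node_id, prop.tag, visit_node(children[0])))
            else:
                # The property's value is a plain literal.
                triples.append((node_id, prop.tag, (prop.text or "").strip()))
        return node_id

    visit_node(ET.fromstring(xml_text))
    return triples


triples = striped_to_triples(DOC)
for triple in triples:
    print(triple)
```

Among the results is a triple linking the person node to the address node through livesAt, which is exactly the relationship the plain XML version left unstated.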

Given my leanings towards decentralization, my expectation that most people would use templating systems to generate their DOAP, and my efforts to minimise RDF's intrusions, I considered the cost of using RDF worth bearing.

For RDF

Unambiguous expression of model, no need to invent a syntax
Decentralisation of extension
Good tool support in Python, Java and other major programming languages
Vocabulary evolution is easy, thanks to the tolerant nature of RDF processors

For XML (the costs of choosing RDF)

You must use RDF tools. XML tools aren't sufficient.
Awareness of the RDF model not as widespread as understanding of XML
Moderate increase in the clunkiness of your output syntax
It's hard to lock an RDF vocabulary down, should you want to.

Red Herrings

The notion that RDF/XML is any more difficult to write than straight XML soon becomes irrelevant when you construct anything over and above a trivial document type.
Another false notion is that RDF/XML documents become a namespace mess, with dc: this and rss: that mixed in with your own namespaces. DOAP defines all its terms in its own namespace, using an RDF schema to indicate equivalences with existing vocabularies.
Although it's still, in my opinion, best for interchange, we're not stuck with RDF/XML as a syntax. I've found Turtle to be great for "hacking" work on RDF.

For those who've not come across Turtle before, here's the earlier example again, written in Turtle:

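 # The empty prefix needs a declaration; this namespace is a placeholder.
 @prefix : <http://example.org/terms#> .
 [ a :Person;
     :name "Edd Dumbill";
     :livesAt [ a :Address;
                  :street "64 Easy Street";
                  :city "York" ]
 ] .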

Summary

These are the basic reasons I favoured RDF for making DOAP. Other people's use-cases will differ. I also ought to add in the interests of honesty that I do have an affection for RDF as a technology, and that makes me want to make it work!

Link | Fri Feb 18 09:52:17 +0100 2005

You are reading the weblog of Edd Dumbill, writer, programmer, entrepreneur and free software advocate.
