The other day I was tinkering with that cute little poster child of Web 2.0, Flickr. Looking for a lightweight way to incorporate some photos into a web site, I headed to their feeds page to find some XML to use.
Before you read on, don't forget that the XTech 2007 call for participation ends this week.
The result was interesting. Flickr have a variety of outputs in RSS dialects, but you just can't get at the raw data as plain XML. The bookmarking service del.icio.us is another case in point. My friend Matt Biddulph recently had to resort to screen scraping in order to write his tag stemmer, until some kind soul pointed out there's a JSON feed.
Both of these services support XML output, but only with the semantics crammed awkwardly into RSS or Atom. Neither has plain XML for public consumption (some is available behind a registration-protected API), but both do support serialization via other formats. We don't really have "XML on the Web". We have RSS on the web, plus a bunch of mostly JSON and YAML for those who didn't care for pointy brackets.
Did we as supporters of XML get it wrong somehow? I think we did. The success of the XML project and the W3C as a unifying forum was seductive, and we were more bound to the edicts of the W3C than we realized. Our justified love of open standards inadvertently promoted a straitjacketed approach to using XML. XML was felt to need a schema language.
What we missed was that settling on schemas for data serialization represented much more of an up-front commitment than most developers could really make. The rise of agile programming techniques further emphasizes the hazards of prematurely freezing a design.
Schemas never really came with great evolution or migration strategies.
And of course it didn't help that both DTDs and W3C XML Schema are pretty darn obscure. Somehow we never got the word out, despite the "standalone" XML declaration, that it's OK to spread schema-less XML around.
We had, of course, the Simple XML movement, but it came to little. Other voices pointed the way forward. Walter Perry notably promoted an agenda that required little a priori agreement between communicators, but it gained little ground in common thought. Ironically, it took James Clark to push the agenda of developer convenience, with the RELAX NG compact syntax giving us all permission not to think in angle brackets.
Meanwhile, busy developers with work to do on the web have created non-XML syntaxes such as JSON and YAML. Though I deliberately closeted myself in XML for a long time, I have come to appreciate the utility of these syntaxes.
One of the more intriguing developments is that the Semantic Web crowd, while still very much W3C oriented, have really proved themselves more flexible in their attitude than the XML world at large. RDF/XML is hardly the only acceptable way to approach the throne of Tim, and the man himself threw out XML syntax pretty early on with the development of the N3 scribble format. More recently, the work on GRDDL to scrape semantics from HTML has shown the semweb folks' amenability to outside initiatives such as microformats.
Maybe it's not such a bad world, as making an XML mapping of YAML, JSON and friends isn't really very hard. I'd just like to get the message out to web application developers that Plain Old XML is fine by me. I live in hope yet that the rise of REST will hit this home.
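To give a feel for how little is involved in such a mapping, here is a rough sketch in Python using only the standard library. The conventions are my own assumptions for illustration, not any standard: object keys become element names (so they must already be valid XML names), array members are wrapped in a hypothetical "item" element, and the function names to_xml and json_to_xml are made up for this example.

```python
import json
import xml.etree.ElementTree as ET


def to_xml(tag, value):
    """Map one parsed JSON value onto an element named `tag`.

    Assumed convention (not a standard): dict keys become child
    element names, list members become <item> children, and
    scalars become text content.
    """
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_xml(key, child))
    elif isinstance(value, list):
        for child in value:
            elem.append(to_xml("item", child))
    elif value is None:
        pass  # JSON null: leave the element empty
    elif isinstance(value, bool):
        # Checked before the generic case so we emit JSON-style
        # lowercase booleans rather than Python's "True"/"False".
        elem.text = "true" if value else "false"
    else:
        elem.text = str(value)
    return elem


def json_to_xml(text, root="data"):
    """Parse a JSON string and serialize it as schema-less XML."""
    return ET.tostring(to_xml(root, json.loads(text)), encoding="unicode")


if __name__ == "__main__":
    print(json_to_xml('{"title": "Sunset", "tags": ["sky", "sea"]}', root="photo"))
```

No schema is needed at either end. A real mapping would also have to decide what to do with keys that aren't legal element names, which this sketch ignores.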
So while markup has most definitely won on the web, it's a shame XML hasn't yet achieved as much as it could. It's not too late for an injection of pragmatism and a little less constraint.