Practical Perl

The first four or so years of my professional career were spent mostly writing Perl. I worked at a news agency, and Perl was the ideal tool to wrangle incoming and outgoing data into shape. When I went independent I dropped Perl like a stone: it didn't feel like "proper programming". I suspect, too, that I never quite got over the move from Perl 4 to Perl 5; magic @ISAs and their ilk had started to scare me.

Being a wanton idealist, I've been programming in either Python or C since then. Yet every year around this time I'm reminded just how excellent Perl is. The problem at hand is the creation and organisation of the schedule and proceedings for XML Europe.

I receive an Excel file, encoded in glorious Windows-1252 (cp1252), with all the proposal data. From it I need to generate a schedule and an index into the proceedings. Speakers actually do submit their papers as XML, which is some compensation.

Over the years I've built up a set of Perl scripts that make solving this integration problem pretty easy. Perl's biggest asset is CPAN, the Comprehensive Perl Archive Network. The modules that help me most with XML Europe include:

  • DBD::Excel. This is the Perl DBI driver for Excel. I can treat the Excel export from the submissions system just like a database and perform SQL queries over it.
  • Text::Iconv. This is the Perl interface to the iconv character set conversion library. I use it to wrangle text out of a nasty, mean Windows encoding into glorious UTF-8.
  • XML::LibXML. This is the Perl interface to Daniel Veillard's libxml2, and it helps me with the most excellent part of the process: I use its DOM support to augment handwritten XML files with data drawn from the spreadsheet.
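To make the first two steps concrete without assuming the CPAN modules are installed, here is a rough Python analogue using only the standard library. It is a stand-in, not the author's Perl: Python's `cp1252` codec plays the role of Text::Iconv, and an in-memory `sqlite3` table plays the role of DBD::Excel's SQL interface. The table name, the columns (`id`, `title`, `speaker`) and the sample rows are all hypothetical.

```python
import sqlite3


def cp1252_to_utf8(raw: bytes) -> bytes:
    """Re-encode Windows-1252 bytes as UTF-8, the job Text::Iconv does in the Perl setup."""
    return raw.decode("cp1252").encode("utf-8")


def load_proposals(rows):
    """Load proposal rows (tuples of raw cp1252 bytes) into an in-memory SQL table.

    Hypothetical schema: (id, title, speaker). sqlite3 only stands in for
    DBD::Excel, which lets the real scripts query the spreadsheet directly.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE proposals (id TEXT PRIMARY KEY, title TEXT, speaker TEXT)")
    # Decode every field to text before storing it, so queries see proper Unicode.
    decoded = [tuple(field.decode("cp1252") for field in row) for row in rows]
    conn.executemany("INSERT INTO proposals VALUES (?, ?, ?)", decoded)
    return conn


# Sample rows: in cp1252, bytes 0x93/0x94 are curly double quotes and 0xE9 is e-acute.
raw_rows = [
    (b"p01", b"\x93Smart\x94 quotes in XML", b"Ren\xe9e Example"),
    (b"p02", b"Schedules with XSLT", b"A. N. Other"),
]
conn = load_proposals(raw_rows)
titles = conn.execute("SELECT id, title FROM proposals ORDER BY id").fetchall()
```

Decoding once at load time means everything downstream (queries, XML output) works on Unicode text, and only the final serialisation needs to pick UTF-8.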

This year I went a step further with the XML support and actually marked up the initial schedule in XML. Previously I had used a spreadsheet to make a rough grid of the conference, which meant entering cross-references by hand. Thanks to a short Perl script, I can now write a brief XML file that describes the sequence of talks by their IDs and have it automatically augmented with data from the Excel spreadsheet. A relatively simple XSLT stylesheet is then applied, and I have the conference schedule to hand.
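The augmentation step might be sketched in Python with `xml.etree.ElementTree`. This is not the author's Perl script: the markup (`schedule`, `slot`, `talk ref="..."`) and the proposals dictionary are hypothetical, and the XSLT pass that follows is omitted because Python's standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET

# Hand-written schedule: only the sequence of talk IDs (hypothetical markup).
SCHEDULE_XML = """<schedule>
  <slot time="09:00"><talk ref="p02"/></slot>
  <slot time="09:45"><talk ref="p01"/><talk ref="p03"/></slot>
</schedule>"""

# Data that would come from the spreadsheet, keyed by talk ID (hypothetical).
PROPOSALS = {
    "p01": {"title": "Smart quotes in XML", "speaker": "R. Example"},
    "p02": {"title": "Schedules with XSLT", "speaker": "A. N. Other"},
}


def augment_schedule(xml_text, proposals):
    """Add <title> and <speaker> children to each <talk ref="...">.

    Unknown IDs are collected and returned rather than silently skipped,
    since a typo in a hand-edited file should be caught before publishing.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for talk in root.iter("talk"):
        ref = talk.get("ref")
        data = proposals.get(ref)
        if data is None:
            missing.append(ref)
            continue
        for field in ("title", "speaker"):
            child = ET.SubElement(talk, field)
            child.text = data[field]
    return root, missing


root, missing = augment_schedule(SCHEDULE_XML, PROPOSALS)
augmented = ET.tostring(root, encoding="unicode")
```

Here `missing` comes back as `["p03"]`, flagging a talk ID that has no matching row in the spreadsheet; the augmented tree is what a stylesheet would then turn into the printed schedule.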

The special thing about this process is an oft-overlooked advantage of XML: programmatic access to documents. I can scribble away in my text editor in perfect harmony with a program that edits the same file. Later on in the production process I'll be using a script that goes through the conference papers and checks that the image sizes are correct, altering the original paper if need be.
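An image-checking pass of that kind could look roughly like the following Python sketch. It is an assumption, not the author's script: it supposes that papers reference PNG files through a hypothetical `<graphic fileref="..." width="..." height="...">` element, reads each image's real dimensions from the PNG header (width and height are big-endian 32-bit integers at byte offsets 16 and 20, inside the IHDR chunk), and rewrites any attributes that disagree. The `read_image` hook and `fake_png_header` helper are hypothetical.

```python
import struct
import xml.etree.ElementTree as ET

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes):
    """Return (width, height) from the IHDR chunk at the start of a PNG file."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are big-endian unsigned 32-bit ints at offsets 16 and 20.
    return struct.unpack(">II", data[16:24])


def fix_image_sizes(xml_text, read_image):
    """Correct width/height on every <graphic> whose attributes disagree with its image.

    `read_image` maps a fileref to the image bytes (hypothetical hook, so the
    sketch does not depend on any particular directory layout).
    Returns the updated XML text and the filerefs that were changed.
    """
    root = ET.fromstring(xml_text)
    changed = []
    for graphic in root.iter("graphic"):
        ref = graphic.get("fileref")
        width, height = png_size(read_image(ref))
        if graphic.get("width") != str(width) or graphic.get("height") != str(height):
            graphic.set("width", str(width))
            graphic.set("height", str(height))
            changed.append(ref)
    return ET.tostring(root, encoding="unicode"), changed


def fake_png_header(width, height):
    # Only the first 24 bytes matter to png_size(); a real file has more chunks.
    return PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", width, height)


images = {"fig1.png": fake_png_header(640, 480), "fig2.png": fake_png_header(300, 200)}
paper = ('<article><graphic fileref="fig1.png" width="640" height="480"/>'
         '<graphic fileref="fig2.png" width="320" height="200"/></article>')
updated, changed = fix_image_sizes(paper, images.__getitem__)
```

One caveat for the "perfect harmony with my text editor" workflow: ElementTree discards comments and the DOCTYPE by default, so a tool that rewrites hand-edited files in place is better served by a DOM library that preserves them, such as the libxml2 binding the post already relies on.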

You really can't beat Perl when it comes to interacting with diverse sources of data.



You are reading the weblog of Edd Dumbill, writer, programmer, entrepreneur and free software advocate.
Copyright © 2000-2012 Edd Dumbill