A general-purpose XML indexer for dotLucene

I've been impressed with the Lucene search engine, but it occurred to me that if you are indexing XML, it is a bit onerous to write classes to index each document type.

So, using dotLucene and Mono I made a general-purpose XML indexer, which I have imaginatively titled XmlIndexer.

All XmlIndexer requires is a rules file, which describes how to map XPath expressions for the incoming documents into Lucene fields. As well as field mappings, the rules file specifies a key: the unique identifier for each fragment to be indexed.
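
To make the idea of a rules file concrete, here is a minimal sketch in Python rather than the C# that XmlIndexer itself is written in. The `RULES` dictionary, its layout, and the `extract_fields` helper are all assumptions made for illustration; they do not reproduce XmlIndexer's actual rules format or API. The sketch only shows the general technique of mapping path expressions to named fields and pulling out a key.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules, NOT XmlIndexer's real file format:
# one expression for the unique key, plus field name -> path mappings.
RULES = {
    "key": "@id",
    "fields": {"title": "title", "author": "meta/author"},
}

def extract_fields(element, rules):
    """Return (key, fields) for one XML element, following the rules."""
    key_expr = rules["key"]
    if key_expr.startswith("@"):
        # Attribute on the element itself, e.g. "@id"
        key = element.get(key_expr[1:])
    else:
        # Otherwise treat the expression as a child path
        key = element.findtext(key_expr)
    fields = {
        name: (element.findtext(path) or "").strip()
        for name, path in rules["fields"].items()
    }
    return key, fields

doc = ET.fromstring(
    '<article id="a1"><title>Hello</title>'
    "<meta><author>Edd</author></meta></article>"
)
key, fields = extract_fields(doc, RULES)
```

Python's ElementTree supports only a subset of XPath, so a real indexer would need a fuller XPath engine for anything beyond simple child paths and attributes.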

XmlIndexer can consider any XML element to be a document for the purposes of Lucene, so you can index multiple entities contained in one file, as long as you can concoct a unique key for each of them.
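
As a rough illustration of treating several elements in one file as separate documents, the following Python sketch walks every matching element and stores its fields under its key. The sample catalog, the `index_fragments` function, and the plain dictionary standing in for a Lucene index are all hypothetical. Letting a repeated key replace the earlier entry is also an assumption about how a unique key would typically be used, not a description of XmlIndexer's behaviour.

```python
import xml.etree.ElementTree as ET

# Invented sample data: three <book> entities in a single file.
FEED = """<catalog>
  <book isbn="111"><title>First</title></book>
  <book isbn="222"><title>Second</title></book>
  <book isbn="111"><title>First, revised</title></book>
</catalog>"""

def index_fragments(xml_text, doc_path, key_attr, field_paths):
    """Treat each element matching doc_path as one document.

    Returns a dict of key -> fields; a dict stands in for the real index.
    """
    root = ET.fromstring(xml_text)
    index = {}
    for element in root.iterfind(doc_path):
        key = element.get(key_attr)
        if key is None:
            continue  # no usable key, so the fragment cannot be indexed
        # Assumption: a fragment with an existing key replaces the old entry.
        index[key] = {
            name: (element.findtext(path) or "")
            for name, path in field_paths.items()
        }
    return index

index = index_fragments(FEED, "book", "isbn", {"title": "title"})
```

Here the two fragments sharing the key `111` collapse into one entry, which is why each entity needs a key that is genuinely unique.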

I've bundled a command-line query and indexing tool, as well as a really simple ASP.NET interface. Search results are presented as XML, enabling easy transformation via XSLT. It should be easy enough to use the XmlIndexer classes to create a more sophisticated indexing daemon should you wish.
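
For a sense of what XML search results might look like, this Python sketch serialises a list of hits into a small XML document. The element and attribute names (`results`, `hit`, `field`, `score`) and the shape of the `hits` list are invented for the example; the actual output schema of XmlIndexer is not shown in this post.

```python
import xml.etree.ElementTree as ET

def results_to_xml(query, hits):
    """Serialise search hits as an XML string (hypothetical schema)."""
    root = ET.Element("results", {"query": query, "count": str(len(hits))})
    for hit in hits:
        item = ET.SubElement(
            root, "hit", {"key": hit["key"], "score": f'{hit["score"]:.2f}'}
        )
        # One <field name="..."> child per stored field
        for name, value in hit["fields"].items():
            field = ET.SubElement(item, "field", {"name": name})
            field.text = value
    return ET.tostring(root, encoding="unicode")

xml_out = results_to_xml(
    "lucene",
    [{"key": "a1", "score": 0.91, "fields": {"title": "Hello"}}],
)
```

Output in this style can be handed to any XSLT processor to produce HTML or another presentation format.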

XmlIndexer significantly lowers the bar to using dotLucene. Rather than write lots of code, all you need is an XML representation of the things you care about and a rules file saying how to index it.

XmlIndexer was written for my own purposes, but I'm sharing it as free software under the Apache license. Right now it ships as a MonoDevelop project, but I intend to provide a more conventional build tarball in the future.



You are reading the weblog of Edd Dumbill, writer, programmer, entrepreneur and free software advocate.
Copyright © 2000-2012 Edd Dumbill