Dirt Simple XML Parser

I know there are already far too many ways to parse XML. Softartisans has an entire page devoted to choosing the best XML parser for a given situation. Still, I face the same trade-off each time I need to consume an XML document: do I do something simple, or something efficient? DOM is easy to use but costly, because it builds the whole document in memory before you can read any of it. SAX is the most efficient, because it streams events as it reads, but it is difficult to use: you have to keep track of where you are in the document yourself. Other technologies fall between those two on the spectrum.

I've finally solved that problem once and for all. I created an XML library that makes SAX dirt simple to use. You describe the XML document declaratively and attach hooks to the elements that interest you. That declaration becomes a DefaultHandler for SAX to invoke. The source code comes in two parts: mallardsoft.xml, which depends upon mallardsoft.util.
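To make the idea concrete, here is a minimal sketch of how a declarative description could be turned into a DefaultHandler. It is not the mallardsoft.xml source: ToySpec, ToyHandler, ToyDemo and their methods are hypothetical names invented for this illustration, and the sketch leaves out cardinality checks and required-attribute validation. It assumes only the standard JDK SAX classes.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical declaration object: child specs, attribute targets, and an end hook.
class ToySpec {
    final Map<String, ToySpec> children = new HashMap<>();
    final Map<String, StringBuilder> attributes = new HashMap<>();
    Runnable onEnd = () -> {};

    ToySpec child(String name, ToySpec spec) { children.put(name, spec); return this; }
    ToySpec attribute(String name, StringBuilder target) { attributes.put(name, target); return this; }
    ToySpec end(Runnable hook) { onEnd = hook; return this; }
}

// Hypothetical handler: keeps a stack of specs that mirrors the open elements.
class ToyHandler extends DefaultHandler {
    // Spec used for elements the declaration does not mention; it has no children and no hooks.
    private static final ToySpec IGNORED = new ToySpec();
    private final Deque<ToySpec> stack = new ArrayDeque<>();

    ToyHandler(ToySpec document) { stack.push(document); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // Look up the child spec under the currently open element.
        ToySpec spec = stack.peek().children.getOrDefault(qName, IGNORED);
        for (Map.Entry<String, StringBuilder> e : spec.attributes.entrySet()) {
            StringBuilder target = e.getValue();
            target.setLength(0); // reset so values do not leak between elements
            String value = attrs.getValue(e.getKey());
            if (value != null) target.append(value);
        }
        stack.push(spec);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Closing an element pops its spec and fires its end hook.
        stack.pop().onEnd.run();
    }
}

class ToyDemo {
    // Collects the lastName attribute of every <person> directly under <people>.
    static List<String> lastNames(String xml) throws Exception {
        List<String> found = new ArrayList<>();
        StringBuilder lastName = new StringBuilder();
        ToySpec document = new ToySpec()
            .child("people", new ToySpec()
                .child("person", new ToySpec()
                    .attribute("lastName", lastName)
                    .end(() -> found.add(lastName.toString()))));
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), new ToyHandler(document));
        return found;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lastNames(
            "<people><person lastName=\"Lovelace\"/><person lastName=\"Turing\"/></people>"));
    }
}
```

The stack is what removes the bookkeeping from the caller: the handler always knows which declaration matches the element it is inside, so each hook only sees the elements it was attached to.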

To build a parser, create a new NestedHandler and initialize it with a new Document. Then use in-line dot-chaining to declare the Document's sub-elements with new Element objects. Here's a quick example:

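// Buffers that receive the attribute values. They are final so the
// anonymous InvokeHandler below can read them.
final StringBuffer firstName = new StringBuffer();
final StringBuffer lastName = new StringBuffer();

// Declare the expected shape: one people element containing
// zero or more person elements.
DefaultHandler handler = new NestedHandler(new Document()
  .one("people", new Element()
    .zeroToMany("person", new Element()
      .init(firstName)
      .init(lastName)
      .optionalAttribute("firstName", firstName)
      .requiredAttribute("lastName", lastName)
      // Hook invoked for each person element.
      .end(new InvokeHandler() {
        public void handle() throws SAXException {
          newPerson(firstName.toString(), lastName.toString());
        }
      })
    )
  )
);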

Pass this handler to a SAX parser and it will extract all of the person elements, grab their first and last names, and call the newPerson() method for each one. You can nest elements as deeply as you need to. If you need a recursive declaration, use an ElementSpec object. Give it a try with a more complex XML file and see how well it scales up.
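For readers who haven't driven SAX directly, handing a handler to the parser looks roughly like this with the standard JDK classes. This is a sketch under assumptions: SaxDriver, parse() and readPeople() are made-up names for illustration, and the anonymous DefaultHandler is a hand-written stand-in so the example runs without the library. With the library on the classpath you would pass the NestedHandler from the example above to parse() instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxDriver {
    // Runs any SAX DefaultHandler over an XML string.
    static void parse(String xml, DefaultHandler handler) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
    }

    // Stand-in handler (not the library's NestedHandler): records a name for each person element.
    static List<String> readPeople(String xml) throws Exception {
        final List<String> people = new ArrayList<>();
        parse(xml, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("person".equals(qName)) {
                    String first = attrs.getValue("firstName"); // optional, so it may be null
                    String last = attrs.getValue("lastName");
                    people.add((first == null ? "" : first + " ") + last);
                }
            }
        });
        return people;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<people>"
            + "<person firstName=\"Ada\" lastName=\"Lovelace\"/>"
            + "<person lastName=\"Turing\"/>"
            + "</people>";
        System.out.println(readPeople(xml));
    }
}
```

Note that even this tiny stand-in has to check element names and null attributes by hand, and it would match a person element at any depth. That is the bookkeeping the declarative form is meant to take off your hands.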

Update
I've changed the names to better reflect true XML nomenclature. The proper names are not "root" and "node", but "document" and "element".
