XML Processing with the XmlReader Object, Part 1
Need to parse an XML document, but don't want to incur the performance penalty of the DOM? Consider using the new XmlReader object, which lets you process an XML document sequentially, without first loading the whole document into memory, and lets you decide which nodes to handle. Today, I'll be examining this new offering for .NET/XML developers, which provides an alternative "pull" model of dealing with XML data. So pay attention - this is cutting-edge stuff, and it's only going to get more interesting.
Push and Pull
If you're at all familiar with XML programming, you'll be aware that there are two basic approaches to parsing an XML document. The Simple API for XML (SAX) is one; it parses an XML document sequentially, raising an event for the application layer to process each time it encounters a part of the document (the start of an element, a run of text, the end of an element and so on). This sequential approach enables rapid parsing of XML data, especially in the case of long or complex XML documents; the downside is that a SAX parser cannot be used to access XML document nodes in a random or non-sequential manner.
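To make the push idea concrete, here's a minimal sketch in C#. The .NET base class library doesn't ship a SAX parser, so everything named here is an assumption made for illustration: the IElementHandler interface is hypothetical and only loosely modelled on a SAX content handler, and PushSketch.Parse is a small stand-in that borrows XmlReader (assuming .NET 2.0 or later, for XmlReader.Create) to walk the document. The point to notice is who is in charge: the parser owns the loop and pushes every event at the handler, whether the handler wants it or not.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical callback interface, loosely modelled on a SAX content handler.
public interface IElementHandler
{
    void StartElement(string name);
    void Characters(string text);
    void EndElement(string name);
}

// Records every event it is handed, to show that nothing is filtered out.
public class RecordingHandler : IElementHandler
{
    public readonly List<string> Events = new List<string>();
    public void StartElement(string name) { Events.Add("start:" + name); }
    public void Characters(string text) { Events.Add("text:" + text); }
    public void EndElement(string name) { Events.Add("end:" + name); }
}

public static class PushSketch
{
    // Stand-in for a SAX-style parser (an assumption, not a real SAX API):
    // it owns the read loop and calls the handler for every node it meets.
    public static void Parse(string xml, IElementHandler handler)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        handler.StartElement(reader.Name);
                        // An empty element such as <c/> has no separate end node.
                        if (reader.IsEmptyElement) handler.EndElement(reader.Name);
                        break;
                    case XmlNodeType.Text:
                        handler.Characters(reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        handler.EndElement(reader.Name);
                        break;
                }
            }
        }
    }

    public static void Main()
    {
        RecordingHandler handler = new RecordingHandler();
        Parse("<note><to>Ann</to></note>", handler);
        foreach (string e in handler.Events) Console.WriteLine(e);
    }
}
```

Even for this tiny document the handler receives five callbacks, and it has no way to say "skip ahead" or "go back"; it can only react to whatever arrives next.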
Next, we have the Document Object Model (DOM). This alternative approach involves building a tree representation of the XML document in memory, and then using built-in methods to navigate through this tree. Once a particular node has been reached, built-in properties can be used to obtain the value of the node and use it within the application. This tree-based paradigm does away with the problems inherent in SAX's sequential approach, allowing for immediate random access to any node or collection of nodes in the tree.
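Here's a small C# sketch of that tree-based approach, using the XmlDocument class from System.Xml. The sample XML, the DomSketch class and the TitleOfBook helper are all made up for illustration; only XmlDocument, LoadXml, SelectSingleNode and InnerText come from the framework.

```csharp
using System;
using System.Xml;

public static class DomSketch
{
    // Hypothetical sample document, used only for illustration.
    public const string SampleXml =
        "<library>" +
        "<book id=\"1\"><title>XML Basics</title></book>" +
        "<book id=\"2\"><title>Advanced XML</title></book>" +
        "</library>";

    // Loads the whole document into memory, then jumps straight to one node.
    // Returns an empty string when no book has the requested id.
    public static string TitleOfBook(string xml, string id)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml); // the complete tree is built before any lookup runs

        // Random access: the XPath query goes directly to the node we want.
        // (The id is concatenated in because this is trusted sample input.)
        XmlNode title = doc.SelectSingleNode(
            "/library/book[@id='" + id + "']/title");
        return title == null ? "" : title.InnerText;
    }

    public static void Main()
    {
        // Nodes can be fetched in any order once the tree exists.
        Console.WriteLine(TitleOfBook(SampleXml, "2"));
        Console.WriteLine(TitleOfBook(SampleXml, "1"));
    }
}
```

Note that the second book is fetched before the first; order doesn't matter once the tree is in memory. The price is that LoadXml has to read and store the entire document, even if you only ever look at one node.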
Now, I've already shown you how to use the DOM approach to parsing XML with .NET's XmlDocument object. However, while the DOM does offer seamless access to your XML data, it comes at the cost of performance, because the entire document has to be read and held in memory before you can touch a single node. This is especially noticeable if your application has to deal with large XML files. This trade-off between performance and ease of use is one of the more knotty problems developers had to face when designing an XML application.
Notice I said "had." Microsoft has a possible solution, one that incorporates the best of both worlds. They call it the "pull model" and, according to their documentation, it's designed to provide "forward-only, read-only, noncached access to XML data". This means that you can now read an XML document in a sequential but selective manner, and thereby control the process of parsing: your code asks the reader for the next node when it is ready for it, and simply moves past anything it doesn't care about. This is an interesting variant of the SAX model, which is non-selective in nature - there, the parser notifies the client about each and every item that it encounters in the XML stream. It's the difference between a customer in a restaurant reading the menu and ordering what he or she wants, and a waiter stuffing every item on the menu down the customer's throat.
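As a first taste, here's a minimal sketch of the pull model in C#. It assumes .NET 2.0 or later (for XmlReader.Create and ReadElementContentAsString); the sample XML, the PullSketch class and the ReadTitles helper are hypothetical names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class PullSketch
{
    // Hypothetical sample document, used only for illustration.
    public const string SampleXml =
        "<library>" +
        "<book id=\"1\"><title>XML Basics</title></book>" +
        "<book id=\"2\"><title>Advanced XML</title></book>" +
        "</library>";

    // Walks the document once, front to back, and keeps only <title> text.
    public static List<string> ReadTitles(string xml)
    {
        List<string> titles = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // The application drives the loop: each Read() pulls one node.
            while (reader.Read())
            {
                // Selective: every node that is not a <title> start tag
                // is simply skipped, with no callback and no tree.
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "title")
                {
                    // Reads the text and leaves the reader on the node after
                    // </title> (here </book>), so the next Read() moves on.
                    titles.Add(reader.ReadElementContentAsString());
                }
            }
        }
        return titles;
    }

    public static void Main()
    {
        foreach (string title in ReadTitles(SampleXml))
        {
            Console.WriteLine(title);
        }
    }
}
```

The while loop belongs to the application, not to the parser. That is the whole trick: nothing is parsed until your code calls Read(), and nothing is kept in memory unless your code chooses to keep it.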
By Harish Kamath (c) Melonfire