XML Processing with the XMLReader Object, Part 1

Need to parse an XML document, but don't want to incur the performance penalty of the DOM? Consider using the new XmlReader object, which lets you process an XML document sequentially, improving speed and giving you greater control over parsing. Today, I'll be examining this new offering for .NET/XML developers, which provides an alternative pull model for dealing with XML data. So pay attention - this is cutting-edge stuff, and it's only going to get more interesting.

February 02, 2004

Push and Pull

If you're at all familiar with XML programming, you'll be aware that there are two basic approaches to parsing an XML document. The Simple API for XML (SAX) is one; it parses an XML document in a sequential manner, generating and throwing events for the application layer to process as it encounters different XML elements. This sequential approach enables rapid parsing of XML data, especially in the case of long or complex XML documents; the downside is that a SAX parser cannot be used to access XML document nodes in a random or non-sequential manner.

Next, we have the Document Object Model (DOM). This alternative approach involves building a tree representation of the XML document in memory, and then using built-in methods to navigate through this tree. Once a particular node has been reached, built-in properties can be used to obtain the value of the node, and use it within the script. This tree-based paradigm does away with the problems inherent in SAX's sequential approach, allowing for immediate random access to any node or collection of nodes in the tree.

Now, I've already shown you how to use the DOM approach to parsing XML with .NET's XmlDocument object. However, while the DOM does offer seamless access to your XML data, it comes at the cost of performance, because the entire tree must be built in memory before you can use it. This is especially noticeable if your application has to deal with large XML files. This trade-off between performance and ease of use is one of the knottier problems developers had to face when designing an XML application.

Notice I said "had." Microsoft has a possible solution, one that incorporates the best of both worlds. They call it the "pull model" and, according to their documentation, it's designed to provide "forward-only, read-only, noncached access to XML data". This means that you can now read an XML document in a sequential but selective manner and thereby control the process of parsing. This is an interesting variant of the SAX model, which is non-selective in nature - there the parser will notify the client about each and every item that it encounters in the XML stream. This is analogous to a customer in a restaurant ordering from the menu, as opposed to the waiter stuffing every item on it down the customer's throat.

Class Act

The XmlReader abstract class plays a very important role in implementing the new "pull model." As part of the System.Xml namespace, the primary objective of this class is to provide developers with a framework to implement this new model. If you're an adventurous developer, you can use this abstract class as the basis for your very own, custom-crafted XmlReader object. Or you could do what I did: take the easy way out and use one of the built-in classes that already do this for you.

The .NET framework provides three such built-in classes:

1. The plain-vanilla XmlTextReader class behaves as a "forward-only, noncached reader" to read XML data. It's versatile enough to allow you to access XML from different input sources, including flat files, data streams, or URLs.

2. The XmlTextReader has one little drawback: it doesn't allow you to validate the data present in the XML source. If you are looking for a foolproof way to maintain the sanctity of your data, you are better off using the XmlValidatingReader class. This is the only class in this category that comes with built-in features to validate your XML data against external DTDs, XDR or XSD schemas.

3. In case you're looking to implement the "pull model" on a DOM tree that's already present in memory, you can consider using the XmlNodeReader class. Best suited only for the very specialized application mentioned above, this class allows you to read the data from specific nodes of the tree and enjoy a double benefit: the speed associated with the XmlReader class and the ease of use of the DOM.
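
To make the third option a little more concrete, here is a minimal console sketch of an XmlNodeReader walking a DOM tree that has already been loaded. The class name NodeReaderSketch, the helper ListElements() and the inline XML string are assumptions made purely for illustration; they are not part of the examples that follow.

```csharp
using System;
using System.Text;
using System.Xml;

public static class NodeReaderSketch
{
    // Returns the names of all elements in an already-loaded DOM tree,
    // visited in document order using the pull model.
    public static string ListElements(XmlDocument doc)
    {
        StringBuilder names = new StringBuilder();
        XmlNodeReader reader = new XmlNodeReader(doc);

        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                if (names.Length > 0)
                {
                    names.Append(",");
                }
                names.Append(reader.Name);
            }
        }

        reader.Close();
        return names.ToString();
    }

    public static void Main()
    {
        // hypothetical in-memory document, standing in for a real DOM tree
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<library><book id=\"MFRE001\"><title>XML and PHP</title></book></library>");

        Console.WriteLine(ListElements(doc)); // prints "library,book,title"
    }
}
```

The loop is the same Read() loop used with the XmlTextReader later in this article; only the data source (an XmlDocument instead of a file or URL) is different.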

Now that you know the theory, how about seeing it work in the real world?

Visiting the Library

I'll begin with a simple example - using an XmlTextReader to parse a static XML file. Here's the XML file, a list of books present in our technical library:


<?xml version='1.0'?>
<library>
 <book id="MFRE001">
  <title>XML and PHP</title>
  <author>Vikram Vaswani</author>
  <description>
  Learn to manage your XML data with PHP
  </description>
  <price currency="USD">24.95</price>
 </book>
 <book id="MFRE002">
  <title>
  MySQL - The Complete Reference
  </title>
  <author>Vikram Vaswani</author>
  <description>
  Learn everything about this open source 
  database</description>
  <price currency="USD">45.95</price>
 </book>
</library>

And now for the ASP.NET code that will allow us to parse this XML file using the XmlTextReader object:


<%@ Page Language="C#"%>
<%@ import  namespace="System.Xml"%>
<html>
<head>
<script runat="server">
void Page_Load() 
{
 // location of XML file
 string strXmlFile = "http://localhost/xmlpull/library.xml";
 
 // create an instance of the XmlTextReader object
 XmlTextReader objXmlRdr = new XmlTextReader(strXmlFile);
 
 // ignore whitespace in the XML file
 objXmlRdr.WhitespaceHandling = WhitespaceHandling.None;
 
 string strSpaces;
 
 while (objXmlRdr.Read()) {
 
  // only process the elements, 
  // ignore everything else
  if (objXmlRdr.NodeType == XmlNodeType.Element) {
 
   // reset the variable for a new node
   strSpaces = "";
 
   // add one "===" segment per level of depth
   for (int count = 1; count <= objXmlRdr.Depth; count++) {
    strSpaces += "===";
   }
   output.Text += strSpaces + "=> " + objXmlRdr.Name + "<br/>";
  }
 }
 
 // close the object and free up memory
 objXmlRdr.Close();
}
</script>
</head>
<body>
<asp:label id="output" runat="server" />
</body>
</html>

Before I get into the nitty-gritty of the code, here's what you should see when you run this script:

[Image: XMLReader output]

1. The first step is to import all the classes required to execute the application - the .NET libraries for the XML parser, which are part of the System.Xml namespace.


<%@ import  namespace="System.Xml"%>

2. Next, within the Page_Load() function, I have defined some variables and objects. The first is a string variable to store the location of the XML file, and the second is a local instance of the XmlTextReader object.

3. Finally, in order to tell the parser to ignore the whitespace present in the XML file, I set the "WhitespaceHandling" property of the XmlTextReader object to "None", as shown below:


<%
// location of XML file
string strXmlFile = "http://localhost/xmlpull/library.xml";
 
// create an instance of the 
// XmlTextReader object
XmlTextReader objXmlRdr = 
new XmlTextReader(strXmlFile);
 
// ignore whitespace in the XML file
objXmlRdr.WhitespaceHandling = 
WhitespaceHandling.None;
%>

4. The next step is to read the XML file - a simple matter, since the object provides a Read() method for just this purpose. This method returns true if it encounters a node in the XML file. Once it is finished with the file, it returns false. This makes it easy to process an entire file, simply by wrapping the method call in a "while" loop.


<%
while (objXmlRdr.Read()) {
 // process the XML data
}    
%>

5. Of course, it doesn't make sense to read the entire file and not do anything with it. That's why, within the "while" loop, I've added the code to process element nodes and format them for display.


<%
while (objXmlRdr.Read()) {
 
 // only process the elements
 if (objXmlRdr.NodeType == XmlNodeType.Element) {
 
  // reset the variable for a new node
  strSpaces = "";
 
  for (int count = 1; count <= objXmlRdr.Depth; count++) {
   strSpaces += "===";
  }
  output.Text += strSpaces + "=> " + objXmlRdr.Name + "<br/>";
 }
}    
%>

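The "NodeType" property of the current node can be used to filter out the elements for further processing. Note that if I hadn't included this condition at the beginning of the loop, the loop would also produce output for every other node the reader reports, such as closing tags, text and the XML declaration at the top of the file (which the reader reports as a node of type XmlDeclaration, not as an element):


<?xml version='1.0'?>
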

Don't take my word for it - change the code and see for yourself!
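
If you'd rather try this outside the ASP.NET page, the following console sketch logs the type and name of every node the reader reports when no filter is applied. It is only an illustration: the class name NodeTypeSketch, the helper Describe() and the inline XML string are my assumptions, and the XML is read from a string through a StringReader instead of from the URL used above.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class NodeTypeSketch
{
    // Returns one "NodeType:Name" line for every node the reader reports.
    public static string Describe(string xml)
    {
        StringBuilder log = new StringBuilder();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));

        // same setting as in the article's examples
        reader.WhitespaceHandling = WhitespaceHandling.None;

        while (reader.Read())
        {
            // no NodeType filter here: every node is logged
            log.Append(reader.NodeType.ToString());
            log.Append(":");
            log.Append(reader.Name);
            log.Append("\n");
        }

        reader.Close();
        return log.ToString();
    }

    public static void Main()
    {
        // hypothetical, shortened version of the library file
        string xml = "<?xml version='1.0'?><library><book>XML and PHP</book></library>";
        Console.Write(Describe(xml));
    }
}
```

In the resulting log, the declaration appears as an XmlDeclaration node named "xml", each closing tag appears as an EndElement node, and text appears as a Text node with an empty name. Those are exactly the entries the NodeType check keeps out of the page.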

The rest of the code in the "while" loop ensures that the output is formatted properly for display in the browser. Pay special attention to my use of the very cool "Depth" property, which holds an integer value specifying the depth of the current node in the tree hierarchy. Simply put, the element <library> is at depth 0, the element <book> is at depth 1, and so on.

Digging Deeper

So that takes care of handling elements, but what about the attributes contained within each element? Take a look at this second example, which demonstrates how to process attributes using the XmlTextReader class:


<%@ Page Language="C#"%>
<%@ import  namespace="System.Xml"%>
<html>
<head>
<script runat="server">
void Page_Load() {
 
 string strXmlFile = "http://localhost/xmlpull/library.xml";
 
 // create an instance of the XmlTextReader object
 XmlTextReader objXmlRdr = new XmlTextReader(strXmlFile);
 
 // ignore whitespace in the XML file
 objXmlRdr.WhitespaceHandling = WhitespaceHandling.None;
 
 string strSpaces;
 
 while (objXmlRdr.Read()) {
 
  // only process the elements 
  if (objXmlRdr.NodeType == XmlNodeType.Element) {
 
   // reset the variable for a new node
   strSpaces = "";
 
   for (int count = 1; count <= objXmlRdr.Depth; count++) {
    strSpaces += "===";
   }
 
   output.Text += strSpaces + "=> " + objXmlRdr.Name;
 
   // check if the element has any attributes
   if (objXmlRdr.HasAttributes) {
    output.Text += " [";
    for (int innercount = 0; 
         innercount < objXmlRdr.AttributeCount;
         innercount++) {
 
     // move to the attribute at this index
     objXmlRdr.MoveToAttribute(innercount);
     output.Text += objXmlRdr.Name;
    }
    output.Text += "]";
 
    // instruct the parser to go back to the element 
    objXmlRdr.MoveToElement();
   }
   output.Text += "<br/>";
  }    
 }
 
 // close the object and free up memory
 objXmlRdr.Close();
}
</script>
</head>
<body>
<asp:label id="output" runat="server" />
</body>
</html>

Here's the output:

[Image: XMLReader output]

As you can see, there is only one major change to the original code listing - handling attributes for each element that the reader encounters in the XML file:


<%
// check if the element has any attributes
if (objXmlRdr.HasAttributes) {
 
 output.Text += " [";
 for (int innercount = 0; 
      innercount < objXmlRdr.AttributeCount;
      innercount++) {
 
  // move to the attribute at this index
  objXmlRdr.MoveToAttribute(innercount);
  output.Text += objXmlRdr.Name;
 }
 output.Text += "]";
 
 // instruct the parser to go back to the element 
 objXmlRdr.MoveToElement();
}
%>

The above code snippet makes for interesting reading. It begins with a check for attributes in the current node using the "HasAttributes" property (this property is true if the current node has at least one attribute). The XmlTextReader object's "AttributeCount" property holds the total number of attributes and is useful for looping through the collection of attributes. The MoveToAttribute() method positions the reader on the attribute at the index passed to it, and the "Name" property is then used to get the name of that attribute. Once iteration through the attributes of the current node is complete, the MoveToElement() method moves the reader back to the element that owns the attributes, and the loop then proceeds to the next node (if one exists).
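
As an aside, the reader also offers MoveToFirstAttribute() and MoveToNextAttribute(), which let you walk the attributes without an index counter. The console sketch below is one possible way to write the same loop with them; the class name AttributeWalkSketch, the helper AttributeNames() and the sample XML (including its "tax" attribute) are hypothetical and exist only for this illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class AttributeWalkSketch
{
    // Returns the names of all attributes in the document, separated by spaces.
    public static string AttributeNames(string xml)
    {
        StringBuilder names = new StringBuilder();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));

        while (reader.Read())
        {
            // MoveToFirstAttribute() returns false if the element has no attributes
            if (reader.NodeType == XmlNodeType.Element && reader.MoveToFirstAttribute())
            {
                do
                {
                    names.Append(reader.Name);
                    names.Append(" ");
                } while (reader.MoveToNextAttribute());

                // go back to the element that owns the attributes
                reader.MoveToElement();
            }
        }

        reader.Close();
        return names.ToString().Trim();
    }

    public static void Main()
    {
        // hypothetical sample; the "tax" attribute is not in the library file
        string xml = "<price currency=\"USD\" tax=\"included\">24.95</price>";
        Console.WriteLine(AttributeNames(xml)); // prints "currency tax"
    }
}
```

Whether you prefer this form or the indexed loop is largely a matter of taste; the indexed version is handy when you need the position of each attribute.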

Into the Real World

Now, if you're a developer, I'm sure the previous two examples would have raised your eyebrows a bit. The reason is simple: the examples I've shown you thus far have only studied the information structures in the XML file, completely ignoring the data contained within each attribute and element. In the real world, you're usually as concerned about the data within each element as about the element and attribute names.

That's where this next example comes in: it completes the circle, showing you how to process the data stored within each attribute and element. Take a look:

<%@ Page Language="C#" Debug="true"%>
<%@ Import namespace="System.Xml"%>
<html>
<head>
<script runat="server">
void Page_Load() 
{
 // variable to store Book ID
 string strBookId = "";
 
 // variable to store the XML file (with location)
 string strXmlRdr = "http://localhost/xmlpull/library.xml";
 
 output.Text = "<B>List of Books</B>";
 
 // create an instance of the XmlTextReader object
 XmlTextReader objXmlRdr = new XmlTextReader(strXmlRdr);
 
 objXmlRdr.WhitespaceHandling = WhitespaceHandling.None;
 output.Text += "<ul>";
 
 while (objXmlRdr.Read()) {
 
  if (objXmlRdr.NodeType == XmlNodeType.Element) {
 
   if (objXmlRdr.Name == "book") {
    strBookId = objXmlRdr.GetAttribute("id");
   }
 
   if (objXmlRdr.Name == "title") {
    output.Text += "<li>" + objXmlRdr.ReadString();
    output.Text += "<ul>";
    output.Text += "<li>ID - " + strBookId + "</li>";
   }
 
   if (objXmlRdr.Name == "author") {
    output.Text += "<li>Author - " + 
     objXmlRdr.ReadString() + "</li>";
   }
 
   if (objXmlRdr.Name == "description") {
    output.Text += "<li>Description - " + 
     objXmlRdr.ReadString() + "</li>";
   }
 
   if (objXmlRdr.Name == "price") {
    output.Text += "<li>Price - " + 
     objXmlRdr.GetAttribute("currency") + " " + 
     objXmlRdr.ReadString() + "</li>";
   }
 
  } else if (objXmlRdr.NodeType == XmlNodeType.EndElement) { 
 
   if (objXmlRdr.Name == "book") {
    output.Text += "</ul>";
    output.Text += "</li>";
 
    // reset the Book Id variable
    strBookId = "";
   }
  }
 }
 
 output.Text += "</ul>";
 
 // close the object and free up memory
 objXmlRdr.Close(); 
}
</script> 
</head> 
<body>
<asp:label id="output" runat="server"/> 
</body>
</html> 

Load this example in the browser to see the list of books on the shelves of the library:

[Image: XMLReader output]

I'll begin by drawing your attention to the definition of a variable right at the beginning of the script:


<%
 
// variable to store Book ID
string strBookId = "";
 
%>

This variable will be used further down in the script to store the ID of the book.

Now, the process of reading the XML file starts with the Read() method of the XmlTextReader object. This next code snippet does the dirty work of processing the data that is read by the object.


<%
if (objXmlRdr.NodeType == XmlNodeType.Element) {
 
 if (objXmlRdr.Name == "book") {
  strBookId = objXmlRdr.GetAttribute("id");
 }
 
 if (objXmlRdr.Name == "title") {
  output.Text += "<li>" + objXmlRdr.ReadString();
  output.Text += "<ul>";
  output.Text += "<li>ID - " + strBookId + "</li>";
 }
 
 if (objXmlRdr.Name == "author") {
  output.Text += "<li>Author - " + 
   objXmlRdr.ReadString() + "</li>";
 }
 
 if (objXmlRdr.Name == "description") {
  output.Text += "<li>Description - " + 
   objXmlRdr.ReadString() + "</li>";
 }
 
 if (objXmlRdr.Name == "price") {
  output.Text += "<li>Price - " + 
   objXmlRdr.GetAttribute("currency") + " " + 
   objXmlRdr.ReadString() + "</li>";
 }
 
} else if (objXmlRdr.NodeType == XmlNodeType.EndElement) { 
 
 if (objXmlRdr.Name == "book") {
  output.Text += "</ul>";
  output.Text += "</li>";
 
  // reset the Book Id variable
  strBookId = "";
 } 
}   
%>

It all starts with a check to see if the current node is an element. As seen in the first example, this test returns true when the reader encounters the starting tag of an element in the XML file. Once this is confirmed, the script checks the name of the element so that it can be processed appropriately. Note that you can also use the IsStartElement() method of the XmlTextReader object to check whether the current node is indeed an opening tag.
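
Here is a small console sketch of the IsStartElement() alternative, using the overload that also takes an element name. One thing to keep in mind: IsStartElement() calls MoveToContent() first, so it may skip over comments, processing instructions and whitespace before testing the node. The class name StartElementSketch, the helper CountTitles() and the inline XML are assumptions for illustration only.

```csharp
using System;
using System.IO;
using System.Xml;

public static class StartElementSketch
{
    // Counts the <title> start tags in the document.
    public static int CountTitles(string xml)
    {
        int count = 0;
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));

        while (reader.Read())
        {
            // true only when the current content node is a start tag
            // (or empty element tag) named "title"
            if (reader.IsStartElement("title"))
            {
                count++;
            }
        }

        reader.Close();
        return count;
    }

    public static void Main()
    {
        // hypothetical, shortened version of the library file
        string xml = "<library>"
                   + "<book><title>XML and PHP</title></book>"
                   + "<book><title>MySQL - The Complete Reference</title></book>"
                   + "</library>";
        Console.WriteLine(CountTitles(xml)); // prints 2
    }
}
```

This replaces the pair of NodeType and Name comparisons with a single call, at the cost of the implicit MoveToContent() step described above.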

Element processing starts with the <book> element. Since I need the book ID, I've used the shortcut GetAttribute() method of the XmlTextReader object to fetch the value stored in the "id" attribute. If you know which attribute you want, this is a convenient way to avoid having to unnecessarily iterate through the collection of attributes, as demonstrated earlier. The ID retrieved is stored in the "strBookId" variable created earlier.

During the next pass, the script will encounter the other parameters associated with a particular book - its title, description, price, currency and so on. For each of these elements, the ReadString() method can be used to retrieve the text stored in the corresponding element.
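
It helps to know where the reader ends up after a call to ReadString(): it consumes the text and stops at the first piece of markup it meets, which for a simple element like <title> is the closing tag. The console sketch below shows this; the class name ReadStringSketch, the helper Trace() and the inline XML are hypothetical and used only for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ReadStringSketch
{
    // Reads the text of the first element and reports where the reader stops.
    public static string Trace(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));

        // the first Read() positions the reader on the opening tag
        reader.Read();

        // ReadString() returns the text and moves past it
        string text = reader.ReadString();

        string result = text + "|" + reader.NodeType.ToString() + "|" + reader.Name;
        reader.Close();
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(Trace("<title>XML and PHP</title>"));
        // prints "XML and PHP|EndElement|title"
    }
}
```

This is why the script can safely call ReadString() inside the main loop: the next call to Read() simply moves on from the closing tag to whatever follows it.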

Once a particular book has been dealt with, the "strBookId" variable must be reset for the next book in the library. A good place to do this is when the reader encounters the closing </book> tag. How do you know when this happens? It's simple - just check whether the node's "NodeType" property equals XmlNodeType.EndElement and whether its name is "book", and Bob's your uncle!

As you can see, once you know the basics of reading an XML file with the XmlReader, it's very easy to begin using its built-in constructs to extract and manipulate XML data to your precise needs. As an exercise to better understand how this works, I recommend taking your own XML markup and writing a similar script to extract element and attribute values from it. After all, practice makes perfect!

And that's about it for the first part of this tutorial. Over the last few pages, I introduced you to the XmlReader class, which offers developers an alternative way of processing an XML file or stream. Unlike the DOM, the XmlReader class offers developers a framework for sequential reading, making it possible to create faster, more streamlined XML applications.

At the beginning of this tutorial, I told you that the .NET Framework came with three important classes derived from the XmlReader abstract class. Over the course of the last few pages, I introduced you to the first and most-used of these, the XmlTextReader class, and showed you how to use it to process elements, attributes and the data within them. The second part of this article will deal more thoroughly with the remaining two classes, showing you how to validate an XML document against a DTD or XML Schema before processing it, and explaining how to handle errors in an XML document. Make sure you come back for that. Until then, be good!

Note: Examples are illustrative only, and are not meant for a production environment. Melonfire provides no warranties or support for the source code described in this article.
