Working with XPath: The .NET Way

In this article we will focus on the fundamentals of XPath and on how to work with XPath in the .NET Framework.

What is XPath?

Many readers have asked me about working with XPath and XQuery using the .NET Framework. To answer you all, I will start with a simple XPath article. In this article, I will introduce you to the concept of “XPath” together with .NET. XPath is one of the several XML technologies existing today. Prior to reading this, you need to have a sound understanding of XML.

Coming to “XPath,” it is a language for finding information in an XML document. XPath is used to navigate through elements and attributes in an XML document easily and effectively. It is not like “find and replace” in Notepad or a word processor. “XPath” has its own rules, structure, and syntax, and it is strict about them. But if we understand “XPath” from scratch, those rules become easy to work with.

First of all, is “XPath” necessary or compulsory? To answer a question with another question, is SQL compulsory? Can’t we develop applications without SQL at all? I think you can already guess my answer. To be frank, “XPath” is not compulsory. You can still do everything you need with an XML document without using “XPath” at all. But one should consider the common concerns of application development, such as ease, effectiveness, speed, productivity, and simplicity. “XPath” helps with all of these when working with XML documents.

Another important point is that “XPath” is a “NON-XML language.” This is a common source of confusion among application developers. XPath is a language for querying XML documents, but its expressions are not themselves written in XML. Because XPath is an abstract language, it can be used in many environments. It is heavily used throughout XSL Transformations (XSLT) to identify nodes in the input XML document. It is also used in most Document Object Model (DOM) implementations for richer querying capabilities.

What is inside XPath?

An XML document is essentially a tree of related, structured nodes of textual information. XPath is a language for picking nodes and sets of nodes out of this tree. From the perspective of XPath, there are seven kinds of nodes:

  • The root node
  • Element nodes
  • Text nodes
  • Attribute nodes
  • Comment nodes
  • Processing instruction nodes
  • Namespace nodes

These terms will be familiar to any developer who knows XML. In everyday XML usage, “root” refers to the topmost element of the document, most of the other nodes are elements, and every element carries its information as text or as attributes. Comments are also allowed in an XML document. XPath uses the same terms, but gives some of them a slightly different meaning.

The XPath data model has several features that are not obvious. First, the tree’s root node is not the same as its root element. The tree’s root node contains the entire document, including the root element and comments and processing instructions that occur before the root element start tag or after the root element end tag.  The XPath data model does not include everything in the document. In particular, the XML declaration and DTD are not addressable via XPath. However, if the DTD provides default values for any attributes, then XPath recognizes those attributes. 
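The difference between the root node and the root element can be seen in a minimal sketch like the one below. The sample document and the class name RootNodeDemo are made up for illustration. The code asks for every child of "/" and reports the node types it finds.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class RootNodeDemo
{
    // Hypothetical sample document: a comment sits before the root element.
    public const string Xml =
        "<?xml version=\"1.0\"?>" +
        "<!-- generated for illustration -->" +
        "<invoice id=\"123\"><item><sku>100</sku></item></invoice>";

    // Reports the node type of "/" followed by the types of its children.
    public static string DescribeRootChildren()
    {
        XPathNavigator nav = new XPathDocument(new StringReader(Xml)).CreateNavigator();
        List<string> childTypes = new List<string>();

        // "/node()" selects every child of the root node, whatever its kind.
        XPathNodeIterator children = nav.Select("/node()");
        while (children.MoveNext())
        {
            childTypes.Add(children.Current.NodeType.ToString());
        }
        return nav.NodeType + ": " + string.Join(", ", childTypes.ToArray());
    }
}
```

Calling RootNodeDemo.DescribeRootChildren() should list a comment and an element under the root, and nothing for the XML declaration, which is not part of the XPath tree.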

Finally, “xmlns” attributes are reported as namespace nodes. They are not considered attribute nodes, though a non-namespace aware parser will see them as such. Furthermore, in the XPath 1.0 data model a namespace node is attached to every element for which the declaration is in scope, not just to the single element where the namespace is declared.
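A small sketch can make the namespace behaviour concrete. The prefix "inv", the URI "urn:example:invoice", and the class name are assumptions chosen for illustration. The first method counts the attribute nodes on the root element, and the second checks which namespace a child element can see.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class NamespaceNodeDemo
{
    // Hypothetical sample: the prefix and URI are invented for this sketch.
    public const string Xml =
        "<inv:invoice xmlns:inv=\"urn:example:invoice\" id=\"123\">" +
        "<inv:item/></inv:invoice>";

    private static XPathNavigator Load()
    {
        return new XPathDocument(new StringReader(Xml)).CreateNavigator();
    }

    // Counts attribute nodes on the root element; xmlns:inv is not one of them.
    public static int RootAttributeCount()
    {
        return Load().Select("/*/@*").Count;
    }

    // Returns the namespace URI visible from the child element, which carries
    // no declaration of its own and relies on the one made by its parent.
    public static string NamespaceSeenByChild()
    {
        XPathNavigator child = Load().SelectSingleNode("/*/*");
        bool found = child.MoveToFirstNamespace(XPathNamespaceScope.ExcludeXml);
        return found ? child.Value : "(none)";
    }
}
```

Only the id attribute is counted, while the child element still reports the namespace declared on its parent.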

XPath uses path expressions to select nodes or node-sets in an XML document. The simplest expression (or location path) is the one that selects the document’s root node. This path is simply the forward slash /. (You’ll notice that a lot of XPath syntax was deliberately chosen to be similar to the syntax used by the Unix shell. Here / is the root of a Unix filesystem and / is the root node of an XML document.) These path expressions look very much like the expressions you see when you work with a traditional computer file system.

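XPath also includes built-in functions. XPath 1.0, the version implemented by the System.Xml.XPath classes, defines a core library of 27 functions covering node-sets, strings, Booleans, and numbers. XPath 2.0 expands this to over 100 functions, adding date and time comparison, QName manipulation, sequence manipulation, and more.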
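As a rough illustration of the built-in functions, the sketch below calls count(), sum(), and contains() through XPathNavigator.Evaluate. The class name, the helper names, and the inline copy of the invoice data are assumptions made for this example. contains() and starts-with() are the closest XPath 1.0 comes to a substring match.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class FunctionDemo
{
    // Inline sample data, assumed for this sketch.
    public const string Xml =
        "<invoice id='123'>" +
        "<item><sku>100</sku><price>9.95</price></item>" +
        "<item><sku>101</sku><price>29.95</price></item>" +
        "</invoice>";

    private static XPathNavigator Load()
    {
        return new XPathDocument(new StringReader(Xml)).CreateNavigator();
    }

    // count() returns an XPath number, which .NET hands back as a boxed double.
    public static double ItemCount()
    {
        return (double)Load().Evaluate("count(/invoice/item)");
    }

    // sum() converts each selected node to a number and adds them up.
    public static double Total()
    {
        return (double)Load().Evaluate("sum(/invoice/item/price)");
    }

    // contains() inside a predicate keeps only items whose sku has the fragment.
    public static int CountSkusContaining(string fragment)
    {
        // Keep the sketch simple: refuse quotes rather than escape them.
        if (fragment.IndexOf('\'') >= 0)
        {
            throw new ArgumentException("quotes are not supported in this sketch");
        }
        string expression = "count(/invoice/item[contains(sku, '" + fragment + "')])";
        return Convert.ToInt32((double)Load().Evaluate(expression));
    }
}
```

Building an expression by string concatenation is only acceptable here because the input is checked first; real code would need a more careful approach to untrusted input.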

{mospagebreak title=XPath with a simple example}

The XPath type system is very simple, as you can observe from the following:

  • Node-set (A collection of nodes without duplicates)
  • Boolean (true or false)
  • Number (a double-precision floating point number; there is no separate integer type in XPath 1.0)
  • String (sequence of characters)
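In .NET, these four types surface as the runtime type of the object returned by XPathNavigator.Evaluate. The sketch below is one way to observe that mapping. The Classify helper and the sample data are hypothetical names introduced only for illustration.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class TypeDemo
{
    // Inline sample data, assumed for this sketch.
    public const string Xml =
        "<invoice id='123'>" +
        "<item><sku>100</sku><price>9.95</price></item>" +
        "<item><sku>101</sku><price>29.95</price></item>" +
        "</invoice>";

    // Evaluates an expression and names the XPath type of its result.
    public static string Classify(string expression)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(Xml)).CreateNavigator();
        object result = nav.Evaluate(expression);

        if (result is XPathNodeIterator) return "node-set";
        if (result is bool) return "boolean";
        if (result is double) return "number";
        if (result is string) return "string";
        return "unknown";
    }
}
```

For instance, Classify("/invoice/item") reports a node-set, while Classify("count(//item) > 1") reports a boolean, because the comparison operator changes the type of the whole expression.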

Let us consider a small XML document (invoice.xml) containing the following information:

<invoice id='123'>
       <item>
           <sku>100</sku>
           <price>9.95</price>
       </item>
       <item>
           <sku>101</sku>
           <price>29.95</price>
       </item>
</invoice>

The hierarchy starts at the root node (“/”), continues with “invoice”, and goes on through item, sku, and price. The following XPath expression identifies the two “price” elements:

/invoice/item/price

The above type of expression is called a “location path”.  Location path expressions look like file system paths, only they navigate through the XPath tree model to identify a set of nodes (known as a node-set).  A location path expression yields a node-set.  Location paths can be absolute or relative.  Absolute location paths begin with a forward slash (/) whereas relative location paths do not.
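The difference between absolute and relative location paths matters most when a query is run from a node other than the root. The following sketch, with an assumed class name and inline sample data, selects each item with an absolute path and then reads its price with a relative one. It also shows that an absolute path ignores the context node it is called from.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class RelativePathDemo
{
    // Inline sample data, assumed for this sketch.
    public const string Xml =
        "<invoice id='123'>" +
        "<item><sku>100</sku><price>9.95</price></item>" +
        "<item><sku>101</sku><price>29.95</price></item>" +
        "</invoice>";

    private static XPathNavigator Load()
    {
        return new XPathDocument(new StringReader(Xml)).CreateNavigator();
    }

    // Absolute path to find the items, relative path to find each item's price.
    public static List<string> PricesPerItem()
    {
        List<string> prices = new List<string>();
        XPathNodeIterator items = Load().Select("/invoice/item");
        while (items.MoveNext())
        {
            // "price" has no leading slash, so it is resolved against the current item.
            XPathNavigator price = items.Current.SelectSingleNode("price");
            if (price != null)
            {
                prices.Add(price.Value);
            }
        }
        return prices;
    }

    // "/price" starts at the root node even when called from an item,
    // and the root node has no child named price, so nothing matches.
    public static int AbsolutePathFromItem()
    {
        XPathNavigator firstItem = Load().SelectSingleNode("/invoice/item");
        return firstItem.Select("/price").Count;
    }
}
```

A leading slash on a nested query is a common reason for an inner Select returning zero nodes.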

XPath can be used with a variety of XML processors, including MSXML DOM, .NET, JAXP and so on. The following is a simple JavaScript snippet (based on MSXML DOM) that searches for elements in an XML document using XPath:

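// doc is assumed to be an MSXML DOM document that has already loaded invoice.xml
var nl = doc.selectNodes("/invoice/item/price");
for (var i = 0; i < nl.length; i++)
{
    // do some processing here, for example read nl.item(i).text
}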

XPath related classes in .NET

The .NET Framework provides full support for XML through the “System.Xml” namespace. If we need to work with XPath, the following types from the “System.Xml.XPath” namespace are helpful:

  • XPathNavigator
  • XPathNodeIterator
  • XPathExpression
  • XPathDocument
  • XPathException

The “XPathNavigator” class allows you to define a read-only, random access cursor on a data store. The “XPathNodeIterator” class enables you to iterate a set of nodes that you select by calling an XPath method. The “XPathExpression” class encapsulates a compiled XPath expression. An XPathExpression object is returned when you call the Compile method. The Select, Evaluate, and Matches methods use this class. The “XPathDocument” class provides a read-only cache for fast and highly optimized processing of XML documents using XSLT. “XPathException” is the exception that is thrown when an error occurs during the processing of an XPath expression.
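To illustrate XPathExpression and XPathException, here is a minimal sketch that compiles an expression once and reuses it, and that treats a compile failure as an invalid expression. The class name, the helper names, and the sample data are assumptions made for this example.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class CompiledExpressionDemo
{
    // Inline sample data, assumed for this sketch.
    public const string SampleXml =
        "<invoice id='123'>" +
        "<item><sku>100</sku><price>9.95</price></item>" +
        "<item><sku>101</sku><price>29.95</price></item>" +
        "</invoice>";

    // Compiled once, then reused for every document passed to Total.
    private static readonly XPathExpression SumPrices =
        XPathExpression.Compile("sum(/invoice/item/price)");

    public static double Total(string xml)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
        return (double)nav.Evaluate(SumPrices);
    }

    // Reports whether the text parses as an XPath expression.
    public static bool IsValid(string expression)
    {
        try
        {
            XPathExpression.Compile(expression);
            return true;
        }
        catch (XPathException)
        {
            return false;
        }
    }
}
```

Compiling up front is mainly worthwhile when the same expression runs many times. For a single query, passing the string straight to Select or Evaluate is simpler.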

Apart from the above, there is one more type to know about, the “IXPathNavigable” interface. Classes that implement this interface let you create an XPathNavigator object through the CreateNavigator method.

To create an XPathNavigator object for an XML document, you use the CreateNavigator method of the XmlNode or XPathDocument class, both of which implement the IXPathNavigable interface. The CreateNavigator method returns an XPathNavigator object. You can then use the XPathNavigator object to perform XPath queries. You can use XPathNavigator to select a set of nodes from any data store that implements the IXPathNavigable interface. A data store is the source of data, which may be a file, a database, an XmlDocument object, or a DataSet object. You can also create your own implementation of the XPathNavigator class that can query other data stores.
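Because both XmlDocument (through XmlNode) and XPathDocument implement IXPathNavigable, a query helper can be written once against the interface. The sketch below assumes a hypothetical CountPrices helper and inline sample data to show the idea.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class NavigableDemo
{
    // Inline sample data, assumed for this sketch.
    public const string Xml =
        "<invoice id='123'>" +
        "<item><sku>100</sku><price>9.95</price></item>" +
        "<item><sku>101</sku><price>29.95</price></item>" +
        "</invoice>";

    // Works with any store that can hand out an XPathNavigator.
    public static int CountPrices(IXPathNavigable source)
    {
        XPathNavigator nav = source.CreateNavigator();
        return nav.Select("/invoice/item/price").Count;
    }

    // DOM-based store: XmlDocument inherits IXPathNavigable from XmlNode.
    public static IXPathNavigable AsXmlDocument(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc;
    }

    // Read-only store optimized for XPath queries.
    public static IXPathNavigable AsXPathDocument(string xml)
    {
        return new XPathDocument(new StringReader(xml));
    }
}
```

Both NavigableDemo.CountPrices(NavigableDemo.AsXmlDocument(NavigableDemo.Xml)) and the XPathDocument variant run the same query, so the choice of store can be made on other grounds such as whether the document needs to be modified.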

The XPathNavigator object reads data from an XML document by using a cursor that enables forward and backward navigation within the nodes. In addition, XPathNavigator provides random access to nodes. A navigator created from an XPathDocument is read-only, so you cannot edit the document through it. Starting with .NET Framework 2.0, a navigator created from an XmlDocument can be editable, and the CanEdit property reports whether edits are allowed.
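Whether a navigator is read-only depends on the store behind it and on the framework version, so it is worth checking rather than assuming. The sketch below targets .NET Framework 2.0 or later, where XPathNavigator exposes CanEdit and SetValue. The class name and sample data are invented for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class EditDemo
{
    // Inline sample data, assumed for this sketch.
    public const string Xml =
        "<invoice id='123'>" +
        "<item><sku>100</sku><price>9.95</price></item>" +
        "</invoice>";

    // A navigator over an XPathDocument reports that it cannot edit.
    public static bool CanEditXPathDocument()
    {
        return new XPathDocument(new StringReader(Xml)).CreateNavigator().CanEdit;
    }

    // A navigator over an XmlDocument can change the underlying DOM.
    public static string UpdateFirstPrice(string newPrice)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);

        XPathNavigator nav = doc.CreateNavigator();
        if (!nav.CanEdit)
        {
            throw new InvalidOperationException("this navigator is read-only");
        }

        XPathNavigator price = nav.SelectSingleNode("/invoice/item/price");
        price.SetValue(newPrice);

        // Read the value back through the DOM to confirm the edit reached it.
        return doc.SelectSingleNode("/invoice/item/price").InnerText;
    }
}
```

When only queries are needed, XPathDocument remains the lighter choice, and its navigators stay read-only.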

Examining XPath with a simple VB.NET/C# example

You can use the Select method of the XPathNavigator object to select the set of nodes from any store that implements the IXPathNavigable interface. The Select method returns an object of the XPathNodeIterator class. You can then use the object of the XPathNodeIterator class to iterate through the selected nodes.

After you have an XPathNodeIterator object, you can navigate within the selected set of nodes. The following code shows how to create an XPathNavigator object on an XML document, select a set of nodes by using the Select method, and iterate through the set of nodes.

Imports System
Imports System.Xml
Imports System.Xml.XPath
' ...
' Load the document into a read-only cache suited to XPath queries
Dim Doc As XPathDocument = New XPathDocument("invoice.xml")
Dim Navigator As XPathNavigator = Doc.CreateNavigator()
' Select both price elements
Dim Iterator As XPathNodeIterator = Navigator.Select("/invoice/item/price")
While Iterator.MoveNext()
    Console.WriteLine(Iterator.Current.Name)   ' element name
    Console.WriteLine(Iterator.Current.Value)  ' text content of the element
End While

The C# version is very similar, as you can see from the following:

using System;
using System.Xml;
using System.Xml.XPath;
// ...
// Load the document into a read-only cache suited to XPath queries
XPathDocument doc = new XPathDocument("invoice.xml");
XPathNavigator navigator = doc.CreateNavigator();
// Select both price elements
XPathNodeIterator iterator = navigator.Select("/invoice/item/price");
while (iterator.MoveNext())
{
    Console.WriteLine(iterator.Current.Name);   // element name
    Console.WriteLine(iterator.Current.Value);  // text content of the element
}

Summary

In this article, I didn’t provide an in-depth discussion of XPath. I mostly tried to cover introductory concepts of XPath and how to work with XPath in a .NET environment. The XPathNavigator offers further flexibility through its navigation methods, including:

  • MoveTo
  • MoveToNext
  • MoveToPrevious
  • MoveToFirst
  • MoveToFirstChild
  • MoveToParent
  • MoveToRoot
  • MoveToId
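As a starting point for that investigation, the following sketch walks the cursor by hand with a few of these methods instead of running a query. The class name and sample data are assumptions. The document is written without whitespace between elements so that each move lands on an element.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class CursorDemo
{
    // Inline sample data, assumed for this sketch.
    public const string Xml =
        "<invoice id='123'>" +
        "<item><sku>100</sku><price>9.95</price></item>" +
        "</invoice>";

    // Moves the cursor step by step and records where it has been.
    public static List<string> WalkFirstItem()
    {
        List<string> visited = new List<string>();
        XPathNavigator nav = new XPathDocument(new StringReader(Xml)).CreateNavigator();

        nav.MoveToFirstChild();   // root node -> invoice
        nav.MoveToFirstChild();   // invoice   -> item
        nav.MoveToFirstChild();   // item      -> sku

        // Visit sku and then each following sibling.
        do
        {
            visited.Add(nav.Name + "=" + nav.Value);
        }
        while (nav.MoveToNext());

        nav.MoveToParent();       // price -> item
        visited.Add("parent:" + nav.Name);

        nav.MoveToRoot();         // back to the root node
        visited.Add("type:" + nav.NodeType);
        return visited;
    }
}
```

Each Move method returns a Boolean, so production code would normally check the result before relying on the new cursor position.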

I leave it to the readers to further investigate the current subject.

9 thoughts on “Working with XPath: The .NET Way”

  1. Great example …

    I am trying the following and having no luck. Was wondering if you can help.

    I have /rss/channel/item returning many records
    inside of that there is a /general node which has several keywords nodes
    which I want to grab the /string value from

    I can get an iterator on the 100 item nodes but can not move further always getting stuck on getting zero count on deeper nodes.

    My test code looks like this

    XPathNavigator^ nav = doc->CreateNavigator();

    XPathNodeIterator^ iterator = nav->Select("/rss/channel/item"); //This works with 100 results
    while (iterator->MoveNext())
    {
        XPathNavigator^ nav2 = iterator->Current->Clone();
        XPathNodeIterator^ iter = nav2->Select("/general");
        while (iter->MoveNext())
        {
            String^ d2 = iter->Current->Name;
            String^ d23 = iter->Current->Value;
        }
    }

    And my source xml is this
    http://www.powerhousemuseum.co

    I would also like to grab all of the jpg urls from the each item but have not progressed past the first part to the second. The path to the URLs is
    // rss/channel/item/relation(2nd)>resource>identifer>

    Any help or advice you could give would be great. I am using VS2005.

  2. Sorry I should mention that the record count returned of 100 is when you change the URLs last parameter to 100 … My sample xml is only one record for simplicity …

    Thanks

  3. Hello,

    The thing is that I have a little application in which I have replaced the SQL database with an XML file; I use it to store movie information, just as a hobby. What I am trying to do is search the file, but unless I enter the full name of the movie the result is empty. Is there an equivalent of the SQL ‘LIKE’ operator?

    I have been researching about it, but I just can not find anything.

    Hope someone could read my comment, because this is a 5 year old post! =-s
