In my last article I showed you how you can use Microsoft’s XML Parser to create XML documents in WSH. Today I’m going to show you how to use your scripts to read information from XML files.
Contributed by Nilpo Rating: / 2 December 01, 2008
You’re going to need a copy of the XML file that you created in my last article. If you don’t have one, you can download an example XML file here. This is a sample XSPF audio playlist. XSPF is an XML-based playlist specification designed to be easily portable and cross-platform compatible.
Its structure is relatively simple. A playlist element houses a track listing that contains each of the tracks in the playlist. Each track has its own set of elements that contain properties about that track.
Remember that XML files are simple database structures. So it’s best to think of an XML file as you would a database. That mentality will make this a little easier to understand for most system administrators, so that’s the approach I’ll be taking in this article. As you will soon see, with the help of the XML Parser, an XML file becomes far more than a simple text file. It actually allows us to create queries for returning specific information—and that sounds more like a database to me!
The first logical step to reading an XML file requires that the file actually be open. To do that, we’re going to create an instance of the XML Document Object Model by instantiating the Microsoft XML Parser. Then we’ll load an XML file.
Set objXmlDoc = CreateObject("Microsoft.XMLDOM")
objXmlDoc.async = False
objXmlDoc.load("C:\Playlist.xml")
This code is pretty straightforward. Notice that I’m setting the XML Document object’s async property to False. XML documents can be loaded either synchronously or asynchronously. In most cases you’ll want to load the file asynchronously. This helps to ensure that the entire file is loaded before script execution continues. If you choose to load the file synchronously, you’ll have to monitor its ReadyState before continuing.
If you want to do some error handling, you can test whether or not the document was successfully loaded by creating an If statement around the load method as follows:
If Not objXmlDoc.load("C:\Playlist.xml") Then
WScript.Echo "Document failed to load."
WScript.Quit
End If
This If statement will produce an error message and halt script execution if the document is not successfully loaded. If you’re having problems getting your document to load, read the “What to do when it doesn’t work” section at the end of this article.
Set objRoot = xmlDoc.documentElement
Now that the file is loaded, you can begin finding the information that you want. Like a database, you’ll most likely know what information you’re looking for; however, you can also use the line above. The documentElement method will effectively return a collection of all of the elements in your document, thus allowing you to traverse them using a For Each structure.
That being so, that’s not where the power of this control lies. As I stated earlier, you can query XML files in the same way you would a standard database. Writing queries will seem a little odd until you try it, but don’t worry; it’s actually quite simple.
Set colNodes = objXmlDoc.selectNodes("/playlist/trackList/*")
The easiest way to query an XML file is by using the XML Document object’s selectNodes method. It accepts a single string parameter as a query and returns a collection of objects that represent matching nodes. In this case, the word node and element can be used interchangeably.
The query here is quite simple. You’re going to create a path-like structure that points to the node level that you wish to return. Here, I want to return all tracks in the playlist. If you look at the XML file again, you can see that the track elements are children of the trackList element, which is a child itself of the playlist element. So a path to a track looks something like this:
<playlist><trackList><track>
The query string presents a path relative to the document root, so we simply name each element in order. You’ll also notice that I’ve used the * wildcard character. This represents all nodes at that level. In other words, I’m returning all nodes that are children of the trackList element.
The * wildcard character is used similarly to a SELECT * FROM query in WMI!
For Each objNode in colNodes
WScript.Echo objNode.Text
Next
As I said, the selectNodes method returns a collection of elements. Thus, you could easily move through the returned elements and display their contents using a For Each loop as I’ve shown above. However, in the case of our example XML file, the track element doesn’t contain any text; it actually contains more elements. So let’s try something a little different.
Set colTracks = objXmlDoc.selectNodes("/playlist/trackList/track/*")
In this example, I’m using a query to return all instances of the track element. Then for each of those instances I’m using the childNodes method to return a collection containing the property elements for that particular track. I’m simply echoing back the Text value for each of those property elements.
For a quick and easy way of dumping all of the data, you could simply do as I’ve done above. I’m just grabbing all of the property elements with the childNodes method and iterating through them all with a For Each loop. I’ve done this to demonstrate how you can use a node object’s baseName property to return the actual element tag.
Can you see how this might be useful if you want to display an unknown XML file’s document tree?
Okay, admittedly I haven’t done anything too fancy yet, have I? I’ve shown you how to return a single specific value from an XML file. But chances are good that if you’re using this as a database, you’re going to need to create much more complex queries than that.
More specifically, you’re probably wondering how you can filter results. For instance, you might want to query a specific field, or element. Possibly, you only want to return track information for tracks performed by a specific artist. This is all possible, and then some.
Set colTitles = objXmlDoc.selectNodes("/playlist/trackList/track/title")
For Each objTitle In colTitles
Wscript.Echo objTitle.Text
Next
If you only want your collection to return information from a specific element, you can avoid using a wildcard character and specify a particular field instead.
The Diary Of Jane
Who You Are
Executing this bit of code will result in something similar to the example above. I’m just generating a listing of the text value for a specific field; in this case, a list of each of the track's titles.
The query returns a collection containing every element in the document that matches the document path provided—specifically, the title element of every track. Note how every instance of the title element in the sample document matches the path in our query string.
Now suppose you want to return more than one specific element?
Set colProperties = objXmlDoc.selectNodes( _
"/playlist/trackList/track/(title | creator)")
For Each objProperty In colProperties
WScript.Echo objProperty.Text
Next
This piece of code returns any element that matches either title or creator. The syntax looks a bit rough, but think of the pipe (|) character as an OR operator. The OR clause is enclosed in parenthesis.
Set colProperties = objXmlDoc.selectNodes( _
"/playlist/trackList/track/(location | title | creator)")
For Each objProperty In colProperties
WScript.Echo objProperty.Text
Next
Returning additional fields is as easy as adding them inside the parenthesis. Just remember to keep them separated by pipe characters.
Let’s not stop there. To truly harness the power of a database, you need to be able to perform conditional queries. In WMI you would associate these with Where clauses. Where clauses serve to further filter your results based upon field values—or in this case, element text values.
This code sample will return all track instances where the creator value is equal to “Breaking Benjamin.” In other words, it only returns tracks recorded by Breaking Benjamin.
The important thing to see here is the use of brackets to denote the “where” clause. The query lists all elements that match the path provided (which represent individual track entries) and then further filters them based on the value of the track’s creator element—returning only those whose value is equal to the value provided.
But it doesn’t stop there. You can use any of the standard comparison operators including =, <, >, <=, and >=. You can negate any of them by prefacing them with an exclamation point. For example, to specify "not equal" you would use !=.
So this example would then return all track entries except those recorded by Breaking Benjamin. Pretty cool, huh? Okay, so you’re still not that impressed. Fine, let’s make it a little more complicated. Let’s filter our results based on more than one field.
Set colTracks = objXmlDoc.selectNodes( _
"/playlist/trackList/track " _
& "[creator = 'Breaking Benjamin' and title != 'So Cold']")
You can specify more than one element to filter by combining them into more complex query statements inside of the brackets. In this case, I’ve used the “and” operator to return all tracks where the creator element contains “Breaking Benjamin” and the title element does not contain “So Cold.” You can also create converse statements by using the “or” operator.
I bet right about now you’re wondering whether or not you can combine all of these techniques to create even more complex queries. In short, the answer is: yes. What fun would it be if you couldn’t do that?!
This query combines the two techniques we’ve seen already by first filtering results based upon the creator element, and then only returning the specified title element.
Wait, I know your next question. And yes, you are exactly right. You can specify more than one specific field even when using a complex query like this.
This query first returns all elements that match the path <playlist><trackList><track>. Then it filters those results and returns only those whose creator is “Breaking Benjamin.” Finally, it returns only the title and album elements for those tracks.
Take a breath. It looks a whole lot harder than it really is. Look at the query one piece at a time and think about how it filters the results. Each piece builds upon the results from the piece before. It’s a pretty logical process. You’ll get a handle on it in no time.
It’s okay if you need to read through this section again. These queries are a little unusual to look at when you first begin. The best, and easiest, way to learn them is to grab an XML file and just start playing. I promise, if you try them out and watch the results, you’ll pick this up in no time at all.
Scripting the XML Parser is pretty straightforward as far as controls go, but that doesn’t mean that you won’t run into any problems. So I decided to take a moment and talk about the most common problem you may run into.
As you’re trying these examples you may be wondering, “Why can’t I load my document?”
The most common pitfall associated with the XML Parser is a document not loading. In other words, your script doesn’t return any errors, but there doesn’t seem to be any information available. There’s no document tree.
If your document isn’t loading, you should first double check the path to your XML file. The XML Parser’s load method will not generate a VBScript error if it cannot find the document. If you’re positive that the path is correct, your problem may lie a little deeper.
Remember that we’re actually creating an XML Document object using the XML Parser. The Parser expects your XML file to contain valid XML code. The Parser actually validates the code before returning an XML Document object. This means that if your XML file will not validate, the parser will not return a reference to it.
While you can selectively disable this validation check, I’m not going to show you how. It’s just a bad idea all around to attempt using an XML file that doesn’t contain validated code. It can produce highly unexpected results.
Luckily for us, the XML Parser does provide us an error object that can shed some light on why the file won’t validate in the first place.
If Not objXmlDoc.load("C:\Playlist.xml") Then
Set objParseError = objXmlDoc.parseError
With objParseError
strErrText = "The document failed to load." & vbCrLf & VbCrLf & _
"Error #: " & .errorCode & ": " & .reason & _
"Line #: " & .Line & VbCrLf & _
"Line Position: " & .linepos & vbCrLf & _
"Position In File: " & .filepos & vbCrLf & _
"Source Text: " & .srcText & vbCrLf & _
"Document URL: " & .url
End With
MsgBox strErrText, vbExclamation
WScript.Quit
End If
This code sample demonstrates how you can wrap the Load method to determine whether a document was loaded successfully. If it wasn’t, it uses the XML Parser’s ParseError object to display a message box containing more detailed error information. This will display the same messages you would see if you tried to view your XML file in Internet Explorer.
I certainly don’t have the space to document every possible error here, but you should have enough information to find the problem on your own. Most typically, the problem will be a malformed XML file. You may be using illegal characters such as the ampersand instead of using their HTML entity.
As you can see, WSH does allow you to use XML as a viable database source. So why is this becoming so popular? First off, XML is a flat text file so it is very portable and cross-platform compliant. Second, it doesn’t require any proprietary database engines.
In any case, whatever your own reasons might be for using XML files, you should be well on your way. Until next time, keep coding!