This article, the first of three parts, explains what MSXML is and how to access an XML document using JavaScript. It is excerpted from chapter 10 of XML DeMYSTified, written by Jim Keogh and Ken Davidson (McGraw-Hill/Osborne, 2005; ISBN: 0072262109).
You combine the power of XML with that of programming languages such as JavaScript, Visual Basic, and C++ when you use Microsoft’s XML Core Services, usually referred to simply as MSXML. MSXML is an application programming interface (API) whose features let you work with XML from within an application written in one of the commonly used programming languages.
This means that you can work with an XML document from within a program rather than having to use a web browser. You can easily integrate any XML document into your application by calling features of MSXML from your program.
In this chapter you’ll learn what MSXML is and how to access an XML document using JavaScript. The same basic principles apply to other programming languages.
What Is MSXML?
XML is a dynamic approach to managing information. As you’ve learned throughout this book, you can access an XML document using an XML-enabled browser. This is fine if you want to display all or a portion of an XML document. Simply follow the directions we present in this book and you’re able to view information contained in the XML document from your browser.
However, accessing an XML document using an application other than a browser can be tricky because code must be written within the application to extract information contained in the XML document.
Fortunately, Microsoft provides the magic wand to take the pain out of writing code to access an XML document from within an application with Microsoft XML Core Services—MSXML for short. MSXML consists of preprogrammed classes and functions that contain code to access and manipulate information in an XML document.
You don’t have to write the tedious code to read and parse an XML document because Microsoft has done this for you. All you need to do is to call the appropriate classes or functions within your application to work with an XML document.
MSXML is designed for a variety of programming languages, including C, C++, Visual Basic, VBScript, JScript, and JavaScript. You can download the MSXML API at http://msdn.microsoft.com/xml/default.aspx; you’ll need to do so before you can use the examples in this chapter.
We use JavaScript as the programming language for this chapter because you don’t need a compiler to create a JavaScript application. You simply write the code using the same editor that you use to write your web page, and the browser runs the JavaScript when the page calls it. Because the examples create MSXML objects with ActiveXObject, a feature that among browsers only Internet Explorer on Windows provides, you’ll need to run them in Internet Explorer.
We’ll show you a few JavaScript basics in this chapter, enough to get you started using MSXML. However, you may want to read JavaScript Demystified by Jim Keogh (McGraw-Hill Osborne Media, 2005) to become proficient in JavaScript.
You’ll need to install the MSXML API, downloading it from the Microsoft web site if necessary. We’re using version 4.0; however, you should download the latest release. Because MSXML versions are installed side by side, keep in mind that the examples request the 4.0 objects by name (for example, MSXML2.DOMDocument.4.0), so they run as written only when MSXML 4.0 is installed.
Let’s jump in. To start learning MSXML, you’ll first create an XML document. The XML document is a catalog of CDs that we’ll simply call catalog.xml. It contains seven CDs, as you’ll see in the code that follows. Enter this XML code into a file and save it to your drive. Be sure to call the file catalog.xml.
<?xml version="1.0"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<catalog>
  <cd upc="602498678299">
    <artist>U2</artist>
    <title>How to Dismantle an Atomic Bomb</title>
    <price>13.98</price>
    <label>Interscope Records</label>
    <date>2004-11-23</date>
  </cd>
  <cd upc="75679244222">
    <artist>Led Zeppelin</artist>
    <title>Physical Graffiti</title>
    <price>22.99</price>
    <label>Atlantic</label>
    <date>1994-08-16</date>
  </cd>
  <cd upc="75678367229">
    <artist>Rush</artist>
    <title>Rush in Rio</title>
    <price>13.98</price>
    <label>Atlantic</label>
    <date>2003-10-21</date>
  </cd>
  <cd upc="74646938720">
    <artist>Billy Joel</artist>
    <title>Songs in the Attic</title>
    <price>10.99</price>
    <label>Sony</label>
    <date>1998-10-20</date>
  </cd>
  <cd upc="75678263927">
    <artist>Led Zeppelin</artist>
    <title>Houses of the Holy</title>
    <price>10.98</price>
    <label>Atlantic</label>
    <date>1994-07-19</date>
  </cd>
  <cd upc="8811160227">
    <artist>Jimi Hendrix</artist>
    <title>Are You Experienced?</title>
    <price>12.99</price>
    <label>Experience Hendrix</label>
    <date>1997-04-22</date>
  </cd>
  <cd upc="74640890529">
    <artist>Bob Dylan</artist>
    <title>The Times They Are A-Changin'</title>
    <price>9.99</price>
    <label>Sony</label>
    <date>1990-10-25</date>
  </cd>
</catalog>
You’ll notice that the XML document refers to catalog.dtd. As you’ll recall from Chapter 3, a DTD file contains the document type definition, which defines the markup tags that can be used in the XML document and specifies their parent-child structure. The XML parser uses the DTD to check the elements of the XML document as it parses them.
Create a DTD for this example by writing the following declarations into a file and saving it as catalog.dtd in the directory that contains catalog.xml.
<!ELEMENT catalog (cd*)>
<!ELEMENT cd (artist, title, price, label, date)>
<!ELEMENT artist (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT label (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ATTLIST cd upc CDATA #REQUIRED>
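Whether MSXML actually reads catalog.dtd and checks the document against it depends on two DOMDocument properties, validateOnParse and resolveExternals; the chapter’s page relies on their default values. The following is a minimal sketch, not part of the chapter’s page, that sets them explicitly before loading. The function name loadAndValidate and its createDoc parameter are hypothetical: createDoc stands in for new ActiveXObject("MSXML2.DOMDocument.4.0") so that the logic can be exercised outside Internet Explorer.

```javascript
// Hypothetical helper (not from the chapter): load an XML file and make the
// DTD check explicit instead of relying on the installed MSXML defaults.
// createDoc is an assumed factory; in Internet Explorer you would pass
//   function () { return new ActiveXObject("MSXML2.DOMDocument.4.0"); }
function loadAndValidate(url, createDoc) {
  var doc = createDoc();
  doc.async = false;            // load() returns only after the file is parsed
  doc.validateOnParse = true;   // report markup that breaks the DTD rules
  doc.resolveExternals = true;  // let the parser fetch the external catalog.dtd
  doc.load(url);

  var err = doc.parseError;     // IXMLDOMParseError; errorCode 0 means success
  if (err.errorCode != 0) {
    return { doc: null, error: err.reason };
  }
  return { doc: doc, error: null };
}
```

A call such as loadAndValidate("catalog.xml", factory) then returns either the loaded document or the parser’s explanation of what went wrong.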
The final step in preparing to learn MSXML is to create the HTML file that contains the JavaScript used to access the catalog.xml document. The HTML file follows. Some of it is familiar because it’s HTML. You’ll understand other parts if you know JavaScript (don’t worry if you don’t; we explain the JavaScript throughout this chapter). However, the portions of the HTML file that use MSXML are probably confusing, even if you have worked with JavaScript before.
For now, simply create this HTML file and save it to a file called default.html in the directory where you saved catalog.xml and catalog.dtd. We explain each part of the HTML file throughout this chapter.
<html>
<head>
<script language="javascript">
var objXML;

function LoadDocument() {
  var inputfile = document.all("inputfile").value;
  objXML = new ActiveXObject("MSXML2.DOMDocument.4.0");
  objXML.async = false;
  objXML.load(inputfile);
  if (objXML.parseError.errorCode != 0) {
    alert("Error loading input file: " + objXML.parseError.reason);
    return;
  }
  document.all("xmldoc").value = objXML.xml;
}

function InsertFirst() {
  var objNewNode = LoadNewNode();
  if (objNewNode == null) {
    return;
  }
  var root = objXML.documentElement;
  root.insertBefore(objNewNode, root.firstChild);
  document.all("xmlresult").value = objXML.xml;
}

function InsertLast() {
  var objNewNode = LoadNewNode();
  if (objNewNode == null) {
    return;
  }
  var root = objXML.documentElement;
  root.appendChild(objNewNode);
  document.all("xmlresult").value = objXML.xml;
}

function InsertBefore(upc) {
  var objNewNode = LoadNewNode();
  if (objNewNode == null) {
    return;
  }
  var root = objXML.documentElement;
  var objNodes = objXML.selectNodes("/catalog/cd[@upc='" + upc + "']");
  if (objNodes.length == 0) {
    alert("Could not find node with upc " + upc);
    return;
  }
  root.insertBefore(objNewNode, objNodes.item(0));
  document.all("xmlresult").value = objXML.xml;
}

function InsertAfter(upc) {
  var objNewNode = LoadNewNode();
  if (objNewNode == null) {
    return;
  }
  var root = objXML.documentElement;
  var childNodes = root.childNodes;
  for (var i = 0; i < childNodes.length; i++) {
    var node = childNodes.item(i);
    var nodeUPC = node.getAttribute("upc");
    if (nodeUPC == upc) {
      root.insertBefore(objNewNode, childNodes.item(i + 1));
      document.all("xmlresult").value = objXML.xml;
      return;
    }
  }
  alert("Could not find node with upc " + upc);
}

function LoadNewNode() {
  var xmlNewNode = document.all("newnode").value;
  var objNewNode = new ActiveXObject("MSXML2.DOMDocument.4.0");
  objNewNode.async = false;
  objNewNode.loadXML(xmlNewNode);
  if (objNewNode.parseError.errorCode != 0) {
    alert("Error loading new node: " + objNewNode.parseError.reason);
    return null;
  } else {
    return objNewNode.documentElement;
  }
}

function CreateAndAppendNode() {
  var upc = document.all("createUpc").value;
  var artist = document.all("createArtist").value;
  var title = document.all("createTitle").value;
  var price = document.all("createPrice").value;
  var label = document.all("createLabel").value;
  var date = document.all("createDate").value;

  var elementCd = objXML.createElement("cd");
  elementCd.setAttribute("upc", upc);

  var elementArtist = objXML.createElement("artist");
  var textArtist = objXML.createTextNode(artist);
  elementArtist.appendChild(textArtist);
  elementCd.appendChild(elementArtist);

  var elementTitle = objXML.createElement("title");
  var textTitle = objXML.createTextNode(title);
  elementTitle.appendChild(textTitle);
  elementCd.appendChild(elementTitle);

  var elementPrice = objXML.createElement("price");
  var textPrice = objXML.createTextNode(price);
  elementPrice.appendChild(textPrice);
  elementCd.appendChild(elementPrice);

  var elementLabel = objXML.createElement("label");
  var textLabel = objXML.createTextNode(label);
  elementLabel.appendChild(textLabel);
  elementCd.appendChild(elementLabel);

  var elementDate = objXML.createElement("date");
  var textDate = objXML.createTextNode(date);
  elementDate.appendChild(textDate);
  elementCd.appendChild(elementDate);

  var root = objXML.documentElement;
  root.appendChild(elementCd);
  document.all("xmlresult").value = objXML.xml;
}

function SelectArtist(artist) {
  var objNodes = objXML.selectNodes("/catalog/cd[artist='" + artist + "']");
  if (objNodes.length == 0) {
    alert("Could not find artist with name " + artist);
    return;
  }
  var root = objXML.documentElement;
  var cdList = root.selectNodes("/catalog/cd");
  cdList.removeAll();
  for (var i = 0; i < objNodes.length; i++) {
    root.appendChild(objNodes.item(i));
  }
  document.all("xmlresult").value = objXML.xml;
}

function DisplayTitles() {
  var result = "";
  var objNodes = objXML.selectNodes("/catalog/cd/title");
  for (var i = 0; i < objNodes.length; i++) {
    result += objNodes.item(i).text + "\r\n";
  }
  document.all("xmlresult").value = result;
}

function DeleteNodes(upc) {
  var objNodes = objXML.selectNodes("/catalog/cd[@upc='" + upc + "']");
  if (objNodes.length == 0) {
    alert("Could not find node with upc " + upc);
    return;
  }
  for (var i = 0; i < objNodes.length; i++) {
    objXML.documentElement.removeChild(objNodes.item(i));
  }
  document.all("xmlresult").value = objXML.xml;
}

function ValidateDocument() {
  var err = objXML.validate();
  if (err.errorCode == 0) {
    alert("Document is valid.");
  } else {
    alert("Error validating document: " + err.reason);
  }
}

function TransformDocument(stylesheet) {
  var xslProcessor;
  var xslTemplate = new ActiveXObject("Msxml2.XSLTemplate.4.0");
  var xslDocument = new ActiveXObject("Msxml2.FreeThreadedDOMDocument.4.0");
  xslDocument.async = false;
  xslDocument.loadXML(stylesheet);
  if (xslDocument.parseError.errorCode != 0) {
    var myErr = xslDocument.parseError;
    alert("You have error " + myErr.reason);
    return;
  }
  xslTemplate.stylesheet = xslDocument;
  xslProcessor = xslTemplate.createProcessor();
  xslProcessor.input = objXML;
  xslProcessor.transform();
  window.frames.htmlresult.document.open();
  window.frames.htmlresult.document.clear();
  window.frames.htmlresult.document.write(xslProcessor.output);
  window.frames.htmlresult.document.close();
}
</script>
</head>
<body onload="LoadDocument();">
<table cellpadding="5">
  <tr>
    <td nowrap>File name: <input type="text" id="inputfile" value="catalog.xml"></td>
    <td><input type="button" onclick="LoadDocument();" value="Load Document"></td>
  </tr>
  <tr valign="top">
    <td>XML Document:</td>
    <td><textarea id="xmldoc" rows="20" cols="80" readonly></textarea></td>
  </tr>
  <tr valign="top">
    <td nowrap>
      <a href="#" onclick="InsertFirst(); return false;">Insert First:</a><br>
      <a href="#" onclick="InsertLast(); return false;">Insert Last:</a><br>
      <a href="#" onclick="InsertBefore(document.all('upcBefore').value); return false;">Insert Before UPC:</a>
      <input type="text" id="upcBefore" value="75678367229" size="15"><br>
      <a href="#" onclick="InsertAfter(document.all('upcAfter').value); return false;">Insert After UPC:</a>
      <input type="text" id="upcAfter" value="75678367229" size="15"><br>
    </td>
    <td><textarea id="newnode" rows="10" cols="80"><cd upc="75596280822">
  <artist>Phish</artist>
  <title>Live Phish, Vol. 15</title>
  <price>26.99</price>
  <label>ELEKTRA/WEA</label>
  <date>2002-10-29</date>
</cd></textarea>
    </td>
  </tr>
  <tr valign="top">
    <td nowrap><a href="#" onclick="CreateAndAppendNode(); return false;">Create/Append Node</a></td>
    <td nowrap>
      upc: <input type="text" id="createUpc" value="75596280822" size="15"><br>
      artist: <input type="text" id="createArtist" value="Phish" size="15"><br>
      title: <input type="text" id="createTitle" value="Live Phish, Vol. 15" size="15"><br>
      price: <input type="text" id="createPrice" value="26.99" size="15"><br>
      label: <input type="text" id="createLabel" value="ELEKTRA/WEA" size="15"><br>
      date: <input type="text" id="createDate" value="2002-10-29" size="15">
    </td>
  </tr>
  <tr valign="top">
    <td colspan="2" nowrap>
      <a href="#" onclick="SelectArtist(document.all('artist').value); return false;">Select Artist:</a>
      <input type="text" id="artist" value="U2" size="15"><br>
      <a href="#" onclick="DisplayTitles(); return false;">Display Titles</a><br>
      <a href="#" onclick="DeleteNodes(document.all('upcDelete').value); return false;">Delete Nodes w/UPC:</a>
      <input type="text" id="upcDelete" value="75678367229" size="15"><br>
      <a href="#" onclick="ValidateDocument(); return false;">Validate Document</a>
    </td>
  </tr>
  <tr valign="top">
    <td nowrap><a href="#" onclick="TransformDocument(document.all('stylesheet').value); return false;">Transform Document:</a></td>
    <td>
      <textarea id="stylesheet" rows="20" cols="80"><?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <html>
  <body>
    <h2>CD Listing</h2>
    <table border="1">
      <tr>
        <th align="center">UPC</th>
        <th align="center">Artist</th>
        <th align="center">Title</th>
      </tr>
      <xsl:for-each select="catalog/cd">
      <tr>
        <td><xsl:value-of select="@upc"/></td>
        <td><xsl:value-of select="artist"/></td>
        <td><xsl:value-of select="title"/></td>
      </tr>
      </xsl:for-each>
    </table>
  </body>
  </html>
</xsl:template>
</xsl:stylesheet></textarea>
    </td>
  </tr>
  <tr valign="top">
    <td>XML Result:</td>
    <td><textarea id="xmlresult" rows="20" cols="80"></textarea></td>
  </tr>
  <tr valign="top">
    <td>HTML Result:</td>
    <td><iframe id="htmlresult" name="htmlresult" src="about:blank" width="100%" height="300"></iframe></td>
  </tr>
</table>
</body>
</html>
Let’s begin by loading the XML document from the file system into the browser. You do this by entering the name of the XML document into the File name: input field on the HTML form and then clicking the Load Document button. These two lines of code within the HTML document create these elements:
<td nowrap>File name: <input type="text" id="inputfile" value="catalog.xml"></td>
<td><input type="button" onclick="LoadDocument();" value="Load Document"></td>
The first line creates the input field, and the second line creates the button.
Look at the opening <body> tag on the HTML document and you’ll see that you tell the browser to call the LoadDocument() JavaScript function each time that the HTML page is loaded into the browser. This causes the browser to load the default file and display it in the text area of the web page.
<body onload="LoadDocument();">
Notice that the onclick attribute of the input button also calls the LoadDocument() function when the button is clicked. This time LoadDocument() loads the file named in the File name: input box and displays it in the text area of the web page, replacing the document shown there. You may want to use this button periodically to restore the XML document to its original state.
The LoadDocument() Function
A function is a block of code that executes only when another part of the application calls it. Each function has a unique name that’s used to call it. In this page, all the functions are defined in the <script> block at the beginning of the HTML file, so they already exist by the time the page calls them; LoadDocument() is the first of them.
LoadDocument() is a JavaScript function that loads a document. Here’s what it looks like:
var objXML;

function LoadDocument() {
  var inputfile = document.all("inputfile").value;
  objXML = new ActiveXObject("MSXML2.DOMDocument.4.0");
  objXML.async = false;
  objXML.load(inputfile);
  if (objXML.parseError.errorCode != 0) {
    alert("Error loading input file: " + objXML.parseError.reason);
    return;
  }
  document.all("xmldoc").value = objXML.xml;
}
There are two components in this example. The first is objXML, which is a variable. Think of a variable as a placeholder for a real value. objXML is a global variable because it is declared outside the function definition, which means that it can be accessed from anywhere in the page’s script. In contrast, inputfile is local to the LoadDocument() function and is accessible only from within the LoadDocument() function definition.
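To see the difference between a global and a local variable without MSXML, here is a small sketch in plain JavaScript. It mirrors the shape of LoadDocument() but stores an ordinary object instead of a DOMDocument, so the function name LoadDocumentSketch and the object it stores are illustrative only.

```javascript
var objXML;                          // global: visible to every function in the script

function LoadDocumentSketch() {      // hypothetical stand-in for LoadDocument()
  var inputfile = "catalog.xml";     // local: exists only while this function runs
  objXML = { source: inputfile };    // assigns to the global declared above
}

LoadDocumentSketch();
console.log(objXML.source);          // prints catalog.xml
console.log(typeof inputfile);       // prints undefined: the local is not visible here
```

In non-strict code, leaving out var inside the function would turn inputfile into a global as well, which is an easy mistake to make.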
The second component in this example is the function definition. The function is called LoadDocument(). Code between the French braces ({ }) executes each time another part of the application calls the LoadDocument() function.
The first line in the LoadDocument() function definition accesses the value of the inputfile input box on the HTML form. This is the input box containing the name of the document to load. The value is the name of the document. This file name is assigned to a variable called inputfile.
The second line creates an instance of the MSXML DOMDocument object and assigns it to the objXML variable. It does this by creating an ActiveX object for the DOM parser (see Chapter 7), identified by the name MSXML2.DOMDocument.4.0. The version number is part of that name because MSXML is designed to coexist with previous versions rather than replace them, so your code asks for the specific version it was written for.
TIP Visual Basic, VBScript, C, and C++ access objects using either the ActiveX or COM interface.
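Because the name passed to new ActiveXObject() requests a specific version, the call throws an error when that version isn’t installed. One common way to cope with this, sketched below, is to try a list of names in order of preference and keep the first one that succeeds. The function createDomDocument is hypothetical, and the constructor is passed in as a parameter (ActiveXObject in Internet Explorer) only so that the logic can be tested elsewhere. Code that uses the result must still work with whichever version was actually created.

```javascript
// Hypothetical helper: create a DOMDocument from the first version name that works.
// ActiveX is the constructor to use; in Internet Explorer pass ActiveXObject.
function createDomDocument(progIds, ActiveX) {
  for (var i = 0; i < progIds.length; i++) {
    try {
      // Succeeds only if this MSXML version is registered on the machine.
      return { progId: progIds[i], doc: new ActiveX(progIds[i]) };
    } catch (e) {
      // Not installed: fall through and try the next name.
    }
  }
  throw new Error("None of these MSXML versions is installed: " + progIds.join(", "));
}

// Example call in Internet Explorer:
//   var result = createDomDocument(
//     ["MSXML2.DOMDocument.4.0", "MSXML2.DOMDocument.3.0"], ActiveXObject);
//   objXML = result.doc;
```

Returning the name along with the object lets the caller log or display which version it ended up with.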
The third line specifies whether the file is loaded synchronously or asynchronously. The DOMDocument object contains properties and functions (sometimes called methods). One of those properties is called async; it controls whether your code waits for the document to load. By setting the async property to false, you’re saying that you want to wait until the document is loaded before executing the next line of code. If you set the async property to true, then the next line of code executes while the document is still loading.
The fourth line calls the load() method, which is defined in the MSXML API. Notice that the inputfile variable is placed between the parentheses of the load() method. This is referred to as passing the variable as an argument: you’re passing the name of the file that you want the load() method to load. The file name serves as the URL of the document, so you can replace it with any valid URL.
The fifth line uses an if statement to check whether the document loaded properly. An if statement evaluates a condition. If the condition is true, the code within the French braces executes; otherwise the code is skipped. The DOMDocument object has a property called parseError, an instance of the IXMLDOMParseError object, that contains details of any error that occurred. If its errorCode property is not zero, an error occurred, so the function displays the error message from the reason property and returns. If errorCode is zero, execution continues with the line following the closing French brace (}).
The sixth line displays the XML document in the text area of the HTML page. Look carefully and you’ll notice that the line references the xml property of the objXML variable. Remember that objXML references the DOMDocument. The xml property contains the XML representation of the DOMDocument: because the DOM is a tree structure, the xml property essentially serializes the tree back into its familiar markup form.
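When async is true, load() returns immediately, so the code that uses the document has to wait for MSXML to report that loading has finished. MSXML signals this through the DOMDocument’s onreadystatechange event handler and its readyState property, which is 4 once loading has completed, successfully or not. The sketch below shows that pattern; loadAsync, its createDoc factory, and the onDone callback are hypothetical names, not part of the chapter’s page.

```javascript
// Hypothetical helper: load an XML file without blocking the page.
// createDoc is an assumed factory; in Internet Explorer you would pass
//   function () { return new ActiveXObject("MSXML2.DOMDocument.4.0"); }
// onDone(errorMessage, doc) runs once loading has finished.
function loadAsync(url, createDoc, onDone) {
  var doc = createDoc();
  doc.async = true;                      // load() returns before parsing is done
  doc.onreadystatechange = function () { // attach the handler before calling load()
    if (doc.readyState != 4) {           // 4 = completed (successfully or not)
      return;
    }
    if (doc.parseError.errorCode != 0) {
      onDone(doc.parseError.reason, null);
    } else {
      onDone(null, doc);
    }
  };
  doc.load(url);
  return doc;                            // do not use doc until onDone has run
}
```

Checking parseError inside the handler matters because a completed load can still have failed.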
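The reason property is only one of the details IXMLDOMParseError provides; it also reports the url of the file and the line and linepos (character position within the line) where parsing failed. Here is a sketch of a helper that builds a fuller message from them. The name describeParseError is hypothetical; you could call it from LoadDocument() in place of the alert text used there.

```javascript
// Hypothetical helper: turn a DOMDocument's parseError into a readable message.
// Returns null when the document loaded without errors.
function describeParseError(doc) {
  var err = doc.parseError;          // an IXMLDOMParseError object
  if (err.errorCode == 0) {
    return null;
  }
  return "Error " + err.errorCode +
         " in " + err.url +
         " at line " + err.line + ", position " + err.linepos +
         ": " + err.reason;
}

// Possible use inside LoadDocument():
//   var message = describeParseError(objXML);
//   if (message != null) { alert(message); return; }
```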
You’ll need to use four different functions to determine where to place the new CD within the XML document. These functions are
InsertFirst(): Puts the new entry at the beginning of the list
InsertLast(): Puts the new entry at the end of the list
InsertBefore(): Puts the new entry before the CD with the given upc attribute
InsertAfter(): Puts the new entry after the CD with the given upc attribute
Each function is called by an option on the HTML form. Options appear in the first column of the table. The user of the application decides the position of the new CD within the XML document by selecting the appropriate option.
The first two options place the new CD at the beginning or at the end of the XML document, respectively. The last two options require the user to specify a UPC. The UPC is the identifier for a CD that’s already in the XML document. The function then places the new CD either before or after the CD that the user specifies.
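All four positions come down to a single insertBefore() call with a different reference node. Calling insertBefore() with a null reference node appends the new node at the end, just as appendChild() does, and the node that follows a given CD is its nextSibling, which is null for the last CD. The following sketch folds the four cases into one hypothetical function, insertCd(), with a helper findByUpc(); these names and the position strings are illustrative, not part of the chapter’s page. findByUpc() also checks nodeType so that a comment or text node among the children cannot cause an error.

```javascript
// Hypothetical helper: find the child element whose upc attribute matches.
function findByUpc(root, upc) {
  var children = root.childNodes;
  for (var i = 0; i < children.length; i++) {
    var node = children.item(i);
    // nodeType 1 = element; skip text and comment nodes
    if (node.nodeType == 1 && node.getAttribute("upc") == upc) {
      return node;
    }
  }
  return null;
}

// Hypothetical helper combining InsertFirst/InsertLast/InsertBefore/InsertAfter.
// position is "first", "last", "before", or "after"; upc is used for before/after.
// Returns false if the target CD is not found.
function insertCd(root, newNode, position, upc) {
  var ref = null;                    // insertBefore(node, null) appends at the end
  if (position == "first") {
    ref = root.firstChild;
  } else if (position == "before" || position == "after") {
    var target = findByUpc(root, upc);
    if (target == null) {
      return false;
    }
    ref = position == "before" ? target : target.nextSibling;
  } else if (position != "last") {
    throw new Error("Unknown position: " + position);
  }
  root.insertBefore(newNode, ref);
  return true;
}
```

In the page, InsertFirst() would correspond to insertCd(objXML.documentElement, objNewNode, "first") once LoadNewNode() has returned a non-null node.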
The second column holds a text area containing information about the new CD. We’ve provided a default value that appears when the page loads, but you can change it in the browser. Each function reads the text area’s value when inserting the new CD into the XML document.
The InsertFirst(), InsertLast(), InsertBefore(), and InsertAfter() functions must retrieve information about the new CD from the text area. They do this by calling the LoadNewNode() function, which loads the text area’s contents into a new DOM parser and returns a reference to their root element, the new <cd> element, to whichever of the four functions called it. Here’s the LoadNewNode() function:
function LoadNewNode() {
  var xmlNewNode = document.all("newnode").value;
  var objNewNode = new ActiveXObject("MSXML2.DOMDocument.4.0");
  objNewNode.async = false;
  objNewNode.loadXML(xmlNewNode);
  if (objNewNode.parseError.errorCode != 0) {
    alert("Error loading new node: " + objNewNode.parseError.reason);
    return null;
  } else {
    return objNewNode.documentElement;
  }
}
The first line retrieves text from the text area on the HTML form.
The second line creates a new DOMDocument object that contains information about the new CD.
The third line sets the value for the async property to false so that the entire document loads before returning control to the calling point.
The fourth line calls the loadXML() method of the DOMDocument object. loadXML() works like the load() method called within LoadDocument(), except that its argument is a string containing the XML markup itself rather than a URL that points to the document.
The fifth line checks whether an error occurred when loading information about the new CD. If there is an error, an error message is displayed and null is returned. If there isn’t an error, the documentElement of the DOMDocument is returned to the statement that called the LoadNewNode() function. documentElement is the document’s root element; here it refers to the <cd> element together with all its child elements.
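Because loadXML() parses whatever string you hand it, that string must be well-formed XML. If you build the string from user input instead of typing it into the text area, characters such as & and < inside a value must be escaped first, or loadXML() reports a parse error. (CreateAndAppendNode() avoids this problem because MSXML escapes text nodes and attribute values when it serializes the document.) Below is a sketch of building the <cd> markup safely; escapeXml and buildCdXml are hypothetical names, and the field names follow catalog.dtd.

```javascript
// Hypothetical helper: escape the characters that are special in XML
// text and attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")    // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: cd is a plain object whose fields match the
// elements declared in catalog.dtd; the result can be passed to loadXML().
function buildCdXml(cd) {
  return '<cd upc="' + escapeXml(cd.upc) + '">' +
         "<artist>" + escapeXml(cd.artist) + "</artist>" +
         "<title>" + escapeXml(cd.title) + "</title>" +
         "<price>" + escapeXml(cd.price) + "</price>" +
         "<label>" + escapeXml(cd.label) + "</label>" +
         "<date>" + escapeXml(cd.date) + "</date>" +
         "</cd>";
}
```

In Internet Explorer you would then call objNewNode.loadXML(buildCdXml(values)) instead of reading the markup from the text area.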
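LoadNewNode() returns the root element whatever its name, so text such as <dvd>...</dvd> typed into the text area would be inserted into the catalog and make it invalid against catalog.dtd. A small variation, sketched below under the assumption that you want to reject such input up front, checks the root’s nodeName before returning it. The function name loadCdNode and its createDoc and report parameters are hypothetical; createDoc stands in for new ActiveXObject("MSXML2.DOMDocument.4.0") and report for alert().

```javascript
// Hypothetical variation of LoadNewNode(): parse a string and accept it only
// if its root element is <cd>. Returns the element, or null after reporting why.
// createDoc is an assumed factory; report is an assumed error callback (alert in IE).
function loadCdNode(xmlText, createDoc, report) {
  var doc = createDoc();
  doc.async = false;
  doc.loadXML(xmlText);                 // parse the markup itself, not a URL
  if (doc.parseError.errorCode != 0) {
    report("Error loading new node: " + doc.parseError.reason);
    return null;
  }
  var root = doc.documentElement;       // the single top-level element
  if (root.nodeName != "cd") {
    report("Expected a <cd> element but found <" + root.nodeName + ">");
    return null;
  }
  return root;
}
```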