Learn about XML and the hierarchical structure of the Document Object Model in this down and dirty piece! Nodes, NodeLists and NamedNodeMaps, as well as properties such as parentNode, childNodes, nodeName and nodeValue, are explored and explained, with code. "In order to represent the hierarchical nature of XML, the DOM provides a whole set of objects, methods and properties that allow us to manipulate the DOM. We will not be able to cover them all in this tutorial, but we’ll cover a few to give you the essence of the sort of things you can achieve."
Contributed by Gayathri Gokul, February 09, 2004
The Document Object Model (DOM) is an API for HTML and XML documents. It defines the logical structure of documents and the way they can be accessed and manipulated. The DOM is important because it gives you a standard way to access and manipulate the structure of an XML document. A simple illustration will help us understand an XML document and how the DOM can be used with it.
Example 1
<BookAuthors>
    <Author>
        <au_id>1001</au_id>
        <au_lname>Gates</au_lname>
        <au_fname>Bill</au_fname>
    </Author>
    <Author>
        <au_id>1002</au_id>
        <au_lname>Potter</au_lname>
        <au_fname>Harry</au_fname>
    </Author>
</BookAuthors>
If you take a closer look you will be able to see that XML documents are always hierarchical in nature, which means they always have a top-level or root element and then child elements. So the above document could be represented as:
The tree would be deeper if the elements had further children of their own. In DOM terms these elements are also called nodes. A node simply represents a generic item in this tree-type structure.
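The tree shape described above can be printed with a short sketch. It uses Python's standard xml.dom.minidom module as a stand-in for the parser used in this article, which is an assumption made so the example runs outside Internet Explorer. The helper name element_tree_lines is hypothetical, and the XML is written without whitespace between tags so that no whitespace-only text nodes appear.

```python
from xml.dom.minidom import parseString

# The article's example document, written compactly (no whitespace between tags).
BOOK_AUTHORS_XML = (
    "<BookAuthors>"
    "<Author><au_id>1001</au_id><au_lname>Gates</au_lname>"
    "<au_fname>Bill</au_fname></Author>"
    "<Author><au_id>1002</au_id><au_lname>Potter</au_lname>"
    "<au_fname>Harry</au_fname></Author>"
    "</BookAuthors>"
)


def element_tree_lines(node, level=0):
    """Return one indented line per element, in document order."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * level + node.nodeName)
        level += 1
    for child in node.childNodes:
        lines.extend(element_tree_lines(child, level))
    return lines


document = parseString(BOOK_AUTHORS_XML)
tree_lines = element_tree_lines(document)
print("\n".join(tree_lines))
```

Each level of indentation in the printed result corresponds to one level of the hierarchy: the root element, then each Author, then the three elements inside each Author.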
In order to represent the hierarchical nature of XML, the DOM provides a whole set of objects, methods and properties that allow us to manipulate the DOM. We will not be able to cover them all in this tutorial, but we’ll cover a few to give you the essence of the sort of things you can achieve. First and foremost let’s see the DOM objects:
Object          Description
Node            A single node in the hierarchy.
NodeList        A collection of Nodes.
NamedNodeMap    A collection of nodes allowing access by name as well as index.
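To make the three objects concrete, here is a minimal sketch using Python's xml.dom.minidom, chosen as an assumption because it follows the same W3C interfaces and runs without a browser. The status attribute is hypothetical; the article's sample document has no attributes, so one is added purely to have something to look up by name.

```python
from xml.dom.minidom import parseString

# The 'status' attribute is hypothetical, added only for illustration.
doc = parseString(
    '<Author status="active"><au_id>1001</au_id>'
    "<au_lname>Gates</au_lname></Author>"
)

author = doc.documentElement      # a single Node
children = author.childNodes      # a NodeList: ordered, accessed by index
attrs = author.attributes         # a NamedNodeMap: accessed by name or by index

child_names = [children.item(i).nodeName for i in range(children.length)]
status_by_name = attrs.getNamedItem("status").nodeValue
status_by_index = attrs.item(0).nodeValue

print(child_names)
print(status_by_name, status_by_index)
```

The NodeList only offers positional access, while the NamedNodeMap reaches the same attribute node either through its name or through its position.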
There are a great many DOM properties that allow us to traverse the nodes. The following list gives a few of them. We will try out these DOM objects and properties later.
Property           Description
parentNode         Returns the parent of the current node.
childNodes         Returns a NodeList containing the children of the node.
firstChild         Returns the first child of the current node.
lastChild          Returns the last child of the current node.
previousSibling    Returns the previous sibling, i.e. the previous node at the same level of the hierarchy.
nextSibling        Returns the next sibling, i.e. the next node at the same level of the hierarchy.
nodeName           Returns the name of the node.
nodeValue          Returns the value of the node.
To get the full list, check out MSDN online XML area at msdn.microsoft.com/xml/.
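As a rough illustration of the properties listed above, the following sketch reads each of them from one node, au_lname. It assumes Python's xml.dom.minidom as the DOM implementation rather than the Microsoft parser used in this article, and the summary dictionary is just a convenient way to print the results.

```python
from xml.dom.minidom import parseString

doc = parseString(
    "<BookAuthors><Author><au_id>1001</au_id>"
    "<au_lname>Gates</au_lname><au_fname>Bill</au_fname></Author></BookAuthors>"
)

author = doc.documentElement.firstChild
au_lname = author.childNodes.item(1)   # the middle child of Author

summary = {
    "nodeName": au_lname.nodeName,
    "nodeValue": au_lname.nodeValue,                 # None for an element
    "parentNode": au_lname.parentNode.nodeName,
    "childNodes": au_lname.childNodes.length,        # one child: its text node
    "firstChild": au_lname.firstChild.nodeValue,
    "lastChild": au_lname.lastChild.nodeValue,
    "previousSibling": au_lname.previousSibling.nodeName,
    "nextSibling": au_lname.nextSibling.nodeName,
}
for prop, value in summary.items():
    print(f"{prop:16} {value}")
```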
Now let us look at the node structure of our XML document in a little more detail. We will examine only one side of the document structure for ease of explanation. All of this applies to the other side as well.
In Figure 1 you can see how these properties are used to navigate around the XML DOM. The lines indicate which nodes the properties point to. The children of the root node, BookAuthors, are held in its childNodes collection. In the figure BookAuthors has only one child, so both its firstChild and its lastChild properties point to the same node. childNodes(0) refers to that node as well, since it is the only node in the collection.
The Author node, however, has three children, held in its childNodes collection. Its firstChild property points to the au_id node, which is the same as childNodes(0), and its lastChild property points to the au_fname node. The previousSibling and nextSibling properties point to the previous and next nodes at the same level. So let us assume we have a node named baRoot pointing to BookAuthors; the following table helps demonstrate the parent-child hierarchy.
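The relationships described above can be checked with a small sketch. It assumes Python's xml.dom.minidom in place of the Microsoft parser, keeps only one Author under the root to match the one-sided view of the figure, and uses baRoot for the root element as in the text. The author variable and the rows list are illustrative names.

```python
from xml.dom.minidom import parseString

# One side of the tree only: a single Author under the root.
doc = parseString(
    "<BookAuthors><Author><au_id>1001</au_id>"
    "<au_lname>Gates</au_lname><au_fname>Bill</au_fname></Author></BookAuthors>"
)
baRoot = doc.documentElement
author = baRoot.childNodes.item(0)

rows = [
    ("baRoot", baRoot),
    ("baRoot.firstChild", baRoot.firstChild),
    ("baRoot.lastChild", baRoot.lastChild),
    ("author.firstChild", author.firstChild),
    ("author.childNodes.item(0)", author.childNodes.item(0)),
    ("author.firstChild.nextSibling", author.firstChild.nextSibling),
    ("author.lastChild", author.lastChild),
    ("author.lastChild.previousSibling", author.lastChild.previousSibling),
    ("author.parentNode", author.parentNode),
]
table = [(expression, node.nodeName) for expression, node in rows]
for expression, name in table:
    print(f"{expression:34} {name}")
```

Because the root has a single child here, firstChild and lastChild return the very same node object, while for Author they differ and the sibling properties step between them.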
XML was designed to be eXtensible, and data integration and data exchange are among its key features. It has to cater to a tremendous variety of documents. Because of this, the DOM provides specific objects for the different types of node. What makes this intriguing is that they inherit most of the properties and methods of the Node object and add methods and properties relevant to the particular node type. The following table lists these specific DOM objects:
Object              Description
Document            The root object for an XML document.
DocumentType        Stores information about the DTD or Schema associated with the XML document (e.g. !DOCTYPE in a DTD).
DocumentFragment    A lightweight container for part of a document. Useful for temporary storage or document insertions.
Element             An XML element.
Attribute or Attr   An XML attribute.
Entity              A parsed or unparsed entity (e.g. !ENTITY in a DTD).
EntityReference     An XML entity reference.
Notation            A notation (e.g. !NOTATION in a DTD).
CharacterData       The base object for text information in an XML document.
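The idea that the specific objects all build on Node can be seen in a short sketch. It assumes Python's xml.dom.minidom, whose class names happen to match several entries in the table; the Microsoft parser exposes its own equivalents. The status attribute is hypothetical and is there only so that an Attr node exists.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# The 'status' attribute is hypothetical, added only for illustration.
doc = parseString('<Author status="active"><au_id>1001</au_id></Author>')
author = doc.documentElement

samples = {
    "Document": doc,
    "Element": author,
    "Attr": author.getAttributeNode("status"),
    "Text": author.firstChild.firstChild,
    "DocumentFragment": doc.createDocumentFragment(),
}
for label, node in samples.items():
    # Every specific object is still a Node and reports its own nodeType.
    print(label, type(node).__name__, node.nodeType, isinstance(node, Node))
```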
Now we are going to write some sample code to see how an XML document is traversed with the DOM, in a page called TravelXML.html. We will use Internet Explorer with an XML data island. A data island is simply an HTML tag that acts like a data control.
<xml ID="diData" SRC="BookAuthors.xml"></xml>
Above we have a data island named diData, containing data from the XML file BookAuthors.xml. Please note that data islands are containers for data; they don’t actually show up on the screen. So we need a way to access the data and display it.
<SPAN ID="txtData"></SPAN>
Our aim is to use DOM objects to extract the XML information from the data island and display it in the SPAN. We start with the root node and display its details: its name, type and value. We then find its child nodes and repeat the process for each of them, because a child node can contain nodes of its own. This is a tree traversal, so we write a recursive function for it.
One major piece of information we are going to display is the node’s type. The DOM reports it as a number, so we convert it into a string to make it readable. To do this we declare a global array containing the text description of each node type, indexed by the actual node type number. This declaration goes at the very beginning of the script, well before the rest of the JScript code.
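// Entries 7 and 8 are included so that every description sits at the
// index equal to its node type number.
var ga_strNodeTypes = new Array(
    '',
    'ELEMENT (1)',
    'ATTRIBUTE (2)',
    'TEXT (3)',
    'CDATA SECTION (4)',
    'ENTITY REFERENCE (5)',
    'ENTITY (6)',
    'PROCESSING INSTRUCTION (7)',
    'COMMENT (8)',
    'DOCUMENT (9)',
    'DOCUMENT TYPE (10)',
    'DOCUMENT FRAGMENT (11)',
    'NOTATION (12)'
);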
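The same lookup idea, a table of descriptions indexed by the node type number, can be sketched in Python. This assumes the constants defined on xml.dom.Node in the standard library, which carry the W3C node type numbers; NODE_TYPE_NAMES and describe_node_type are hypothetical names used only in this example.

```python
from xml.dom import Node

# Descriptions keyed by the numeric node type, using the library's constants.
NODE_TYPE_NAMES = {
    Node.ELEMENT_NODE: "ELEMENT",
    Node.ATTRIBUTE_NODE: "ATTRIBUTE",
    Node.TEXT_NODE: "TEXT",
    Node.CDATA_SECTION_NODE: "CDATA SECTION",
    Node.ENTITY_REFERENCE_NODE: "ENTITY REFERENCE",
    Node.ENTITY_NODE: "ENTITY",
    Node.PROCESSING_INSTRUCTION_NODE: "PROCESSING INSTRUCTION",
    Node.COMMENT_NODE: "COMMENT",
    Node.DOCUMENT_NODE: "DOCUMENT",
    Node.DOCUMENT_TYPE_NODE: "DOCUMENT TYPE",
    Node.DOCUMENT_FRAGMENT_NODE: "DOCUMENT FRAGMENT",
    Node.NOTATION_NODE: "NOTATION",
}


def describe_node_type(node_type):
    """Return a readable label such as 'TEXT (3)' for a node type number."""
    name = NODE_TYPE_NAMES.get(node_type, "UNKNOWN")
    return f"{name} ({node_type})"


print(describe_node_type(Node.TEXT_NODE))
print(describe_node_type(Node.DOCUMENT_NODE))
```

A dictionary avoids the need for a placeholder at index 0 and gives a sensible fallback for numbers that are not in the table.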
The recursive function that we will be calling is named displayChildNodes. It takes two parameters: an XML node, and an integer that indicates the current level of the node in the hierarchy.
function displayChildNodes(baNode, intLevel)
{
    var strNodes = '';      // string holding the node information
    var intCount = 0;       // count of attributes or child nodes
    var intNode = 0;        // index of the current child node
    var intAttr = 0;        // index of the current attribute
    var baAttrList = null;  // collection of the attributes of this node

    // Build the string from the current node's name, type and value.
    // nodeType is an integer, so the previously defined array
    // ga_strNodeTypes supplies the description of the node type.
    // getIndent returns a blank string that indents to the given level.
    strNodes += getIndent(intLevel)
        + '<b>' + baNode.nodeName
        + '</b> Type: <b>' + ga_strNodeTypes[baNode.nodeType]
        + '</b> Value: <b>' + baNode.nodeValue + '</b><br>';

    // If the node has attributes, loop through them and add their details.
    baAttrList = baNode.attributes;
    if (baAttrList != null)
    {
        intCount = baAttrList.length;
        for (intAttr = 0; intAttr < intCount; intAttr++)
        {
            strNodes += getIndent(intLevel + 1)
                + '<b>' + baAttrList.item(intAttr).nodeName
                + '</b> Type: <b>' + ga_strNodeTypes[baAttrList.item(intAttr).nodeType]
                + '</b> Value: <b>' + baAttrList.item(intAttr).nodeValue + '</b><br>';
        }
    }

    // Finally, call the same function for each child node.
    intCount = baNode.childNodes.length;
    for (intNode = 0; intNode < intCount; intNode++)
    {
        strNodes += displayChildNodes(baNode.childNodes.item(intNode), intLevel + 1);
    }

    return strNodes;
}

// getIndent is not listed in this article; this is an assumed implementation
// that returns non-breaking spaces so the indentation is visible in HTML.
function getIndent(intLevel)
{
    var strIndent = '';
    for (var i = 0; i < intLevel; i++)
    {
        strIndent += '&nbsp;&nbsp;&nbsp;&nbsp;';
    }
    return strIndent;
}
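For readers who want to try the traversal outside Internet Explorer, here is a comparable sketch of the recursive walk. It assumes Python's xml.dom.minidom instead of a data island, returns plain text lines rather than HTML, and uses the hypothetical names display_child_nodes and TYPE_NAMES. It is an illustration of the same technique, not a port of the listing above.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

TYPE_NAMES = {
    Node.ELEMENT_NODE: "ELEMENT",
    Node.ATTRIBUTE_NODE: "ATTRIBUTE",
    Node.TEXT_NODE: "TEXT",
    Node.DOCUMENT_NODE: "DOCUMENT",
}


def describe(node, level):
    """Format one node as: indent, name, type and value."""
    indent = "  " * level
    kind = TYPE_NAMES.get(node.nodeType, "OTHER")
    return f"{indent}{node.nodeName} Type: {kind} ({node.nodeType}) Value: {node.nodeValue}"


def display_child_nodes(node, level=0):
    """Return a line for the node, its attributes, and every descendant."""
    lines = [describe(node, level)]
    attrs = node.attributes          # None unless the node is an element
    if attrs is not None:
        for i in range(attrs.length):
            lines.append(describe(attrs.item(i), level + 1))
    for child in node.childNodes:    # recurse into each child in turn
        lines.extend(display_child_nodes(child, level + 1))
    return lines


doc = parseString(
    "<BookAuthors><Author><au_id>1002</au_id>"
    "<au_lname>Potter</au_lname><au_fname>Harry</au_fname></Author></BookAuthors>"
)
lines = display_child_nodes(doc)
print("\n".join(lines))
```

Passing the document itself, rather than the root element, makes the walk start at the #document node, which is why that node appears first in the output.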
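To display the output of the above code, you could use something like the following, which assumes the data island exposes its document through the XMLDocument property:
    txtData.innerHTML = displayChildNodes(diData.XMLDocument, 0);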
The above code calls the function, passing in the top-level node. If you load TravelXML.html in Internet Explorer (IE), the output will look something like the following:
#text Type: TEXT (3) Value: Harry
I hope you now have a clear and quick picture of the recursive nature of the XML DOM. At the top we have the #document node, which is the inherent parent, that is, the root node, of every XML document. Pay careful attention, though: it is not actually an element; it has a type of DOCUMENT. So the root of the XML data is an XML document, and under that you have XML elements.
In our case the root element is the BookAuthors element. This in turn contains an element for each Author, and each Author contains an element for each of its properties. We also notice something extra under each leaf element (i.e. an element with no child elements): another node called #text. This actually contains the text of the element. You may ask why each element has a value of null while its child node called #text holds the value. The answer is very simple. Some nodes contain both text and other nodes. If a node contains both, what should its value be? The text or the child node? This led the W3C to specify that the text of a node is always held in a child node of type Text.
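The mixed-content case is easy to see in a sketch. The note element below is hypothetical, invented only to show text and a child element side by side, and Python's xml.dom.minidom is assumed as the DOM implementation.

```python
from xml.dom.minidom import parseString

# Hypothetical mixed-content element: text and a child element together.
doc = parseString("<note>Written by <au_fname>Harry</au_fname> Potter</note>")
note = doc.documentElement

# The element itself has no value; its text lives in child Text nodes.
parts = [(child.nodeName, child.nodeValue) for child in note.childNodes]

print(note.nodeValue)
print(parts)
```

The element has three children here: a Text node, the au_fname element, and a second Text node. A single value on the element could not represent that ordering, which is the reason text is kept in child nodes.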
That leaves us with the final part of the code: when we accessed a node, we did not step down another level in the tree to reach its child text. We will do it now!
Voila! We have now read the text of the node, too. Strictly speaking, the W3C specifies that to get the value of an element you have to traverse to its child and read the nodeValue of the associated Text node, because the nodeValue of the element itself is null. Microsoft felt that such a common action as getting at the text a node holds should be simpler, and its XML parser adds a text property as an extension to the DOM that returns this text directly.
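Both routes to an element's text can be compared in a short sketch. It assumes Python's xml.dom.minidom, which has no built-in text shortcut, so a hypothetical helper named element_text plays that role here by gathering the Text nodes below an element.

```python
from xml.dom.minidom import parseString


def element_text(element):
    """Concatenate the data of every Text node below element, in order."""
    pieces = []
    for child in element.childNodes:
        if child.nodeType == child.TEXT_NODE:
            pieces.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            pieces.append(element_text(child))
    return "".join(pieces)


doc = parseString(
    "<Author><au_id>1002</au_id><au_lname>Potter</au_lname>"
    "<au_fname>Harry</au_fname></Author>"
)
au_fname = doc.documentElement.lastChild

step_by_step = au_fname.firstChild.nodeValue   # element -> Text node -> value
shortcut = element_text(au_fname)              # helper hides the extra step
whole_author = element_text(doc.documentElement)

print(step_by_step, shortcut, whole_author)
```

For a leaf element the two routes give the same string; for an element with child elements the helper joins the text of all its descendants.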
This tutorial has shown you what the DOM is and how it stores XML data in a tree structure. Now that you understand how XML works and have been introduced to the DOM, next we can take a look at how XML integrates with ADO.