XML Processing With The XMLReader Object, Part 2 - To DTD or Not to DTD
Legacy is a bitter reality, so while XML Schemas are the way forward as far as validation is concerned, don't be surprised when you come across a DTD or two in the XML framework that you are using. In such situations, you'll also need to know how to use a DTD to validate an XML document instance.
Here's the updated XML file -- notice it now includes a reference to a DTD instead of an XML Schema:
<?xml version='1.0'?>
<!DOCTYPE library SYSTEM "library.dtd">
<library>
<book id="MFRE001">
<title>XML and PHP</title>
<author>Vikram Vaswani</author>
<description>Learn to manage your XML data with PHP</description>
<price currency="USD">24.95</price>
</book>
<book id="MFRE002">
<title>MySQL - The Complete Reference</title>
<author>Vikram Vaswani</author>
<description>Learn everything about this open source database</description>
<price currency="USD">45.95</price>
</book>
</library>
This brings us to the actual beast -- the library.dtd DTD file:
<!ELEMENT library (book+)>
<!ELEMENT book (title,author,description,price)>
<!ATTLIST book id CDATA #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ATTLIST price currency CDATA #REQUIRED>
Take a close look at this file and you will see that it describes the structure of the XML document instance fairly well. Of course, in between all the elements and attributes are quaint symbols and keywords that will make sense only to DTD experts (if you don't belong to that elite group, you can start with the reference links provided at the end of this article).
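For this particular file, the decoder ring is short: the comma in (title,author,description,price) means each of the four children must appear exactly once and in exactly that order, the + in (book+) means one or more <book> elements, #PCDATA means character data only, and #REQUIRED makes an attribute mandatory. The sketch below is not part of the original example. It assumes a .NET 5 (or later) console project with top-level statements and uses XmlReaderSettings rather than XmlValidatingReader; it inlines the declarations from library.dtd as an internal DTD subset (so no external file is needed) and counts validation errors for the first <book> plus a few deliberately broken variants.

```csharp
using System;
using System.IO;
using System.Xml;

// The declarations from library.dtd, embedded as an internal DTD subset
// so that the sketch does not need an external file.
const string Dtd =
    "<!DOCTYPE library [\n" +
    "<!ELEMENT library (book+)>\n" +
    "<!ELEMENT book (title,author,description,price)>\n" +
    "<!ATTLIST book id CDATA #REQUIRED>\n" +
    "<!ELEMENT title (#PCDATA)>\n" +
    "<!ELEMENT author (#PCDATA)>\n" +
    "<!ELEMENT description (#PCDATA)>\n" +
    "<!ELEMENT price (#PCDATA)>\n" +
    "<!ATTLIST price currency CDATA #REQUIRED>\n" +
    "]>\n";

// The four children of <book>, used to build valid and invalid variants.
const string Title = "<title>XML and PHP</title>";
const string Author = "<author>Vikram Vaswani</author>";
const string Description = "<description>Learn to manage your XML data with PHP</description>";
const string Price = "<price currency='USD'>24.95</price>";

int valid      = CountErrors(Dtd + "<library><book id='MFRE001'>" + Title + Author + Description + Price + "</book></library>");
int wrongOrder = CountErrors(Dtd + "<library><book id='MFRE001'>" + Price + Title + Author + Description + "</book></library>");
int missingId  = CountErrors(Dtd + "<library><book>" + Title + Author + Description + Price + "</book></library>");
int noBooks    = CountErrors(Dtd + "<library></library>");

Console.WriteLine($"valid: {valid}, wrong order: {wrongOrder}, missing id: {missingId}, no books: {noBooks}");

// Reads the whole document with DTD validation and counts the validation errors.
static int CountErrors(string xml)
{
    int errors = 0;
    var settings = new XmlReaderSettings
    {
        DtdProcessing = DtdProcessing.Parse,   // documents with a DOCTYPE are rejected by default
        ValidationType = ValidationType.DTD
    };
    settings.ValidationEventHandler += (sender, e) => errors++;
    using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
    {
        while (reader.Read()) { }
    }
    return errors;
}
```

Only the first document should come through with zero errors; each of the others breaks one of the rules described above (sequence order, #REQUIRED, and the + occurrence indicator respectively).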
And to complete this jigsaw, we have the ASP.NET code that uses the XmlValidatingReader object to test the XML document instance against the DTD, as shown below:
<%@ Page Language="C#" Debug="true" %>
<%@ Import namespace="System.Xml"%>
<%@ Import namespace="System.Xml.Schema"%>
<html>
<head>
<script runat="server">
Boolean blnValidationSuccess = true;
void Page_Load() {
// define variables
string strXmlFile = "http://localhost:2121/xmlpull/library.xml";
// initialize the XML readers
// and set the ValidationType
XmlTextReader objXmlTxtRdr = new XmlTextReader(strXmlFile);
XmlValidatingReader objXmlValRdr = new XmlValidatingReader(objXmlTxtRdr);
// set the validation type
objXmlValRdr.ValidationType = ValidationType.DTD;
// set the validation event handler
objXmlValRdr.ValidationEventHandler += new ValidationEventHandler
(ValidationMonitor);
// some output
output.Text = "Validating file: <b>" + strXmlFile.ToString() + "</b><br>";
// read XML data
while (objXmlValRdr.Read()){
String strSpaces;
// only process the elements, ignore everything else
if(objXmlValRdr.NodeType==XmlNodeType.Element) {
// reset the variable for a new node
strSpaces = "";
for(int count = 1; count <= objXmlValRdr.Depth; count++) {
strSpaces += "===";
}
output.Text += strSpaces + "=> " + objXmlValRdr.Name + "<br/>";
}
}
output.Text += "Validation <b>" + (blnValidationSuccess == true ?
"successful" : "failed") + ".</b>";
objXmlValRdr.Close();
objXmlTxtRdr.Close();
}
// display the validation errors.
void ValidationMonitor (object sender, ValidationEventArgs args)
{
blnValidationSuccess = false;
output.Text += "<i>Validation Error: " + args.Message + "</i><br>";
}
</script>
</head>
<body>
<asp:label id="output" runat="server"/>
</body>
</html>
When you test this code, you'll see that the XML document instance is successfully validated against the "library.dtd" file:

Now, once again, let me spoil things by introducing a rogue <inventory> element into the XML:

As you can see, the XmlValidatingReader object is quick to complain about the presence of the unwanted <inventory> element on the basis of the definitions present in the accompanying library.dtd file.
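The text doesn't spell out exactly where the rogue element was placed, so the following sketch assumes, purely for illustration, that <inventory> was added as a direct child of <library>. It is a standalone .NET 5+ console program with top-level statements (not the original page), inlines the DTD so it runs on its own, uses XmlReader.Create() with XmlReaderSettings, and collects each error together with the line and position reported by ValidationEventArgs.Exception (an XmlSchemaException).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical test document: the DTD is inlined and a rogue <inventory>
// element is added as a child of <library>.
string xml = @"<!DOCTYPE library [
<!ELEMENT library (book+)>
<!ELEMENT book (title,author,description,price)>
<!ATTLIST book id CDATA #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ATTLIST price currency CDATA #REQUIRED>
]>
<library>
  <book id='MFRE001'>
    <title>XML and PHP</title>
    <author>Vikram Vaswani</author>
    <description>Learn to manage your XML data with PHP</description>
    <price currency='USD'>24.95</price>
  </book>
  <inventory>12</inventory>
</library>";

var errors = new List<string>();
var settings = new XmlReaderSettings
{
    DtdProcessing = DtdProcessing.Parse,
    ValidationType = ValidationType.DTD
};
// Collect every error with its position instead of printing it straight away.
settings.ValidationEventHandler += (sender, e) =>
    errors.Add($"Line {e.Exception.LineNumber}, position {e.Exception.LinePosition}: {e.Message}");

using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
{
    while (reader.Read()) { }
}

foreach (string error in errors)
    Console.WriteLine(error);
```

You should see at least one complaint naming <inventory>. The exact wording depends on the .NET version and locale, so treat the messages as diagnostics for humans rather than something to parse.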
So what makes this script click? To be frank, the code hasn't changed much from my previous example. The major difference lies in the ValidationType property of the XmlValidatingReader object, which I have updated to use a DTD instead of an XML Schema, as shown below:
<%
// snip
// set the validation type
objXmlValRdr.ValidationType = ValidationType.DTD;
// snip
%>
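For what it's worth, XmlValidatingReader has been marked obsolete since .NET Framework 2.0, in favour of validating readers created with XmlReader.Create() and an XmlReaderSettings object; the same one-line switch lives on there as XmlReaderSettings.ValidationType. The sketch below is a .NET 5+ console program with top-level statements, not the original ASP.NET page. It writes library.dtd and a copy of library.xml containing a rogue <inventory> element to a hypothetical library-dtd-demo folder under the system temp directory, then reads the file twice: once with ValidationType.None and once with ValidationType.DTD.

```csharp
using System;
using System.IO;
using System.Xml;

// Write library.dtd and a library.xml with a rogue <inventory> element to a
// hypothetical temp folder so the sketch is self-contained.
string dir = Path.Combine(Path.GetTempPath(), "library-dtd-demo");
Directory.CreateDirectory(dir);
File.WriteAllText(Path.Combine(dir, "library.dtd"),
    "<!ELEMENT library (book+)>\n" +
    "<!ELEMENT book (title,author,description,price)>\n" +
    "<!ATTLIST book id CDATA #REQUIRED>\n" +
    "<!ELEMENT title (#PCDATA)>\n" +
    "<!ELEMENT author (#PCDATA)>\n" +
    "<!ELEMENT description (#PCDATA)>\n" +
    "<!ELEMENT price (#PCDATA)>\n" +
    "<!ATTLIST price currency CDATA #REQUIRED>\n");
string xmlPath = Path.Combine(dir, "library.xml");
File.WriteAllText(xmlPath,
    "<?xml version='1.0'?>\n" +
    "<!DOCTYPE library SYSTEM 'library.dtd'>\n" +
    "<library><book id='MFRE001'><title>XML and PHP</title>" +
    "<author>Vikram Vaswani</author>" +
    "<description>Learn to manage your XML data with PHP</description>" +
    "<price currency='USD'>24.95</price></book>" +
    "<inventory>12</inventory></library>\n");

int errorsWithoutValidation = CountErrors(xmlPath, ValidationType.None);
int errorsWithDtd = CountErrors(xmlPath, ValidationType.DTD);
Console.WriteLine($"ValidationType.None: {errorsWithoutValidation} error(s)");
Console.WriteLine($"ValidationType.DTD:  {errorsWithDtd} error(s)");

// Reads the whole file and counts validation events for the given ValidationType.
static int CountErrors(string path, ValidationType type)
{
    int count = 0;
    var settings = new XmlReaderSettings
    {
        DtdProcessing = DtdProcessing.Parse,  // documents with a DOCTYPE are rejected by default
        ValidationType = type,
        XmlResolver = new XmlUrlResolver()    // set explicitly so library.dtd can be loaded from disk
    };
    settings.ValidationEventHandler += (sender, e) => count++;
    using (XmlReader reader = XmlReader.Create(path, settings))
    {
        while (reader.Read()) { }
    }
    return count;
}
```

Only the second pass should report errors, which shows that ValidationType alone decides whether the DTD is enforced. Note the two settings the original page did not have to spell out: DtdProcessing must be set to Parse, and the resolver is set explicitly so that the relative library.dtd reference resolves against the location of library.xml.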
And to make things more interesting, I have added some code to the body of the while loop that calls Read(), to prove that you need not leave it empty. For every element node, the loop now appends the element's name, indented according to its depth, to the "output" label:
<%
// snip
// read XML data
while (objXmlValRdr.Read()) {
String strSpaces;
// only process the elements, ignore everything else
if(objXmlValRdr.NodeType == XmlNodeType.Element) {
// reset the variable for a new node
strSpaces = "";
for(int count = 1; count <= objXmlValRdr.Depth; count++) {
strSpaces += "===";
}
output.Text += strSpaces + "=> " + objXmlValRdr.Name + "<br/>";
}
}
// snip
%>
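Incidentally, the inner for loop that builds strSpaces can be replaced by the string(char, int) constructor, which repeats a character a given number of times. The following sketch is a standalone .NET 5+ console program with top-level statements rather than the original page; it produces the same "===" indentation for each level of Depth and collects the lines in a list instead of appending them to a Label. The shortened sample document is an assumption made to keep the output brief.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Returns one line per element, indented with three '=' per level of depth,
// in the same format as the loop above.
static List<string> ElementOutline(string xml)
{
    var lines = new List<string>();
    using var reader = XmlReader.Create(new StringReader(xml));
    while (reader.Read())
    {
        // only process the elements, ignore everything else
        if (reader.NodeType == XmlNodeType.Element)
            lines.Add(new string('=', 3 * reader.Depth) + "=> " + reader.Name);
    }
    return lines;
}

string xml = "<library><book id='MFRE001'><title>XML and PHP</title>" +
             "<author>Vikram Vaswani</author></book></library>";

List<string> outline = ElementOutline(xml);
foreach (string line in outline)
    Console.WriteLine(line);
```

Because Depth is 0 for the root element, the outline reads "=> library", "====> book", then "=======> title" and "=======> author". In the ASP.NET page, the same expression could take the place of the strSpaces logic inside the existing loop.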
It is interesting to note that the XmlValidatingReader continues to read the XML data even after it encounters an error, which is why it is critical to devise your very own escape route to get out of erroneous situations.
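One possible escape route is to let the validation handler merely record the failure and have the read loop check that flag after every Read() call. The sketch below assumes a .NET 5+ console program with top-level statements using XmlReaderSettings (the successor to XmlValidatingReader), an inlined and deliberately simplified DTD, and a hypothetical document in which a rogue <inventory> element sits between two books.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical sample: simplified DTD, rogue <inventory> between two books.
string xml = @"<!DOCTYPE library [
<!ELEMENT library (book+)>
<!ELEMENT book (title)>
<!ELEMENT title (#PCDATA)>
]>
<library>
  <book><title>XML and PHP</title></book>
  <inventory>12</inventory>
  <book><title>MySQL - The Complete Reference</title></book>
</library>";

bool failed = false;
int elementsProcessed = 0;

var settings = new XmlReaderSettings
{
    DtdProcessing = DtdProcessing.Parse,
    ValidationType = ValidationType.DTD
};
// The handler only records the failure; the loop decides when to stop.
settings.ValidationEventHandler += (sender, e) =>
{
    failed = true;
    Console.WriteLine("Validation Error: " + e.Message);
};

using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
{
    while (reader.Read())
    {
        if (failed)
            break;   // escape route: stop at the first validation error
        if (reader.NodeType == XmlNodeType.Element)
            elementsProcessed++;
    }
}

Console.WriteLine(failed
    ? $"Stopped after {elementsProcessed} element(s)."
    : $"Read all {elementsProcessed} element(s) without errors.");
```

Here the loop stops on the <inventory> element, so the second <book> is never processed. An alternative is to throw an exception from the handler itself: since the handler runs inside Read(), the exception propagates out of the loop to the caller, which suits cases where a validation failure should abort the whole operation.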
By Harish Kamath (c) Melonfire