Extracting Metadata - Exploring the Structure of an XSD
An XSD schema defines elements, and those elements may contain other elements. To show how these elements describe a data structure, I’ll walk through the XSD schema that XSD.EXE creates from the Customers and Orders tables of the Northwind database. The root element represents the DataSet, and its child elements represent the Customers and Orders tables.
The opening of the XSD shows that it’s just an XML file with a specific set of namespaces. It contains the XML header, a single root element, and a series of defined namespaces. In this case, the name of the DataSet is “DataSet1”:
<?xml version="1.0" encoding="utf-8" ?>
<xs:schema id="DataSet1"
targetNamespace="http://tempuri.org/DataSet1.xsd"
elementFormDefault="qualified" attributeFormDefault="qualified"
xmlns="http://tempuri.org/DataSet1.xsd"
xmlns:mstns="http://tempuri.org/DataSet1.xsd"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
NOTE: Namespaces are one of the most difficult things to work with correctly in XML and in .NET processing of XML. Appendix A covers some of the nuances of namespaces.
The xmlns:xs="http://www.w3.org/2001/XMLSchema" declaration identifies this file as an XSD. As a side benefit, if you edit the XSD within Visual Studio .NET, IntelliSense works because Visual Studio maps this namespace to a schema file it can use for IntelliSense.
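As a rough illustration of what these namespace declarations mean to a parser, the following sketch reads just the schema header. It uses Python's standard xml.etree.ElementTree module rather than the .NET tooling discussed here, and the variable names are made up for the example. The XML declaration and the schema body are left out to keep the fragment short.

```python
import xml.etree.ElementTree as ET

# Header of the schema only; the body is omitted so the fragment stays small.
XSD_HEADER = """<xs:schema id="DataSet1"
    targetNamespace="http://tempuri.org/DataSet1.xsd"
    elementFormDefault="qualified" attributeFormDefault="qualified"
    xmlns="http://tempuri.org/DataSet1.xsd"
    xmlns:mstns="http://tempuri.org/DataSet1.xsd"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
</xs:schema>"""

XS_NAMESPACE = "http://www.w3.org/2001/XMLSchema"

root = ET.fromstring(XSD_HEADER)

# ElementTree replaces the "xs:" prefix with the namespace URI in braces,
# so the root tag tells us whether this really is an XSD.
is_xsd = root.tag == "{" + XS_NAMESPACE + "}schema"

# Unprefixed attributes such as id and targetNamespace are read by plain name.
dataset_name = root.get("id")
target_namespace = root.get("targetNamespace")

print(is_xsd, dataset_name, target_namespace)
```

The prefix itself (xs, mstns, msdata) is only shorthand; what the parser compares is the namespace URI behind it.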
Complex types contain child elements. Elements representing the DataSet and DataTables are complex types:
<xs:element name="DataSet1" msdata:IsDataSet="true">
<xs:complexType>
<xs:choice maxOccurs="unbounded">
<xs:element name="Customers">
<xs:complexType>
<xs:sequence>
The table elements contain child elements representing columns. Columns are simple types that have attributes describing their name, type, and other information. Simple types don’t contain child elements:
<xs:element name="CustomerID" type="xs:string" />
<xs:element name="CompanyName" type="xs:string" />
<xs:element name="ContactName" type="xs:string" />
When you combine several elements in a complex type, such as tables within a DataSet or columns within a table, you can restrict them to a specific order or let them appear in any order. If the elements must appear in a defined order, they’re declared in an xs:sequence. Normally, only one of the child elements declared in an xs:choice can appear in a document. However, adding maxOccurs="unbounded" to the xs:choice lets the choice repeat as many times as needed, so any of its elements can appear any number of times. The tables of a DataSet are therefore defined using xs:choice: they can occur in any order and appear only when needed. The columns within a table are defined using xs:sequence because if a column occurs, it must be in a specific sequence:
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Orders">
<xs:complexType>
<xs:sequence>
<xs:element name="OrderID" msdata:ReadOnly="true"
msdata:AutoIncrement="true" type="xs:int" />
<xs:element name="CustomerID" type="xs:string"
minOccurs="0" />
<xs:element name="EmployeeID" type="xs:int"
minOccurs="0" />
<xs:element name="OrderDate" type="xs:dateTime"
minOccurs="0" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:choice>
</xs:complexType>
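To make the nesting concrete, here is a minimal sketch that walks the xs:choice to find the tables and each table's xs:sequence to find its columns. It is written in Python with the standard xml.etree.ElementTree module, not the .NET classes this walkthrough is about, and it works on a trimmed copy of the schema. The shortened column lists and all variable names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Trimmed copy of the schema: two tables, two columns each.
XSD = """<xs:schema id="DataSet1"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="DataSet1" msdata:IsDataSet="true">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element name="Customers">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="CustomerID" type="xs:string" />
              <xs:element name="CompanyName" type="xs:string" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="Orders">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="OrderID" type="xs:int" />
              <xs:element name="CustomerID" type="xs:string" minOccurs="0" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

root = ET.fromstring(XSD)
dataset = root.find(XS + "element")                        # the DataSet element
choice = dataset.find(XS + "complexType/" + XS + "choice")  # holds the tables

tables = {}
for table in choice.findall(XS + "element"):
    # Columns are the simple elements inside the table's xs:sequence.
    columns = table.findall(XS + "complexType/" + XS + "sequence/" + XS + "element")
    tables[table.get("name")] = [(c.get("name"), c.get("type")) for c in columns]

for name, cols in tables.items():
    print(name, cols)
```

The list order inside each table mirrors the xs:sequence, which is the order the columns must follow in a document.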
Nullable columns have the minOccurs="0" attribute. This specifically means that having no occurrences of the element is legal. The XML represents a null by leaving out the element entirely. This is a little surprising if you’re expecting all of the elements within a logical XML record to appear. It can also cause issues during inference, because a schema inferred from data in which a column happens to be null everywhere will not include that column.
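The effect of an omitted element is easiest to see in the data rather than the schema. The sketch below, again in Python's standard xml.etree.ElementTree and not the .NET API, reads two Orders rows from a small instance document. The row values are invented sample data, and the column list and variable names are assumptions for the example.

```python
import xml.etree.ElementTree as ET

NS = {"d": "http://tempuri.org/DataSet1.xsd"}
COLUMNS = ["OrderID", "CustomerID", "EmployeeID", "OrderDate"]

# Invented sample rows. The second row has null EmployeeID and OrderDate,
# so those elements are simply absent.
DATA = """<DataSet1 xmlns="http://tempuri.org/DataSet1.xsd">
  <Orders>
    <OrderID>1</OrderID>
    <CustomerID>ALFKI</CustomerID>
    <EmployeeID>7</EmployeeID>
    <OrderDate>2000-01-01T00:00:00</OrderDate>
  </Orders>
  <Orders>
    <OrderID>2</OrderID>
    <CustomerID>ALFKI</CustomerID>
  </Orders>
</DataSet1>"""

root = ET.fromstring(DATA)

rows = []
for order in root.findall("d:Orders", NS):
    # findtext returns the default (None) when the element is missing,
    # which is how a null column surfaces when reading the data.
    row = {col: order.findtext("d:" + col, default=None, namespaces=NS)
           for col in COLUMNS}
    rows.append(row)

# Column names visible in the second row alone, with the namespace stripped.
seen_in_second_row = {child.tag.split("}")[1] for child in root[1]}

for row in rows:
    print(row)
print(sorted(seen_in_second_row))
```

Anything that guesses the columns from the second row alone would find only OrderID and CustomerID, which is the kind of inference problem that a missing element can cause.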
This XSD contains three constraints. Two are primary key constraints, one on each table in the DataSet. These primary key constraints just define the table and the primary key columns:
<xs:unique name="DataSet1Key1" msdata:PrimaryKey="true">
<xs:selector xpath=".//mstns:Customers" />
<xs:field xpath="mstns:CustomerID" />
</xs:unique>
<xs:unique name="DataSet1Key2" msdata:PrimaryKey="true">
<xs:selector xpath=".//mstns:Orders" />
<xs:field xpath="mstns:OrderID" />
</xs:unique>
A referential constraint defines the relation between two tables. XSD describes this relationship with an xs:keyref whose refer attribute names the existing primary key constraint for the parent table, here DataSet1Key1. This is the constraint for the parent key. The child side of the relation is identified by its table and column, each stated as an XPath expression:
<xs:keyref name="CustomersOrders" refer="DataSet1Key1">
<xs:selector xpath=".//mstns:Orders" />
<xs:field xpath="mstns:CustomerID" />
</xs:keyref>
</xs:element>
</xs:schema>
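As a hedged sketch of how these constraints can be read back as metadata, the following Python code (standard xml.etree.ElementTree, not the .NET API used elsewhere in this walkthrough) collects the primary keys and resolves the keyref to its parent table. The table definitions are left out of the embedded schema for brevity, and the helper names local_name and describe are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
MSDATA = "{urn:schemas-microsoft-com:xml-msdata}"

# Constraint section only; the complexType with the tables is omitted.
XSD = """<xs:schema id="DataSet1"
    targetNamespace="http://tempuri.org/DataSet1.xsd"
    xmlns="http://tempuri.org/DataSet1.xsd"
    xmlns:mstns="http://tempuri.org/DataSet1.xsd"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="DataSet1" msdata:IsDataSet="true">
    <xs:unique name="DataSet1Key1" msdata:PrimaryKey="true">
      <xs:selector xpath=".//mstns:Customers" />
      <xs:field xpath="mstns:CustomerID" />
    </xs:unique>
    <xs:unique name="DataSet1Key2" msdata:PrimaryKey="true">
      <xs:selector xpath=".//mstns:Orders" />
      <xs:field xpath="mstns:OrderID" />
    </xs:unique>
    <xs:keyref name="CustomersOrders" refer="DataSet1Key1">
      <xs:selector xpath=".//mstns:Orders" />
      <xs:field xpath="mstns:CustomerID" />
    </xs:keyref>
  </xs:element>
</xs:schema>"""


def local_name(xpath):
    """Return the name after the namespace prefix, e.g. './/mstns:Orders' -> 'Orders'."""
    return xpath.split(":")[-1]


def describe(constraint):
    """Return (table, [columns]) for an xs:unique or xs:keyref element."""
    table = local_name(constraint.find(XS + "selector").get("xpath"))
    columns = [local_name(f.get("xpath")) for f in constraint.findall(XS + "field")]
    return table, columns


root = ET.fromstring(XSD)
dataset = root.find(XS + "element")

# Primary keys are xs:unique constraints flagged with msdata:PrimaryKey="true".
primary_keys = {}
for unique in dataset.findall(XS + "unique"):
    if unique.get(MSDATA + "PrimaryKey") == "true":
        primary_keys[unique.get("name")] = describe(unique)

# Each keyref points at a parent key by name and names the child table itself.
relations = []
for keyref in dataset.findall(XS + "keyref"):
    parent_table, parent_columns = primary_keys[keyref.get("refer").strip()]
    child_table, child_columns = describe(keyref)
    relations.append((keyref.get("name"), parent_table, parent_columns,
                      child_table, child_columns))

print(primary_keys)
print(relations)
```

Splitting on the colon is enough for the simple selector and field paths shown here; more elaborate XPath expressions would need real parsing.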
Although there are some additional details, that’s the core of understanding XSDs. XSD is an ugly, messy format that’s both verbose and effective. It’s designed for maximum flexibility across many different kinds of data. As the complexity of the data structure increases, the complexity of the XSD increases with it, and the schema becomes even more difficult to read.
This is from Code Generation in Microsoft .NET, by Kathleen Dollard (Apress, ISBN 1590591372).