Introduction to XML for Database Developers - Processing Instructions
An XML document often starts with a tag that looks like a processing instruction. Strictly speaking, the XML 1.0 specification calls this opening tag the XML declaration, but it uses the same syntax. For example, the following declaration notifies the reader that the document it belongs to is written in XML that complies with version 1.0:
<?xml version="1.0"?>
A processing instruction has the following format:
<?name data?>
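The name portion (also called the target) identifies the processing instruction to the application that is processing the XML document. Names that begin with xml, in any combination of uppercase and lowercase letters, are reserved for use by the XML standards, so application-defined processing instructions should use other names. The data portion that follows is optional and includes information that may be used by the application.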
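As a rough illustration of how an application sees the name and data portions, the following Python sketch reads a processing instruction with the standard library's xml.dom.minidom. The asset-loader name and its data are hypothetical and were invented for this example.

```python
from xml.dom import minidom

# Hypothetical document: the "asset-loader" name and its data are made up.
SAMPLE = (
    '<?xml version="1.0"?>'
    '<?asset-loader batch="42" mode="append"?>'
    '<Inventory/>'
)


def read_processing_instructions(xml_text):
    """Return (name, data) pairs for processing instructions at document level."""
    doc = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in doc.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]


pis = read_processing_instructions(SAMPLE)
print(pis)
```

With this parser, the opening version="1.0" line is not reported as a processing instruction node; only the asset-loader instruction is returned.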
TIP
Although it is not required, it is recommended that you start an XML document with a declaration such as <?xml version="1.0"?>, which explicitly identifies the document as an XML document and states the version of the standard that it uses.
Document Type Definition
As mentioned earlier, markups are meaningless unless it is possible to define rules for the following:
- What constitutes a markup
- What a markup means
A Document Type Definition (DTD) is often used to define such rules for XML documents. The DTD contains descriptions and constraints (naturally, not Transact-SQL constraints) for each element, such as which child elements it contains and in what order, and which attributes it can have. User agents can use the DTD to verify that an XML document complies with its rules.
The DTD can be an external file that is referenced by an XML document:
<!DOCTYPE Inventory SYSTEM "Inventory.dtd">
or it can be part of the XML document itself:
<?xml version="1.0"?>
<!DOCTYPE Inventory[
<!ELEMENT Inventory (Asset+)>
<!ELEMENT Asset (EquipmentId, LocationId, StatusId, LeaseId,
LeaseScheduleId, OwnerId, Cost, AcquisitionTypeID)>
<!ATTLIST Asset Inventoryid CDATA #IMPLIED>
<!ELEMENT EquipmentId (#PCDATA)>
<!ELEMENT LocationId (#PCDATA)>
<!ELEMENT StatusId (#PCDATA)>
<!ELEMENT LeaseId (#PCDATA)>
<!ELEMENT LeaseScheduleId (#PCDATA)>
<!ELEMENT OwnerId (#PCDATA)>
<!ELEMENT Cost (#PCDATA)>
<!ELEMENT AcquisitionTypeID (#PCDATA)>
]>
<Inventory>
<Asset Inventoryid="5">
<EquipmentId>1</EquipmentId>
<LocationId>2</LocationId>
<StatusId>1</StatusId>
<LeaseId>1</LeaseId>
<LeaseScheduleId>1</LeaseScheduleId>
<OwnerId>1</OwnerId>
<Cost>1295.00</Cost>
<AcquisitionTypeID>1</AcquisitionTypeID>
</Asset>
</Inventory>
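To see how an application might read such a document, here is a minimal Python sketch that uses only the standard library. One assumption to keep in mind: the standard-library parsers read the DOCTYPE declaration but do not validate against it, so this sketch extracts the declaration name and the data without enforcing the DTD rules; a validating parser would be needed for that. The document is a shortened version of the Inventory listing.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Shortened version of the Inventory document, with an internal DTD.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE Inventory [
<!ELEMENT Inventory (Asset+)>
<!ELEMENT Asset (EquipmentId, Cost)>
<!ATTLIST Asset Inventoryid CDATA #IMPLIED>
<!ELEMENT EquipmentId (#PCDATA)>
<!ELEMENT Cost (#PCDATA)>
]>
<Inventory>
<Asset Inventoryid="5">
<EquipmentId>1</EquipmentId>
<Cost>1295.00</Cost>
</Asset>
</Inventory>"""

# minidom exposes the document type declaration as doc.doctype.
doctype_name = minidom.parseString(DOCUMENT).doctype.name

# ElementTree gives convenient access to elements and attributes.
root = ET.fromstring(DOCUMENT)
assets = [
    (asset.get("Inventoryid"), asset.findtext("EquipmentId"), asset.findtext("Cost"))
    for asset in root.findall("Asset")
]

print(doctype_name, assets)
```

All values come back as strings; converting Cost to a numeric type is left to the application.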
The DTD document does not have to be stored locally. A reference can include a URL or URI that provides access to the document:
<!DOCTYPE Inventory SYSTEM "http://www.trigonblue.com/dtds/Inventory.dtd">
A Uniform Resource Identifier (URI) is a string that identifies a resource on the Internet, ideally in a persistent and globally unique way. A Uniform Resource Locator (URL) is a special type of URI that identifies a resource by its location. The URI is the more general concept: an identifier that names a resource, rather than pointing to one server, does not have to change when the resource is moved from one server to another, and in principle it could be resolved to the closest copy of the resource.
NOTE
In some cases, it is not important that a URI points to a specific resource, but the string that is supplied must be globally unique, meaning no other XML document (that can be merged with the current XML document) is using the same string for some other resource. However, there are also cases in which a URI points to a specific resource on the Internet and the content of the string is critical for proper processing of an XML document.
XML Comments and CDATA sections
It is possible to write comments within an XML document. The basic syntax of the comment is
<!--commented text-->
where commented text can be any character string that does not contain two consecutive hyphens (--) and that does not end with a hyphen (-).
Comments can stretch over more than one line:
<!-- This is a comment. -->
<!--
This is another comment.
-->
Comments cannot be part of any other tag:
<Order <!-- This is an illegal comment. --> OrderId = "123">
...
</Order>
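The comment rules can be checked with a parser. The following Python sketch is a minimal illustration using the standard library's xml.etree.ElementTree; the helper name is_well_formed is invented for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


results = {
    # A comment may span several lines.
    "multi-line comment": is_well_formed("<Order><!--\nThis is a comment.\n--></Order>"),
    # Two consecutive hyphens inside the comment text are not allowed.
    "double hyphen inside": is_well_formed("<Order><!-- a -- b --></Order>"),
    # The comment text must not end with a hyphen.
    "trailing hyphen": is_well_formed("<Order><!-- a ---></Order>"),
    # A comment cannot appear inside another tag.
    "comment inside a tag": is_well_formed('<Order <!-- no --> OrderId="123"></Order>'),
}

print(results)
```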
You can use CDATA sections in XML documents to insulate blocks of text from XML parsers. For example, if you are writing an article about XML and you also want to store it in the form of an XML document, you can use CDATA sections to make XML parsers treat the sample XML code as plain character data rather than as markup.
The basic syntax of a CDATA section is
<![CDATA[string]]>
The string can be any character string that does not contain the string ]]>.
CDATA sections can occur anywhere in an XML document where character data is allowed:
<Example>
<Text>
<![CDATA[<Inventory Inventoryid="12"/>]]>
</Text>
</Example>
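A short Python sketch, using the standard library's xml.etree.ElementTree, suggests what the parser hands to the application for this document: the content of the CDATA section arrives as ordinary text, and no Inventory element is created.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<Example>
<Text>
<![CDATA[<Inventory Inventoryid="12"/>]]>
</Text>
</Example>"""

text_element = ET.fromstring(DOCUMENT).find("Text")

# The CDATA content is delivered as character data, surrounded by the
# line breaks that were present in the document.
cdata_text = text_element.text.strip()

# No child elements were created from the markup-like text.
child_count = len(text_element)

print(cdata_text, child_count)
```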
Character and Entity References
Like HTML and SGML, XML also includes a simple way to reference characters that do not belong to the ASCII character set. The syntax of a character reference is
&#dec-value;
&#xhex-value;
The decimal (dec-value) or hexadecimal (hex-value) code of the character must be preceded by &# or &#x, respectively, and followed by a semicolon (;).
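As an illustration, the following Python sketch builds both forms of a character reference for the copyright sign (code 169, hexadecimal A9) and lets the standard library's xml.etree.ElementTree resolve them. The helper name char_reference is invented for this example.

```python
import xml.etree.ElementTree as ET


def char_reference(ch, hexadecimal=False):
    """Build a decimal or hexadecimal character reference for one character."""
    code = ord(ch)
    return f"&#x{code:X};" if hexadecimal else f"&#{code};"


COPYRIGHT = "\u00a9"  # the copyright sign

decimal_ref = char_reference(COPYRIGHT)
hex_ref = char_reference(COPYRIGHT, hexadecimal=True)

# The parser replaces both references with the character itself.
parsed = ET.fromstring(f"<Text>{decimal_ref}{hex_ref}</Text>").text

print(decimal_ref, hex_ref, parsed)
```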
Entity references are used in XML to insert characters that would cause problems for the XML parser if they were inserted directly into the document. This type of reference is basically a mnemonic alternative to a character reference. There are five basic entity references:
Entity | Meaning |
&amp; | & |
&apos; | ' |
&lt; | < |
&gt; | > |
&quot; | " |
Entity references are often used to represent characters with special meaning in XML. In the following example, entity references are used to prevent the XML parser from parsing the content of the Text element:
<Example>
<Text>
&lt;Inventory Inventoryid=&quot;12&quot;/&gt;
</Text>
</Example>
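In application code, the references are usually produced by an escaping function rather than typed by hand. The following Python sketch is one possible approach, using escape() from the standard library's xml.sax.saxutils; it then parses the result to show that the application receives the original text back.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

RAW = '<Inventory Inventoryid="12"/>'

# escape() replaces &, < and > by default; the extra mapping also
# replaces the double quote.
escaped = escape(RAW, {'"': "&quot;"})

document = f"<Example><Text>{escaped}</Text></Example>"
text_element = ET.fromstring(document).find("Text")

# The parser resolves the entity references back to the original characters.
round_trip = text_element.text

print(escaped)
print(round_trip)
```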
This article was excerpted from chapter 13 of SQL Server 2000 Stored Procedure & XML Programming, second edition, written by Dejan Sunderic (McGraw-Hill/Osborne, 2004; ISBN: 0072228962).