Introduction to XML for Database Developers
Relational database management systems have been around since the days before the Internet, and Microsoft SQL Server is no exception. In its latest version, it is compatible with XML, which allows it to solve the old challenges faced by RDBMSs (and some new ones) in new ways. This article was excerpted from chapter 13 of SQL Server 2000 Stored Procedure & XML Programming, second edition, written by Dejan Sunderic (McGraw-Hill/Osborne, 2004; ISBN 0072228962).
Microsoft SQL Server has become a giant among the select group of enterprise-ready relational database management systems, but as with those other RDBMSs, its roots are in pre-Internet solutions. The Internet revolution has highlighted a set of old tactical and strategic challenges for the Microsoft SQL Server development team. These challenges include the following:
- Storing the large amounts of textual information that web-based, user-friendly database applications require
- Delivering that textual (and other) stored information to the Web
- Sharing information with other departments and organizations that may not use the same RDBMS
In earlier editions of SQL Server, Microsoft addressed these issues with features such as Full Text Search, the Web Publishing Wizard, DTS, ADO, and OLE DB. SQL Server 2000 introduces XML compatibility—the new holy grail of the computing industry and the latest attempt to tackle the same old problems.
--------------------------------------------------------------------
XML (R)evolution
To communicate with customers in today’s rich-content world, you need to provide them with information. Until very recently, such information was inevitably encapsulated in proprietary, document-based formats that are not shared easily. For example, word processor documents are optimized for delivery on paper, and relational databases are often structured and normalized in formats unsuitable to end users.
The first step in the right direction was the Standard Generalized Markup Language (SGML). Although its design dates back to the late 1960s (to work by Charles Goldfarb), it did not become the international standard for defining markup languages until 1986, when it was published as an ISO standard. In the late 1980s, companies and government agencies started to adopt this tag-based language. It allowed them to create and manage paper documentation in a way that was easy to share with others.
Then, in the 1990s, the Web appeared on the scene and our collective focus shifted from isolated islands of personal computers and local networks to a global network of shared information. SGML’s tagged structure would seem to make it a perfect candidate to lead the Internet revolution, but the complexity of SGML makes it difficult to work with and unsuitable for web application design.
Instead of SGML, the developers of the Internet adopted the Hypertext Markup Language (HTML), a simple markup language used to create hypertext documents that are portable from one platform to another. HTML is a simple application of SGML. It was originally defined in 1991 by Tim Berners-Lee as a way to organize, view, and transfer scientific documents across different platforms. HTML documents are typically transferred over the Internet using the Hypertext Transfer Protocol (HTTP). This new markup language was an exciting development and soon found nonscientific applications. Eventually, companies and users started to use it as a platform for e-commerce, that is, the processing of business transactions without the exchange of paper-based business documents.
Unfortunately, HTML has some disadvantages. One of the biggest arises as a result of its main purpose. HTML is designed to describe only how information should appear—that is, its format. It was not designed to define the syntax (logical structure) or semantics (meaning) of a document. It could make a document readable to a user, but it required that user to interact with, and interpret, the document. The computer itself could not parse the document because the necessary metadata (literally, data about the data) was not included with the document.
Another problem with HTML is that it is not extensible. It is not possible to create new tags. HTML is also a “standard” that exists in multiple versions—and multiple proprietary implementations. Web developers know that they have to test even their static HTML pages in all of the most popular browsers (and often in several versions of each) because each browser (and each version of each browser) implements this “standard” somewhat differently. Different development tool sets support different versions of this standard (and often different features within a single standard).
In 1996, a group working under the auspices of the World Wide Web Consortium (W3C) began designing a new standard tagged language called the eXtensible Markup Language (XML), which became a W3C Recommendation in 1998. It was designed to address some of the problems of HTML and SGML. XML is a standardized document markup language (a subset of SGML) that enables a publisher to create a single document source that can be viewed, displayed, or printed in a variety of ways. As is the case with HTML, XML is primarily designed for use on the Internet. However, as already mentioned, HTML is designed primarily to address document formatting issues, while XML addresses issues relating to data and object structure. XML is also extensible in that it provides a standard mechanism for any document builder to define new XML tags within any XML document. Its features lower the barriers for creation of integrated, multiplatform, application-to-application protocols.
--------------------------------------------------------------------
Introduction to XML
In today’s world, words such as “tag,” “markup,” “element,” “attributes,” and “schema” are buzzwords that you can hear anywhere (well, at least in the IT industry), but what do these terms mean in the context of markup languages?
Introduction to Markup Languages
In a broader sense, a markup is anything that you place within a document that provides additional meaning or additional information. For example, this book uses italic font to emphasize each new phrase or concept that is defined or introduced. I have a habit of using a highlighter when I am reading books. Each time I use my highlighter, I change the format of the text as a means of helping me find important segments later.
Markups usually define
- Formatting
- Structure
- Meaning
A reader has to have an implicit set of rules for placing markups in a document—otherwise those markups are meaningless to that reader. A markup language is a set of rules that defines
- What constitutes a markup
- What a markup means
Building Blocks of Markup Languages
The syntax of markup languages such as SGML, HTML, and XML is based on tags, elements, and attributes.
A tag is a markup language building block that consists of delimiters (angle brackets) and the text between them:
<TITLE>
An element is a markup language part that consists of a pair of tags and the text between them:
<TITLE>SQL Server 2000 Stored Procedure Programming</TITLE>
Each element has an opening tag and a closing tag. The text between these tags is called the content of the element.
An attribute is a name/value pair that appears inside an opening tag and supplies additional information about the element:
<font size="2">
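The three building blocks can also be inspected programmatically. The following is a minimal sketch that assumes Python and its standard-library xml.etree.ElementTree parser, which are not tools used in the book; the lang attribute on the TITLE element is a hypothetical addition made only so that the sample carries an attribute.

```python
import xml.etree.ElementTree as ET

# One element: an opening tag with a (hypothetical) attribute,
# the content, and the matching closing tag.
snippet = '<TITLE lang="en">SQL Server 2000 Stored Procedure Programming</TITLE>'

element = ET.fromstring(snippet)

tag_name = element.tag        # the name written inside the angle brackets
content = element.text        # the text between the opening and closing tags
attributes = element.attrib   # name/value pairs taken from the opening tag

print(tag_name)
print(content)
print(attributes)
```

The parser exposes the tag name, the content, and the attributes as three separate properties, which mirrors the distinction drawn in the definitions above.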
Okay, suppose you have created a document and have marked up some parts of it. Now what? You can share it with others. They will use something called a user agent to review the document. In a broader context, a user agent could be a travel agent that helps a customer buy tickets for a trip. However, in the IT industry, a user agent is a program that understands the markup language and presents information to an end user. An example of such a program is a web browser designed to present documents created using HTML.
XML Elements and Attributes
The following is a simple example of an XML document:
<Inventory>
<Asset Inventoryid="5">
<Equipment>Toshiba Portege 7020CT</Equipment>
<EquipmentType>Notebook</EquipmentType>
<LocationId>2</LocationId>
<StatusId>1</StatusId>
<LeaseId>1234</LeaseId>
<LeaseScheduleId>1414</LeaseScheduleId>
<OwnerId>83749271</OwnerId>
<Cost>6295.00</Cost>
<AcquisitionType>Lease</AcquisitionType>
</Asset>
</Inventory>
An XML document must contain one or more elements. Exactly one of them is not part of any other element and is therefore called the document’s root element; every other element is nested inside it. In the preceding example, the root element is named Inventory.
Each element can, in turn, contain one or more elements. In the preceding example, the Inventory element contains one Asset element. The Asset element also contains other elements (Equipment, EquipmentType, and so on). The Equipment element contains just its content: the text string "Toshiba Portege 7020CT".
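The nesting just described can be checked with a parser. This is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree (not a tool used in the book) and a trimmed copy of the Inventory document that keeps only three of the child elements.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Inventory example (only three child elements kept).
inventory_xml = """<Inventory>
  <Asset Inventoryid="5">
    <Equipment>Toshiba Portege 7020CT</Equipment>
    <EquipmentType>Notebook</EquipmentType>
    <LocationId>2</LocationId>
  </Asset>
</Inventory>"""

root = ET.fromstring(inventory_xml)      # the root element
asset = root.find("Asset")               # the single Asset child of the root

child_names = [child.tag for child in asset]    # elements nested in Asset
equipment_text = asset.find("Equipment").text   # content of Equipment
inventory_id = asset.get("Inventoryid")         # attribute value, as a string

print(root.tag, len(root), child_names, equipment_text, inventory_id)
```

Note that the attribute value comes back as the string "5"; a plain XML parser does not assign data types to attribute values or element content.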
Unlike HTML, XML is case sensitive. Therefore, Asset, asset, and ASSET would represent different elements.
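A short sketch can illustrate the consequence of case sensitivity. It assumes Python's standard-library xml.etree.ElementTree parser (not a tool from the book), and the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

# Three sibling elements whose names differ only in case.
doc = ET.fromstring("<Inventory><Asset/><asset/><ASSET/></Inventory>")

names = [child.tag for child in doc]   # each name is kept exactly as written
matches = doc.findall("Asset")         # lookup matches the exact case only


def is_well_formed(text):
    """Return True if the parser accepts text, False if it raises ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# An opening tag cannot be closed by a tag that differs in case.
mismatched_ok = is_well_formed("<Asset></asset>")

print(names, len(matches), mismatched_ok)
```

The parser treats the three names as unrelated elements, and it rejects a closing tag whose case differs from that of the opening tag.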
It is possible to define an empty element. Such elements can be written using standard opening and closing tags:
<Inventory></Inventory>
or using special notation:
<Inventory/>
If an element contains attributes but no content, an empty element is an efficient way to write it:
<Asset Inventoryid="5"/>
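The two notations for an empty element are equivalent to a parser. The following is a minimal sketch that assumes Python's standard-library xml.etree.ElementTree, which is not a tool used in the book.

```python
import xml.etree.ElementTree as ET

# The same empty element written in both notations.
long_form = ET.fromstring('<Asset Inventoryid="5"></Asset>')
short_form = ET.fromstring('<Asset Inventoryid="5"/>')

same_name = long_form.tag == short_form.tag
same_attributes = long_form.attrib == short_form.attrib

# Neither form has text content or child elements.
both_empty = (
    not long_form.text and not short_form.text
    and len(long_form) == 0 and len(short_form) == 0
)

print(same_name, same_attributes, both_empty)
```

Because both forms yield the same element, the choice between them is a matter of brevity rather than meaning.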
An element can have more than one attribute. The following example shows an empty element that contains nine attributes:
<Asset Inventoryid="12" EquipmentId="1" LocationId="2" StatusId="1"
LeaseId="1" LeaseScheduleId="1" OwnerId="1" Lease="100.0000"
AcquisitionTypeID="2"/>
You are not allowed to repeat an attribute in the same tag. The following example shows a syntactically incorrect element:
<Inventory Inventoryid="12" Inventoryid="13"/>
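A conforming parser refuses an element that repeats an attribute. The sketch below assumes Python's standard-library xml.etree.ElementTree (not a tool from the book); the helper name parse_error is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_error(text):
    """Return the parser's error message, or None if text is well formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# The same attribute name appears twice in one tag: not well formed.
duplicate = parse_error('<Inventory Inventoryid="12" Inventoryid="13"/>')

# A single occurrence of the attribute parses without error.
single = parse_error('<Inventory Inventoryid="12"/>')

print(duplicate)
print(single)
```

The first call returns an error message rather than an element, so the document is rejected as a whole instead of one of the two values being silently chosen.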