Extracting Metadata

This chapter looks at the second step of code generation: extracting metadata from a variety of sources, with a particular focus on SQL Server. It also introduces the intimate merge of metadata, which is used to incorporate additional, manually entered metadata. (Code Generation in Microsoft .NET, by Kathleen Dollard, 2004, Apress, ISBN: 1590591372.)

 

Extracting Metadata Principle #2: Collect metadata as a separate, distinct step that can independently evolve.

Extracting metadata means reaching into available sources, dragging out the metadata, and stacking it up neatly so other processes can exploit it. Exploiting the metadata you’ve stacked up will be the next big jump in programming. It’ll look like code generation, but that’s just one way to exploit metadata. Metadata is the key that unlocks the door to faster application development.

It’s not easy to define metadata concisely. Metadata is an Extensible Markup Language (XML) description of your application in terms of databases and/or assemblies, tables and/or objects, columns and/or properties, and so on. The bulk of metadata describes details of the database your application uses, but it can also include details such as user interface information. For instance, if you’re building a bookstore application and your database has a table containing all the books you sell, with columns for their prices and availability, a simplified version of your metadata might look like this:

<dbs:DataStructures xmlns:dbs="http://kadgen/DatabaseStructure">
  <dbs:DataStructure Name="pubs">
    <dbs:Tables>
      <dbs:Table Name="Titles">
        <dbs:TableColumns>
          <dbs:TableColumn Name="title_id" AllowNulls="False" Type="int" />
          <dbs:TableColumn Name="title" AllowNulls="False" Type="int" />
          <dbs:TableColumn Name="price" AllowNulls="False" Type="int" />
          <dbs:TableColumn Name="available" AllowNulls="False" Type="int" />
        </dbs:TableColumns>
      </dbs:Table>
    </dbs:Tables>
  </dbs:DataStructure>
</dbs:DataStructures>

Metadata includes everything you use as input for your templates. Chapter 1 included templates you could apply to any data table by supplying the appropriate XML metadata input file. This chapter is about acquiring that XML metadata, automatically extracting it where possible. XML is the easiest format for metadata: the XML file defines the structure of the application you're going to build, including its objects, properties, and methods, in a way that simplifies code generation.

NOTE: Metadata won’t define all of your classes because your application includes utility and handcrafted classes that you won’t generate. Furthermore, it won’t define all of your methods because you’ll define additional housekeeping methods and properties within your templates.

 

Metadata fits into your code generation because it’s part of this magic incantation:

< metadata > + < template mechanism > = < generated code >

Metadata comes from one or more sources, often a single source, such as a database, that maps directly to your requirements. Metadata extraction takes you from that source to metadata usable for code generation: the process reaches into the source and creates an XML document containing the supplied metadata. Figure 2-1 illustrates the role of this extraction in code generation. You may use more than one metadata source in creating your application. Once you've got your metadata, it serves as the input for the templates that generate source code for your application. Throughout this chapter, you'll see the mechanics of metadata extraction. Chapter 6 merges this extracted metadata with an Object Relational Mapping (ORM) view of your database, and Chapter 3 shows how to exploit metadata to generate code for your application.

NOTE:  You may have expected that the hard part of code generation would be creating the templates that produce the code. Actually, the hardest part is figuring out your architecture, followed by collecting the metadata. By the time you know what you’re going to build and have the metadata ready, the process of building templates is straightforward. Therefore, this chapter covers the toughest stuff in the book. So be patient if this chapter becomes difficult.

   


You can handle metadata in a couple of ways. You can just grab metadata whenever you need it, or you can make metadata extraction a distinct process. Grabbing metadata whenever you need it means simultaneously extracting metadata and outputting code, which makes things more difficult than they need to be. Combining these processes makes your extraction mechanism hard to debug and hard to reuse, and you lose the metadata/template independence that's a key benefit of code generation. It's important to make metadata extraction a distinct step that you run and debug on its own. In other words, instead of going directly to the database when you need to know the columns in a table, retrieve that information from an already-created XML file.
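
As a minimal sketch of what that looks like in .NET (assuming the friendly metadata format shown earlier in this chapter and a hypothetical file name, Northwind.DataStructures.xml), you can load the already-extracted XML and query it with XPath instead of opening a database connection:

using System;
using System.Xml;

class MetadataReader
{
    static void Main()
    {
        // Load the already-extracted metadata instead of querying the database.
        // The file name is a placeholder; use whatever your extraction step produced.
        XmlDocument metadata = new XmlDocument();
        metadata.Load("Northwind.DataStructures.xml");

        // Register the dbs prefix so XPath queries can use it.
        XmlNamespaceManager ns = new XmlNamespaceManager(metadata.NameTable);
        ns.AddNamespace("dbs", "http://kadgen/DatabaseStructure");

        // Ask the metadata, not the database, which columns each table has.
        foreach (XmlNode table in metadata.SelectNodes(
            "dbs:DataStructures/dbs:DataStructure/dbs:Tables/dbs:Table", ns))
        {
            Console.WriteLine(table.Attributes["Name"].Value);
            foreach (XmlNode column in table.SelectNodes(
                "dbs:TableColumns/dbs:TableColumn", ns))
            {
                Console.WriteLine("  " + column.Attributes["Name"].Value);
            }
        }
    }
}

Any template or tool that needs the same information reads the same file, so a change in extraction logic never ripples into your templates.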

Although your database is probably your core metadata source, you also have access to a variety of other metadata sources: anything that has information that could contribute to your application. The problem of extraction is rarely finding the metadata sources; it's getting the information stacked up neatly, ready for your templates. To do that, you have to understand how the metadata source exposes information and what format is easy for code templates to consume. Much of this chapter talks about how selected sources expose data. I'll focus on XML Schema Definitions (XSDs) and the SQL-92 standard as primary metadata sources. Understanding these sources will help you work with them and clarify how you retrieve metadata from any source. Extracting data from an XSD illustrates how to extract data from any XML source using XSL Transformations (XSLT). Extracting metadata from a SQL-92 database illustrates how to use .NET code to extract metadata from a source with a programmatic interface.


Introducing the Process

The code generation process needs a single monolithic XML metadata input. But a monolithic metadata file is hard to manage: different people may perform different metadata extraction and code outputting tasks. Chapter 3 shows how to make code generation a granular process, and it makes no sense to create a granular process and then have it rely on a monolithic file that everyone is trying to modify. So, I'll show you how to merge the individual XML metadata files into a transient monolithic file in the “Merging Metadata” section of this chapter.

Figure 2-2 is a high-level view of metadata. On the right, the single monolithic XML metadata file is composed of four segments. You might add another segment or two, such as report definitions, but you’ll have a small number of high-level sections in the XML metadata document.

The monolithic XML metadata file is created on individual developers’ machines and is transient. You’ll output it to disk for debugging, but you don’t want to put it under source control. One of the differences between hacking around with code generation and making it a conscious process is intelligently incorporating source control. By keeping only the individual metadata files under source control, you gain important flexibility for your team.

Seeing all these discrete steps might be making you nervous, particularly if you've worked with the tedious process of generating strongly typed DataSets in Visual Studio. I'm committed to one-click code generation that smoothly integrates these granular processes. You'll see details of one-click activation of the multistep code generation process in Chapter 3, so you'll need to trust me on this and postpone testing these tools until then, when you'll have the metadata, the templates, and the generation tool.

Understanding Metadata Sources

Metadata is all of the changeable information you need to build source code. This is the bulk of what changes between your applications. Because the easiest metadata to visualize is from databases, I’ll often talk about metadata in relation to data structures—database fields and columns. Clearly, not all metadata is related to data structures.


Figure 2-2. Metadata sources

If metadata is literally any data you need to build your application, you can get it from anywhere. It makes no difference to later processing where you get the metadata. There are at least seven kinds of metadata sources as follows:

  • Schema definitions such as XSD

  • Databases such as SQL Server

  • Manually entered freeform metadata

  • Web Services via Web Service Description Languages (WSDLs)

  • External sources, such as mainframes, Excel, and so on

  • Design tools such as Unified Modeling Language (UML)

  • Existing applications and source code

You can further divide these types into categories based on how you extract the metadata. You extract using .NET if the source has a programmatic interface such as databases, external sources, and existing applications. You retrieve it using XSLT if it’s an XML source such as XSD, WSDL, and XMI output from UML diagrams (see Footnote 1). You can also manually enter metadata by typing in the XML. As you can see in Figure 2-2, regardless of the process, you create a series of individual files.

(Footnote 1. I win Bingo with five acronyms in one sentence! XML is self-describing data whose structure can be defined using an XSD schema. Appendix A discusses XML and XSD. WSDL is basically a specific subset of XML used to describe Web Services. UML is the Unified Modeling Language and is a diagramming standard intended to improve communication during software design processes. XMI is another subset of XML that’s designed for exchanging UML information.)

You can also categorize metadata sources by the high-level element that contains them in the monolithic metadata file. There are only a few broad categories, such as data structures describing data, class structures describing data and functionality, and User Interface (UI) structures describing UI details.

Because there are so many metadata sources, I could write a very long chapter on metadata extraction, but it would be tedious and repetitive. So, I'll tell you a little about each of these sources and then follow up with a deeper discussion of three metadata sources: XSD schemas, SQL-92 databases (including SQL Server), and freeform metadata. These three sources represent the three key approaches to metadata extraction: XSLT templates, .NET programs, and manual entry. You'll have to develop your own extraction tool for the other five metadata source types (and any others I omitted), but this chapter will give you a significant head start on writing them.


Using Schema Definitions Such As XSD

Schemas explicitly describe the structure of data. Schemas often describe the structure of XML files. Schemas offer a fallback position for finding metadata sources. You can describe almost any data with an XML schema, and you can generally create these schemas via a tool such as XSD.EXE that’s part of Visual Studio .NET. (See Footnote 2.) This gives you a good route to creating metadata when there isn’t a more obvious route, such as the programmatic interface of the SQL-92 standard.

(Footnote 2: You can access XSD.EXE within Visual Studio by adding a DataSet or XML Schema to your project.)

TIP: Use schemas such as XSD as a metadata source when your data source is an XML file or a database that’s supported by ADO.NET but doesn’t support the SQL-92 standard.

Several styles of schemas are available—including three styles for XML. (See Footnote 3.) The most important of these is XSD. XSDs are generally created via an inference tool that finds patterns within an XML document and expresses these patterns and rules in XSD. You’ll rarely need to create an XSD manually, but you need to know a bit about how XSD files are structured to read and understand them.

(Footnote 3: The three major schemas for describing XML data are XML-Data Reduced (XDR), Document Type Definition (DTD), and XML Schema Definition (XSD).)

TIP: Appendix A’s section on XSD gives a detailed breakdown of an XSD file.

Schemas are useful anytime you need information about a data structure. XSDs are important in validating that the structure of an XML file is correct. Visual Studio also uses them for IntelliSense, autocompletion, and ToolTips. I use them for the data-driven XML editor of the code generation harness you’ll see in Chapter 3.

NOTE:  You’ll see XSD schemas used two different ways in this book. XSD can itself be a metadata source. It can also be used to describe other XML files, including the individual metadata files and the scripting XML used in Chapter 3. Appendix A covers XSD basics. Later in this chapter you’ll see more details about working with XSD as a metadata source.

  

Because all ADO.NET data can be expressed as XML, all ADO.NET data can be described by an XSD. This isn’t particularly profound because the XSD specification is sufficiently flexible to describe just about anything. Anything you perceive as data is likely to fit in XML that you can describe with XSD. XSD is itself XML, allowing it to be manipulated with any tool that supports XML.

Directly using XSD as metadata turns out to be really ugly. XSD describes tables and columns in terms of complex types and simple types. The result is that the XPath to access the tables in an XSD looks like this:

<xsl:apply-templates select="xs:schema/xs:element/xs:complexType/xs:choice/xs:element"/>

NOTE:  This is XSLT usage, but the same XPath expression (the quoted part of the previous XSLT fragment) would also be used in a SelectNodes expression in brute-force or CodeDOM generation.

I don’t know about you, but I’m more likely to look at that line and go “Huh?” than think, “Okay, looping through tables here.” Translating this to a friendly XML metadata format first allows you to write and maintain code templates that use simpler XPath expressions, which are immediately comprehensible, such as the following:

<xsl:apply-templates select="Tables/Table"/>

Because the conversion translates the XML of the XSD schema to the XML of the friendly format, XSLT is a good tool for the conversion. You can find an XSDToFriendlyXML.xslt file in the Downloads section of the Apress Web site (http://www.apress.com); it's discussed in Appendix B. This chapter's “Understanding XSD's Role in Code Generation” section further discusses creating friendly metadata.

TIP: Make your metadata friendly to code generation by doing as much preprocessing, such as data type translations, as possible during extraction.


Extracting Metadata from Databases Such As SQL Server

Databases are a great source of metadata; they’re arguably the most important metadata source you’ll use. But if you can create an XSD for any ADO.NET source, why not just build an XSD and translate it? After all, Visual Studio .NET uses the XSD to describe database structures to build the strongly typed DataSet. Well, the flexibility of XSD is great, but it doesn’t have the richness of a modern database such as SQL Server, particularly if you generate the XSD through the .NET inference engine. It’s hard to include details such as defaults, constraints, and extended properties when you’re working with an inferred schema. You get more information if you programmatically extract it from the database.

TIP Use SQL-92 techniques when your metadata source is a SQL-92 compliant database such as Microsoft SQL Server or Oracle.

NOTE:  To extract metadata from a database, you first have to include that information in the database. Okay, that may sound pretty obvious, but many people don't maintain default values, constraints, and sometimes even relations within their database. If you currently avoid storing this information in your database server, reconsider. SQL Server and most other backends are good at managing data for you and let you keep a good deal of metadata in a single location. But, if you don't want to store the metadata in your database, you can store it in separate, manually created freeform XML files.

SQL Server offers three different ways to access information on the database structure as follows:

  • You can access the system tables directly. But, that’s really uncool because Microsoft might change them and invalidate your extraction tool.

  • You could use the sp_help procedures that are so handy in Query Analyzer. But they're designed for people to read and have a lot of quirks and inconsistencies. Besides, they apply only to SQL Server.

  • The best way to extract database structures is to use the information schema views defined as part of the SQL-92 standard. The “Working with SQL-92 Databases (SQL Server)” section has more details about extracting from these views; a minimal query against them appears just after this list.
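
To give you a feel for these views, here's a minimal C# sketch that lists tables and columns through INFORMATION_SCHEMA; the connection string is an assumption, and the downloadable extraction tool gathers far more than this:

using System;
using System.Data.SqlClient;

class InformationSchemaSketch
{
    static void Main()
    {
        // Point the connection string at your own server and database.
        string connString =
            "Data Source=localhost;Initial Catalog=Northwind;Integrated Security=SSPI";

        // INFORMATION_SCHEMA views are defined by SQL-92, so this query works
        // against any backend that exposes them, not just SQL Server.
        string sql =
            "SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE " +
            "FROM INFORMATION_SCHEMA.COLUMNS " +
            "ORDER BY TABLE_NAME, ORDINAL_POSITION";

        using (SqlConnection conn = new SqlConnection(connString))
        using (SqlCommand cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine("{0}.{1} ({2}, nullable: {3})",
                        reader["TABLE_NAME"], reader["COLUMN_NAME"],
                        reader["DATA_TYPE"], reader["IS_NULLABLE"]);
                }
            }
        }
    }
}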

SQL Server, Oracle, and other key databases conform to SQL-92, but others such as Access don't. If you need to extract data from a non-SQL-92 source, you can fall back on the XSD approach, use tools such as .NET's GetOleDbSchemaTable method, or use internal queries to create DataSets that contain the system information.

Entering Metadata Manually

It probably doesn’t surprise you that some of the metadata isn’t in your database or anywhere else outside the brains of your group. This metadata might be tightly associated with database columns or objects, and in that case you can include additional information as extended properties of your SQL Server database. But if there isn’t such an association, you’re working with a database that doesn’t support extended properties, you think extended properties in SQL Server are awkward to use, or your Database Administrator (DBA) resembles an underfed grizzly bear, then the solution is to manually enter the information as freeform metadata.

TIP: Use manually entered freeform metadata when the information isn’t already contained in a translatable source. This includes information extracted from your requirements and information about UI functionality.

A logical use of freeform metadata is adding captions, ToolTips, and validation to the metadata for your database (assuming you aren't using extended properties to hold this information). Another place you can use freeform metadata is defining UI items. You don't want to redo things the Visual Studio designers do well, but there are a ton of things you do in your user interface that don't need visual tools. For example, you could create a manual freeform metadata file that defines role-authorized menu structures.

In the “Creating Freeform Metadata” section, you’ll learn more about designing freeform metadata and merging it into your monolithic metadata file.

Extracting Metadata from Web Services via WSDL

Is it possible to write a book today without talking about Web Services? For better or worse, their coverage in this book is limited to this section. That's because I think you can extend what you learn from other metadata sources and apply it to Web Services. Web Services are basically a universal messaging system on major steroids. The messaging interface of the Web Service is described using WSDL. WSDLs are metadata. That's their purpose. They describe the service, the data structures that go back and forth, the methods that are called, and the bigger picture stuff about how the service behaves. WSDLs are XML, making them great metadata.

TIP Use WSDL as a metadata source when you’re retrieving data via Web Services and the Web Services aren’t part of the application you’re also generating.

Why would you want to do code generation from WSDLs? If this was a presentation, I could see a number of people waving their hands frantically and the rest with blank stares. There are at least three reasons to do this as follows:

  • Creating client-side proxies

  • Creating Web Services from a standard or industry-wide WSDL

  • Creating code to interact with Web Services

.NET creates client-side proxies for you (either via Add Web Reference or via WSDL.EXE), so you’ll only need to do this yourself if the .NET tool comes up short or you have an unusual reason such as wanting to generate a standard set of client proxies for different platforms for testing.

It isn’t clear how the vision of industry-wide WSDLs will mature. There will probably be industry-wide WSDLs that you’ll support, but most of your work may be extending these standard WSDLs to add value to your products. Implementing these standardized WSDLs will generally involve building entry points into your Web Service with handcrafted code backing them up. In some cases, you might be able to generate your entire Web Service.

If you know the structure of a Web Service, you know the structure of its proxy, and you can write code that interacts with it. Examples of how you might use this are a default Windows application that lets you interact with a Web Service, or doing scatter/gather to a custom user interface. You also might create WSDLs or create Web Service wrappers for middle tier or other data. 


Retrieving Metadata from External Sources

The sources you’ll retrieve metadata from may be quite varied and include sources such as mainframes and Excel. Think broadly about metadata. What are you working with? What metadata information do you need, and how can you get it? Are you working with laboratory information, process control, or home automation with specific types of sensors and controls you can describe with metadata? Do you need to mimic the structure of mainframe data, either for synchronization or for an initial import of your data? Can you grab the structure and build CREATE stored procedures as well as import and synchronization processes? How can you capture this information?

TIP Keep your mind open to incorporating other metadata sources as you run across them. You can use any predictably formatted information as a metadata source.

Remember that metadata doesn’t include information that might change at runtime because you won’t regenerate to incorporate changes at runtime. (See Footnote 4.) In general, metadata will not include specific network layout information, client configuration information, and so on. If these are part of your application, your metadata might include the structure that holds such details, but not the details themselves. This distinction may be easier to understand if you consider a simple home automation example. Information about individual types of sensors and controls are valid metadata. Each type of sensor or control has a logical set of data, such as a thermometer having a property that can contain its location and another property that can contain the current temperature. The temperature isn’t metadata, and in most cases the specific location isn’t metadata. The fact that a thermometer is a type of sensor with a current temperature and location is metadata.

(Footnote 4. Strictly speaking, you can, but building and compiling source code at runtime is rarely justified.)

Once you understand what metadata you’ve got and where it currently resides, you can design a freeform XML block. Based on that schema, you can write an XSLT to organize the metadata if it’s already in an XML format or write a program to extract it. How hard that program might be depends on how messy the source’s interface is.


Extracting Metadata from Design Tools Such As UML

I really wanted to be able to give you good tools for building metadata directly from UML. After much time and grief, I commiserated with one of my UML guru friends (who is a bit gnarly around the edges), and he said, “What? I’d fire anyone who took their UML to so much detail that you could directly build business objects from them. UML isn’t an eye chart!” That’s not an excuse. Okay, that’s an excuse, but it’s also a different perspective. I’ll admit the real reason I’m not giving you a good UML solution is that it’s a complex metadata source to work with, and databases are often a more appropriate metadata source.

TIP: UML should make an excellent metadata source. Unfortunately, technical hurdles make it a difficult metadata source to use.

Using Visio

Visio provides a UML modeling tool, but it isn’t very good. (See Footnote 5) One of its many problems is that it quite successfully buries the UML metadata where it’s extremely hard to access. There are a couple of contorted approaches you can use to extract metadata from Visio UML. You can try to break up the XML save format. But that’s a storage format for a drawing program and has no understanding of UML. You can create XML Metadata Interchange (XMI) output if you download a special Dynamic Link Library (DLL) from MSDN (search the MSDN library for XMI and UML) (See Footnote 6). The XMI standard is an XML layout (defined in a schema) that allows for interchange of information between UML tools. To work with this file, you’ll also have to download a Document Type Definition (DTD) schema from the Object Management Group (OMG) Web site, which the MSDN article explains. A simple UML diagram with two classes, three attributes, and three operations was a 313-kilobyte (KB) XMI file. So, for goodness sake, translate it into friendly XML and don’t do code generation directly against this file!

(Footnote 5: Visio currently stinks for UML. I remain optimistic that Microsoft will provide a good UML product.)

(Footnote 6: You can access MSDN libraries at http://www.msdn.microsoft.com even if you aren’t an MSDN subscriber.)

Visio also offers a rudimentary code generation feature. From the perspective of real code generation, it’s useless. It’s buggy, extremely tedious to use, and from a practical perspective just builds shells. However, there’s a third approach to extracting UML metadata from Visio lurking here. You can create your UML, generate a dummy C# project, then use reflection to extract the UML structure from these classes, and finally massage problems that occur. Wow, that’s so ugly, forget I even suggested it.

Using Other UML Tools

If you’re using Rational or another UML tool, you can probably create XSD schemas for the classes you define. The tool may not carry all of the information from your UML design, but it’s a start. These products generally also provide XMI support, allowing you to extract a more complete structure—if you’re willing to attack XMI’s complexities. You also may be able to export the structure to the SQL Server repository and extract it from there—again, a rather roundabout way to accomplish a simple task.


Extracting Metadata from Existing Applications and Source Code

An existing application can be a metadata source. Wow, that’s a weird concept. But it can be a metadata source as source code, a Component Object Model (COM) type library, or via reflection from a .NET application. It turns out that source code is an extremely difficult metadata source to use, and I actually suggest you avoid it unless there’s compelling information to extract from it. Reflection, on the other hand, is an easy metadata source to use.

TIP: Use existing applications as a metadata source when interfacing with them is a significant part of your application.

Using Reflection

Reflection is a set of tools in .NET that allows you to query the assembly header (often called metadata, but I think that’s confusing in this context) and retrieve information about the types within the assembly. Type is the .NET word for classes, enums, structures, and so on. In addition to basic type information, the header describes the methods, properties, constants, fields, and nested types within the class or structure. (See Footnote 7.) Retrieving this information allows you to build a metadata file describing the type, which can be useful in generating code that accesses the types.

(Footnote 7. Fields are variables declared at the top of your class outside the scope of any method or property.)

For example, knowing the controls on a form allows you to automate binding or scatter/gather processing, particularly if you use careful naming for the controls. Careful naming is a specific technique discussed in the “Introducing Careful Naming” section. Careful naming allows names themselves to carry usable information. Attributes can also help you sort out what each method and property actually does. You can use reflection to generate an XML representation of any .NET EXE or DLL, whether or not you have the source code. This allows you to automate the interface to a component even if you don't have access to its source code.
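
As a rough sketch of the idea (the element names here are illustrative, not the book's schema), a few lines of reflection code can walk an assembly's public types and write their properties into XML:

using System;
using System.Reflection;
using System.Xml;

class ReflectionExtractor
{
    static void Main(string[] args)
    {
        // Path to the assembly to describe, passed on the command line;
        // any .NET EXE or DLL works, with or without source code.
        Assembly assembly = Assembly.LoadFrom(args[0]);

        XmlDocument doc = new XmlDocument();
        XmlElement root = doc.CreateElement("ClassStructures");
        doc.AppendChild(root);

        foreach (Type type in assembly.GetTypes())
        {
            if (!type.IsPublic) continue;

            XmlElement classElement = doc.CreateElement("ClassStructure");
            classElement.SetAttribute("Name", type.Name);
            root.AppendChild(classElement);

            // Record each public property as a child element.
            foreach (PropertyInfo prop in type.GetProperties())
            {
                XmlElement propElement = doc.CreateElement("Property");
                propElement.SetAttribute("Name", prop.Name);
                propElement.SetAttribute("Type", prop.PropertyType.FullName);
                classElement.AppendChild(propElement);
            }
        }

        doc.Save("ClassStructures.xml");
    }
}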

TIP: To use reflection, you’ll have to compile your application. Approaches that result in half-built classes you’re planning to finish out with code generation often don’t compile and thus aren’t usable with reflection.

Using TypeLibs

You may want to extract metadata from an existing COM application that was written in Visual Basic 6, C++, or another language. Extracting the defined interface into metadata allows you to generate code that interacts with these applications. You could also use code generation to create wrapper objects on either the COM or .NET side to reduce the chattiness of interop conversations.

NOTE:  Chapter 4 explains why it's hard to protect handcrafted code in Visual Basic (VB) 6 and thus why it's a poor language for application-wide code generation. However, if you want a static wrapper class without handcrafted code, generating VB 6 code works just fine.

You can use TlbInf32.dll (which you can download from Microsoft) to provide information about most COM type libraries that’s similar to the information provided by .NET’s reflection features. If you search MSDN for this file, you’ll find how to download it, as well as several articles about using it.

Using Source Code

It’d be great to get metadata directly from your source code. This would allow you to extract comments, track internal conditions, work with the internals of files in other languages such as Visual Basic 6, and incorporate regions. Unfortunately, it’s a really hard job. The fundamental reason it’s hard is that source code is poorly structured. The job of taking poorly structured information and placing it into consistent, highly structured XML considering all possible variations makes parsing very complex. To do it right, you have to incorporate at least variable file openings, nested types, complex class and method headers (XML comments), multiple namespaces and classes, and nested regions; and you’ll want to handle attributes at every level. Whew!

NOTE:  One reason you might want source code parsing is to capture header comments in VB .NET, similar to C# XML comments. Because XML comments are planned for the Whidbey version of Visual Basic .NET, you might just want to be patient. (See Footnote 8.) If you want XML comments in VB today in version 1.0 or 1.1, search Google for VB .NET XML comments. At the time of this writing, commercial products from at least two vendors (Fesersoft’s VB.NET XML Comments Creator and VBXC—VB.NET XML), an entry on GotDotNet (http://www.gotdotnet.com), and an open-source project at SourceForge (http://www.sourceforge.net) are available.

(Footnote 8. Whidbey is the code name for the next version of the Visual Studio family after the Visual Studio 2003 version.)

If you have other reasons you want to parse .NET source code into metadata, you can try the automation model of Visual Studio. The EnvDTE namespace contains the automation model and parses out the source code structure, which is the hard part. It isn't going to hand you the comments and it requires the file be open in the editor, but it may make some jobs easier.


Why Extract Metadata?

Metadata is a key that unlocks many doors. Not only does it play a critical role in CodeDOM, brute-force, and XSLT code generation, but it also plays a similar critical role in many dynamic data/reflection-driven architectures. Even if you code things the old way, line by line, metadata plays a role although you probably don’t think in those terms. When you look at things such as data diagrams and hold that information in your head until you manually incorporate it into your application, your brain extracts metadata. Although it doesn’t stack the metadata up neatly the way XML can, it performs the same kind of translation with metadata you’ll ask templates to perform.

What you create with metadata can be as varied as the metadata sources. You can create any .NET source code, HTML, XML, RTF, stored procedure, or any other text-based file. (See Footnote 9.) This means you can use generation from metadata to perform tasks as varied as creating your database and documentation in addition to producing code. This book focuses on creating .NET source code and stored procedures to select and update data because that’s the common ground where we all work. Many of you will take generation further in a direction specific to your application or environment.

(Footnote 9. Hypertext Markup Language (HTML) is for Web page development. Extensible Markup Language (XML) encapsulates data with metadata defining the contents. Rich Text Format (RTF) is a generic word processing format.)

Metadata is also a key step in your development strategy. It's a bridge between your templates and your data: information ultimately derived from your requirements, ready to marry the templates that express your architecture. Getting your metadata ready for template development is a significant milestone. From then on, template development and metadata extraction fit like hand and glove. The template determines what metadata you need; the metadata defines what the template can do. Where you get the data may vary and the contents of the templates will change, but the process of extracting metadata and applying templates remains the same regardless of what you're building.

Designing Metadata

What does the target metadata look like? It looks like XML, of course, but that doesn’t tell you anything because you can organize XML just about any way you want. The goal is to have the metadata friendly to code generation. This means the following:

  • Pushing all possible processing into metadata extraction to maximize reuse

  • Having a predictable location for any piece of information independent of how it’s extracted

  • Simplifying access even when it means providing redundancy

  • Not allowing metadata to become language specific

You want to push as much processing as possible into the extraction step because it supplies an automatic level of reuse. For example, you can do type conversions, such as SQL type names to .NET type names, as part of your metadata extraction. Each time you extract a particular column type, you may use it ten or more times in templates. Because you're going to automate things, repeating stuff is okay.
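
Here's a minimal sketch of that kind of translation; the mapping covers only a handful of common SQL Server types and is an assumption, not the full list the downloadable extraction tool uses:

class TypeTranslation
{
    // Partial mapping from SQL Server type names to .NET Framework type names.
    // Run once per column during extraction and store the result (for example,
    // in a NETType attribute) so templates never repeat the lookup.
    public static string ToNetType(string sqlType)
    {
        switch (sqlType.ToLower())
        {
            case "int": return "System.Int32";
            case "bigint": return "System.Int64";
            case "bit": return "System.Boolean";
            case "datetime": return "System.DateTime";
            case "money": return "System.Decimal";
            case "varchar":
            case "nvarchar": return "System.String";
            case "uniqueidentifier": return "System.Guid";
            default: return "System.String"; // fallback; extend for the types you use
        }
    }
}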

NOTE:  Within the XML metadata file, you need a single, predictable location for specific types of data so you can find the data you need when you’re doing code generation. Few of us have organized desks. Most of us can find what we’re looking for without much difficulty despite the chaos around us. But try to explain to a coworker where a particular document is when you’re stuck at home with the flu. Metadata is more like the office supply cabinet than your personal desk. You want to get what you need, and get back to work. So does everyone else working on your project now and in the future. It’s not your desk. The organization schemas I use are a good starting point. Evolve them intelligently.

Simplifying access sounds like a noble goal, but how do you accomplish it? Imagine doing code generation, and imagine what you might need at a given point. For example, consider writing a class that represents a database table. You need the table name, the column information, the primary key information, child relations, and parent relations. For each relation, you need the specific parent and child keys of that relation. In some organization schemes, including XSD, this is scattered all over the place, which is bad. Use the extra bits and repeat parent and child relations within data elements. This makes it easy to pick up a single node and work with it.

In creating some aspects of metadata, you’ll have to choose between using a .NET Framework or a language-specific element. This will happen with data types and check constraints. For data types, you can use int, System.Int32, or Integer. They all mean the same thing, but only Int32 works with any .NET code. Even if you don’t anticipate language changes, you hope your metadata will have a long life, and who knows how languages will evolve. If you really can’t imagine having code that looks like this:

Imports System

Dim j As Int32

or this:

using System;

Int32 j;

then include both the language-specific name and the framework name in your metadata.


Establishing Your Own XML Design Guidelines

How do you design XML? The same data can be expressed in XML in a multitude of ways—how do you settle on one? It helps to articulate a set of rules you can use to ensure consistency. My rules are as follows:

  • Use attributes for data that can logically occur just once and is a reasonable length.

  • Pay careful attention to elements’ names—ensuring they describe the breadth of what they can contain now and in the future.

  • Use child elements for items that can possibly occur multiple times.

  • Separate child elements by category and generally use grouping elements such as Tables.

  • Use plurals only in the grouping elements, and use correct grammar and spelling for plurals.

  • Unless it’s unavailable or illogical, all elements have an attribute named Name that contains the logical name for the data item. This name is unique at least within the parent element.

  • Use Pascal casing for element and attribute names. (See Footnote 10.) 

NOTE This is a potential trouble spot, particularly for VB developers because XML tools are case sensitive.

  • Minimize abbreviations in element and attribute names.

  • Use “true” and “false” for Boolean values.

  • In general, preserve raw data within the XML document when massaging is done. For example, when changing a database type to a .NET type, preserve the original database type name, as well. You can break this rule when you specifically remove information to simplify the XML document, such as you’ll see in Chapter 6.

  • Use a consistent style for namespaces (urn, uri, and so on).

  • Expect to infer XSD schemas, so try to incorporate its rules, particularly not repeating element names in different contexts, and test XSD inference early and often.

  • Use prefixes for all namespaces—avoid using empty namespaces or default namespaces.

(Footnote 10. Pascal case has the first character capitalized as well as the start of any new logical word. An example of Pascal casing is DataTable. This is sometimes confused with camel casing in which the first letter isn’t capitalized. For example, dataTable is camel casing. It helps me remember the difference that a camel has humps only in the middle. Pascal casing isn’t standard for XML and XSLT files, but I think it makes the files more readable.)

NOTE:  There’s nothing magic about this set of rules. The magic is having any specific set of rules that you can turn to when you’re wondering “Should this be an attribute or element?” or “How should I name this element?”. I hope your organization can agree on a set of rules and allow it to evolve. I’ll warn you that I’m imperfect and slip up sometimes even on my own rules.

Using these rules, as well as rules for legal XML, the design of the metadata begins to emerge. You need a root element and want its name to imply the full breadth of data that might be contained in the file. It’s helpful to include the word Root, such as MetadataRoot. The root contains grouping elements. For example, DataStructures contains the information about individual data schemas. You might think about naming the data schema root Databases, but that’s too restrictive because the data may be contained in many forms such as XML, config files, transfer from external programs, and so on. The first cut at the monolithic metadata file is as follows:

<?xml version="1.0" encoding="utf-8" ?>
<MetadataRoot>
  <dbs:DataStructures FreeForm="true" xmlns:dbs="http://kadgen/DatabaseStructure">
    <dbs:Tables></dbs:Tables>
    <dbs:UserDefinedTypes></dbs:UserDefinedTypes>
    <dbs:StoredProcedures></dbs:StoredProcedures>
    <dbs:Functions></dbs:Functions>
    <dbs:Views></dbs:Views>
  </dbs:DataStructures>
  <ffu:UIStructure xmlns:ffu="http://kadgen/FreeFormForUI.xsd" />
  <OtherStructures></OtherStructures>
  <ClassStructures></ClassStructures>
</MetadataRoot>


Introducing the Tools for Metadata Extraction

Three tools for metadata extraction are available at the Apress Web site in the Downloads area as follows:

  • XSDToFriendlyXML.xslt: An XSLT stylesheet to convert XSD schemas to XML that’s friendly to code generation. This effectively extracts usable metadata from an XSD.

  • Database Extraction: A .NET tool to coerce a rich set of metadata from SQL Server. You can modify this tool to extract metadata from any SQL-92 compliant database, such as Oracle.
  • XML Merge: A .NET tool that performs an intimate merge to incorporate freeform, external, and reflection-derived (and so on) metadata into a monolithic metadata file.

It’s a bit awkward for you to run these tools until you get the code generation harness in the next chapter. You can use a console application if you want to experiment with them now, or you can wait for the generation harness.

NOTE:  Appendix B offers walkthroughs of key parts of the code for the first two tools, and I explain the third later in this chapter. You can use the appendix for a deeper understanding either because you’re most comfortable driving when you know what’s under the hood or because you need to tune the engine—making a change to the tools so they work better in your environment.

Isolating portions of your application and development processes that are likely to remain static from those where change is likely improves maintainability. Many metadata tasks, such as finding parent or child keys or translating a data type, will be done many times within your templates, but the algorithms behind them will remain unchanged until there are fundamental changes in how databases work. By pushing this work into your metadata extraction step, you can do it once and provide the results wherever they're needed.

TIP: Do as much work as you can upfront for creating metadata friendly to your templates. This maximizes reuse and improves maintainability. Extra time developing your metadata extraction tool makes sense because you’ll use it not only as this project evolves, but also when you move forward to new applications. Most of the issues you’ll encounter are solved in the metadata extraction tool you can download from the Apress Web site.

You can attack metadata extraction at a variety of levels. Imagine a bell curve of all the available metadata with the information you’re most likely to use in the middle. For this book, I stretched out beyond the central section and worked to make this tool applicable to most scenarios. But, I’ve certainly left a lot of blank space at the edges and avoided trying to solve all problems for all people. I wanted to keep these tools and my explanation of them from being excessively complex.


Understanding XSD's Role in Code Generation

XSD plays several roles in code generation. It's the base schema description language used in XML editing, it's used by the code generation harness, and it's a metadata source. XSD is a poor metadata source because it doesn't capture the richness of other sources such as SQL-92 databases and because it's awkward to use. To solve the second problem, you can use a two-phase extraction: first building the XSD and then translating it into a friendly format. When you don't have a rich underlying source, XSD provides an effective metadata source.

What’s the difference between the structure of an XSD and the friendly XML I’m suggesting for code generation? The basic differences are that the friendly metadata contains familiar terms such as Table and Column in the element names and contains all information about a table within its hierarchy (descendant elements and attributes). You can compare these formats as you see both XSDs and the friendly XML target (which is the same target as SQL-92 databases) in the next few sections. XSD also provides only a subset of that metadata provided by the databases. The XSDToFriendlyXML.xslt template available on the Web site translates standard XSDs to this friendly XML format. Appendix B discusses this XSLT template.

You usually retrieve the XSD by a process called inference. Inference guesses at schema rules based on a sample XML file. XSD is an important but difficult to grasp technology; there’s more about XSD in Appendix A.

Inferring XSD Schemas

XSD is itself XML, making it usable with a variety of tools, including XML editors and XSLT. That’s cool, but XSD schemas are nonintuitive, ugly, and really hard to build. That’s the bad news. The good news is that you may never have to create one from scratch because there are a variety of tools to infer the schema.

The most accessible inference engine is probably the one supplied with .NET, one of several operations performed by XSD.EXE. (See Footnote 11.) Table 2-1 shows the different input options for XSD.EXE when you run it from the command line. If you have Visual Studio .NET installed, the easiest way to find XSD.EXE is to open a Visual Studio command prompt. This sets the path to include [Program Files]\Microsoft Visual Studio .NET 2003\SDK\v1.1\Bin, which contains XSD.EXE.

(Footnote 11. There's another inference engine downloadable from Microsoft (http://www.gotdotnet.com/team/xmltools). If you don't find it, try searching for XSD Inference on the gotdotnet site. This engine can build XSDs from XML files that choke XSD.EXE. Another benefit of this alternate inference engine is that you can call it multiple times with different XML documents to evolve a schema. I have used XSD.EXE in this book so you don't have to download an extra tool.)

Table 2-1. XSD.EXE inputs, outputs, and important switches

  • Input .xdr: produces an XSD schema. Switches: /o:<directoryName> (specifies output directory).

  • Input .xml: produces an XSD schema. Switches: /o:<directoryName> (specifies output directory).

  • Input .xsd: produces a strongly typed DataSet or class source code. Switches: /o:<directoryName> (specifies output directory), /d (produces strongly typed DataSet), /c (produces strongly typed class), /l:<languageName> (specifies language as CS (C#), VB (Visual Basic .NET), JS (JSharp)), /e:<elementName> (specifies specific elements to generate; defaults to all).

  • Input .dll or .exe: produces an XSD schema for specified types. Switches: /o:<directoryName> (specifies output directory), /t:<typeName> (specifies the specific types to generate).

The XSD.EXE application’s behavior when you pass a file with the extension .xdr, .xml, .dll, or .exe is important in the context of metadata. With all of these file types, the output is an XSD file. (See Footnote 12.) Because you can also express DataSets as XML, you can infer an XSD schema for anything you can stuff into a DataSet. As a result, you can use XSD.EXE to create a schema for most things that you would classify as data. Visual Studio .NET uses the same inference engine to create XSDs if you right-click an XML document and select Create Schema.

(Footnote 12. A separate key role of XSD.EXE is creating the ADO.NET strongly typed DataSet.)

NOTE:  You can also directly create XSDs with Visual Studio .NET. The XSD design surface is displayed when you add a DataSet or XSD to your project. These use the same design surface because a DataSet is defined within Visual Studio .NET as an XSD. Although you may only be familiar with the graphical design tool of the XSD/DataSet surface, you can see the XSD itself if you select the XML tab. You can also programmatically infer a schema for an ADO.NET DataSet using the GetXmlSchema method.
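
For example, here's a minimal sketch of programmatic inference; Books.xml is a hypothetical sample data file, and ReadXml with InferSchema, GetXmlSchema, and WriteXmlSchema are the relevant DataSet members:

using System;
using System.Data;

class SchemaInference
{
    static void Main()
    {
        // Infer a schema from a sample XML data file.
        DataSet ds = new DataSet();
        ds.ReadXml("Books.xml", XmlReadMode.InferSchema);

        // GetXmlSchema returns the inferred XSD as a string;
        // WriteXmlSchema saves it to disk for later translation to friendly XML.
        Console.WriteLine(ds.GetXmlSchema());
        ds.WriteXmlSchema("Books.xsd");
    }
}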


Exploring the Structure of an XSD

An XSD schema defines elements. The elements may contain other elements. To show how these elements work to describe the data structure, I'll walk through the XSD schema created by XSD.EXE from the Customers and Orders tables of the Northwind database. The root element represents the DataSet, with child elements representing the Customers and Orders tables of Northwind.

The opening of the XSD shows that it’s just an XML file with a specific set of namespaces. It contains the XML header, a single root element, and a series of defined namespaces. In this case, the name of the DataSet is “DataSet1”:

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema id="DataSet1"
          targetNamespace="http://tempuri.org/DataSet1.xsd"
          elementFormDefault="qualified" attributeFormDefault="qualified"
          xmlns="http://tempuri.org/DataSet1.xsd"
          xmlns:mstns="http://tempuri.org/DataSet1.xsd"
          xmlns:xs="http://www.w3.org/2001/XMLSchema"
          xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">

NOTE:  Namespaces are one of the most difficult things to work with correctly in XML and .NET processing of XML. Appendix A covers some of the nuances of namespaces.

The xmlns:xs="http://www.w3.org/2001/XMLSchema" defines this file as an XSD. Among the other things this namespace provides, you'll find that if you edit the XSD within Visual Studio .NET, IntelliSense works because Visual Studio maps this namespace to a schema file it can use for IntelliSense.

Complex types contain child elements. Elements representing the DataSet and DataTables are complex types:

<xs:element name="DataSet1" msdata:IsDataSet="true">
  <xs:complexType>
    <xs:choice maxOccurs="unbounded">
      <xs:element name="Customers">
        <xs:complexType>
          <xs:sequence>

The table elements contain child elements representing columns. Columns are simple types that have attributes describing their name, type, and other information. Simple types don’t contain child elements:

<xs:element name="CustomerID" type="xs:string" />
<xs:element name="CompanyName" type="xs:string" />
<xs:element name="ContactName" type="xs:string" />

When you combine several elements in a complex type, such as tables within a DataSet or columns within a table, you can restrict them to a specific order or let them appear in any order. If the order of a set of elements must match a defined order, it's an xs:sequence. Normally, only one element of a set of child elements defined using xs:choice can appear in a document; adding maxOccurs="unbounded" lets the element occur as many times as needed. The tables of a DataSet are defined using xs:choice with maxOccurs="unbounded" because they can appear in any order and only as needed. The columns within a table are defined using xs:sequence because if a column occurs, it must appear in a specific sequence:

        </xs:sequence>
      </xs:complexType>
    </xs:element>
    <xs:element name="Orders">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="OrderID" msdata:ReadOnly="true"
                    msdata:AutoIncrement="true" type="xs:int" />
          <xs:element name="CustomerID" type="xs:string"
                    minOccurs="0" />
          <xs:element name="EmployeeID" type="xs:int"
                    minOccurs="0" />
          <xs:element name="OrderDate" type="xs:dateTime"
                    minOccurs="0" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:choice>
</xs:complexType>

Nullable columns have the minOccurs="0" attribute. This specifically means that having no occurrences of the element is legal. XML handles nulls by leaving out the entry. This is a little surprising if you're expecting all of the elements within a logical XML record to appear. It can also cause issues during inference.

This XSD contains three constraints. Two are primary key constraints, one on each table in the DataSet. These primary key constraints just define the table and the primary key columns:

<xs:unique name="DataSet1Key1" msdata:PrimaryKey="true">
    <xs:selector xpath=".//mstns:Customers" />
    <xs:field xpath="mstns:CustomerID" />
</xs:unique>
<xs:unique name="DataSet1Key2" msdata:PrimaryKey="true">
    <xs:selector xpath=".//mstns:Orders" />
    <xs:field xpath="mstns:OrderID" />
</xs:unique>

A referential constraint defines the relation between two tables. XSD describes this relationship by referencing the existing primary key constraint for the parent table, such as DataSet1Key1. This is the constraint for the parent key. The child table of the relation is defined by its table and column, both stated in terms of an XPath statement:

    <xs:keyref name="CustomersOrders" refer="DataSet1Key1">
      <xs:selector xpath=".//mstns:Orders" />
      <xs:field xpath="mstns:CustomerID" />
    </xs:keyref>
  </xs:element>
</xs:schema>

Although there are some additional details, that's the core of understanding XSDs. XSD is an ugly, messy format that's both verbose and effective. It's designed for maximum flexibility across a lot of different kinds of data. As the complexity of the data structure increases, obviously the complexity of the XSD increases, and it becomes even more difficult to read.


Working with SQL-92 Databases (SQL Server)

Databases might be the only metadata source you ever use. Everything else may be optional. Most applications are data centric, and most modern relational databases hold a phenomenal amount of information in their schemas. Many of the backends, including SQL Server and Oracle, provide rich information using features corresponding to the SQL-92 standard.

Selecting a Standard

SQL-92 is the most mature in a long series of standards that have been used to define relational databases. It’s supported by SQL Server, Oracle, and several other databases.

Most databases go beyond the standards to provide their own extensions. Auto-incremented columns aren’t SQL-92 compliant—they’re extensions in any backend that supports them. Starting with SQL Server 2000, Microsoft provides extended properties that allow you to put name-value pairs on any object including tables, columns, stored procedures, stored procedure parameters, and so on. This allows you to extend metadata within the database. You can use this to add captions, descriptions for ToolTips, lookup information, and anything else you can imagine. You can add extended properties via Transact SQL (TSQL) statements, by right-clicking the object in Query Analyzer, or by creating your own tool to edit them. Extended properties aren’t part of the SQL-92 specification.
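
As a minimal sketch of the TSQL route, run from .NET; the table, column, caption, and connection string here are assumptions, and the 'user' level0 type applies to SQL Server 2000 (later versions use 'schema'):

using System.Data.SqlClient;

class ExtendedPropertySketch
{
    static void Main()
    {
        // Connection string, table, and column names are placeholders.
        string connString =
            "Data Source=localhost;Initial Catalog=Northwind;Integrated Security=SSPI";

        // sp_addextendedproperty attaches a name/value pair to a column.
        string sql =
            "EXEC sp_addextendedproperty " +
            "@name = N'Caption', @value = N'Order Date', " +
            "@level0type = N'user', @level0name = N'dbo', " +
            "@level1type = N'table', @level1name = N'Orders', " +
            "@level2type = N'column', @level2name = N'OrderDate'";

        using (SqlConnection conn = new SqlConnection(connString))
        using (SqlCommand cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}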

NOTE:  There’s another standard you might be interested in if you’re doing data warehousing. If you know you need information from the Open Information Model (OIM) standard, then you can write code similar to the SQL-92 extraction tool to extract that metadata. If you don’t know about the OIM standard, the chances are excellent that the simpler approach of SQL-92 with database-specific extensions will provide all the metadata you need.

Working with the Metadata Extraction Tool

The metadata extraction tool that you can download from the Web site extracts a ton of information about your database. This section discusses a little theory behind the tool and what it accomplishes. Appendix B walks through important parts of this tool.

TIP The metadata extraction tool at the Web site accesses only one database at a time. To work with multiple databases, you can extract data separately and combine it using the merge techniques discussed later in this chapter.

Understanding the Target Structure

Before looking at how the extraction tool works, it'll help to know what the target XML file looks like. This is the target for both SQL-92 extraction and the XSDToFriendlyXML transformation. The opening of Northwind reveals familiar element names such as DataStructure and Table, rather than the obtuse element names of XSD. Column elements are called TableColumn because both tables and views contain columns, and the two need to be differentiated:

<?xml version="1.0" encoding="UTF-8"?>
<dbs:DataStructures xmlns:dbs="http://kadgen/DatabaseStructure">
  <dbs:DataStructure Name="Northwind">
    <dbs:Tables>
      <dbs:Table Name="Orders"
                      OriginalName="Orders"
                      SingularName="Order"
                      PluralName="Orders">
         <dbs:TableColumns>
            <dbs:TableColumn Name="OrderID"
                                   OriginalName="OrderID"
                                   Ordinal="1" Default=""
                                   AllowNulls="False"
                                   SQLType="int"
                                   NETType="System.Int32"
                                   MaxLength=""
                                     IsAutoIncrement="true">
<additional tables clipped for brevity>

To help sort out naming, the extraction process provides four names for each table. Table 2-2 explains the source and use of each name. The Name and OriginalName attributes are the same, unless there are spaces in the original name. For reasons that seem mysterious outside Microsoft, names of database elements can contain spaces. I recommend you avoid names containing spaces because they can cause problems all over the place. Metadata extraction replaces any spaces in the original name with underscores to create the Name attribute. Many table names based on English-language words can be automatically converted to singular and plural forms. It makes no difference to this process whether the original table names are singular or plural, but it's rather maddening for humans to work with databases that intermix singular and plural table names, and the ORM process in Chapter 6 is a little easier if you use singular table names.

CAUTION — The logic used for making singular and plural forms of table names won't work for all table names and won't work with names not based on the English language.

Table 2-2. The Use of Each of the Name Attributes Provided for Tables and Columns

Name
  Description: The name of the element as used in the database, with spaces replaced with underscores and keywords prefaced with an underscore. Plurality isn't changed.
  Usage: The primary identifier for the element.
  Provided for tables: Yes. Provided for columns: Yes.

OriginalName
  Description: The name of the element exactly as used in the database.
  Usage: Provided for reference and used when generating stored procedures.
  Provided for tables: Yes. Provided for columns: Yes.

SingularName
  Description: The Name attribute modified to make it singular if it isn't already singular.
  Usage: Name of row classes; prefix for collection classes that end with Collection.
  Provided for tables: Yes. Provided for columns: No.

PluralName
  Description: The Name attribute modified to make it plural if it isn't already plural.
  Usage: Name of variables pointing to collection classes.
  Provided for tables: Yes. Provided for columns: No.

NOTE:  This and other aspects of the samples and example architectures throughout this book assume you don't have both singular and plural forms as separate database tables (for example, Customer and Customers tables in the same database). If you do, you'll have to adjust. You can either modify the extraction tool or supply modified singular and plural names via freeform metadata (discussed later in this chapter).

The TableColumn element contains information retrieved from SQL Server. The NETType attribute contains a string translated from the SQL type.

Each column has privileges defined in the database. If you include these privileges in your metadata, you can fine-tune your user interface behavior via your templates. Each column may also have check constraints, but the specific column schema shown here doesn’t have a check constraint:

<dbs:TableColumnPrivileges>
   <dbs:TableColumnPrivilege Grantor="dbo" Grantee="public"
                  Type="REFERENCES" />
   <dbs:TableColumnPrivilege Grantor="dbo" Grantee="public"
                  Type="SELECT" />
   <dbs:TableColumnPrivilege Grantor="dbo" Grantee="public"
                  Type="UPDATE" />
  </dbs:TableColumnPrivileges>
  <dbs:CheckConstraints />
 </dbs:TableColumn>
</dbs:TableColumns>

Like columns, tables have privileges defined in the database. This information is included in the XML as follows:

<dbs:TablePrivileges>
  <dbs:TablePrivilege Grantor="dbo" Grantee="public" Type="REFERENCES" />
  <dbs:TablePrivilege Grantor="dbo" Grantee="public" Type="SELECT" />
  <dbs:TablePrivilege Grantor="dbo" Grantee="public" Type="INSERT" />
  <dbs:TablePrivilege Grantor="dbo" Grantee="public" Type="DELETE" />
  <dbs:TablePrivilege Grantor="dbo" Grantee="public" Type="UPDATE" />
</dbs:TablePrivileges>

One of the most important things in the XML metadata is information about table constraints. This includes primary keys and table relations. The primary key information contains a list of key columns. Generally each table has a single primary key, but that key may be a compound key made up of several columns, so the key is represented as a child element rather than an attribute:

<dbs:TableConstraints>
  <dbs:PrimaryKey>
    <dbs:PKField Name="OrderID" Ordinal="1" />
  </dbs:PrimaryKey>

Table elements in the metadata file contain both parent and child relations. This means that each relation is described both from the perspective of the parent and from the child. This allows templates using the metadata to access relation information easily and helps avoid convoluted XPath statements. The Orders table has one child and three parent tables, each of which uses one key field. The child table element doesn't need to list the parent key fields because they're immediately available as the primary key of the current table. Multiple key fields would result in multiple ParentKeyField and ChildField elements:

    <dbs:TableRelations>
      <dbs:ChildTable Name="Order_Details">
        <dbs:ChildKeyFields Name="OrderID" Ordinal="1" />
      </dbs:ChildTable>
      <dbs:ParentRelations>
        <dbs:ParentRelation ParentTable="Customers">
          <dbs:ParentKeyField Name="CustomerID" Ordinal="1" />
          <dbs:ChildField Name="CustomerID" Ordinal="1" />
        </dbs:ParentRelation>
        <dbs:ParentRelation ParentTable="Employees">
          <dbs:ParentKeyField Name="EmployeeID" Ordinal="1" />
          <dbs:ChildField Name="EmployeeID" Ordinal="1" />
        </dbs:ParentRelation>
        <dbs:ParentRelation ParentTable="Shippers">
          <dbs:ParentKeyField Name="ShipperID" Ordinal="1" />
          <dbs:ChildField Name="ShipVia" Ordinal="1" />
        </dbs:ParentRelation>
      </dbs:ParentRelations>
    </dbs:TableRelations>
  </dbs:TableConstraints>
  <dbs:ExtendedProperties />
</dbs:Table>

The remainder of the table elements repeat this sequence. The XML document contains similar sections for views, stored procedures, and so on. By the time the metadata encompasses all of the information in the database, the metadata files can become fairly large. The metadata file for Northwind, which isn’t a particularly complex database, is about 200KB. 


{mospagebreak title=Understanding the Tool Architecture}

The SQL-92 standard features provide most of the metadata you’ll want and work with a number of database backends. Each backend provides additional metadata opportunities. That makes a base/derived class pattern a good choice for the metadata extraction tool. A base class named ExtractMetadata gives access to the SQL-92 standard features. The SQLExtractMetadata class derives from this and provides Microsoft SQL Server–specific features. Figure 2-3 illustrates this relationship. The Oracle box is gray because I haven’t supplied an Oracle class.

[Figure 2-3. The ExtractMetadata base class with its SQL Server (and potential Oracle) derived classes]

The derived class provides the housekeeping duties of accessing the backend as well as backend provider-specific extensions. Housekeeping duties include connection management and accessing the data through a DataAdapter. Because these tasks are handled by the derived classes, the base class can't stand on its own, so it's defined as an abstract (MustInherit) class. You'll need to build a derived class that at least performs this housekeeping functionality for any other backend database you use. You may want to add other features specific to your backend, similar to the way extended properties are added in the SQL Server derived class.

Because I don't know what you're going to do with derived classes you may create, I maximize flexibility by making everything in the base class Overridable (virtual). I don't suggest you do this for most classes. In general, you want Overridable to indicate the ways you anticipate derived classes using your base class. In this case, I cheat and make everything overridable.
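To make the shape of this pattern concrete, here's a minimal sketch of a base/derived pair. The member names are illustrative rather than the actual members of the downloadable tool, and the connection string assumes Windows authentication:

Imports System.Data
Imports System.Data.SqlClient

' Illustrative only; the real ExtractMetadata and SQLExtractMetadata classes have many more members.
Public MustInherit Class ExtractMetadataSketch
    ' Derived classes supply the housekeeping: how to run a query against their backend.
    Protected MustOverride Function GetDataTable(ByVal sql As String) As DataTable

    ' SQL-92 features live in the base class and are Overridable so derived classes can adjust them.
    Public Overridable Function GetTables() As DataTable
        Return GetDataTable("SELECT * FROM INFORMATION_SCHEMA.TABLES")
    End Function
End Class

Public Class SQLExtractMetadataSketch
    Inherits ExtractMetadataSketch

    Private connectionString As String

    Public Sub New(ByVal server As String, ByVal database As String)
        ' Windows authentication, concatenated with the server and database supplied by the harness.
        connectionString = "Integrated Security=SSPI;Server=" & server & ";Database=" & database
    End Sub

    Protected Overrides Function GetDataTable(ByVal sql As String) As DataTable
        ' Housekeeping: connection management and data access through a DataAdapter.
        Dim result As New DataTable
        Dim adapter As New SqlDataAdapter(sql, connectionString)
        adapter.Fill(result)
        Return result
    End Function
End Class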

NOTE:  You do incur a small performance penalty every time you access an Overridable member. This performance hit is inconsequential when you need to allow the member to be overridden. However, doing this for members that you don’t expect to be overridden just wastes processing cycles.

TIP: It doesn’t matter if you waste processing cycles during code generation.

Understanding Security

When you’re creating metadata, you need permission to read from the system tables. The easiest way to do this is to create one or more logins for code generation that are different from the logins you’ll use at runtime. From the programmer’s perspective, the simplest approach is Windows authentication. The downside of this approach is that programmers may wind up testing the application with different rights than end users will have. One solution is to have multiple logins and use the Windows feature Run As when running code generation. Another solution is to provide a login to the code generation and use username/password authentication.

The tools on the Web site use Windows authentication because that’s the most appropriate approach for code I’m asking you to run against your server. You can change that in the connection string in the SQLExtractMetadata class. This connection string is in the class, rather than App.config, because it’s concatenated at runtime with the server and database provided by the harness. To run the samples, you’ll need to have access to Northwind via a Windows-authenticated account or make changes to the connection string and provide a user login screen.

TIP: Differentiate between permissions for code generation and application testing, especially if your code generation writes to your database. Use separate accounts for code generation and application testing.

Code generation may also need to make changes to your databases. You might do this either because you’re creating stored procedures or because you’re creating and running scripts to build tables and/or import data. In these cases, you’ll need additional database rights. You’ll need to fine-tune your code generation login to have the required permissions.

TIP: You can mess up working with code generation in basically the same ways you can mess up in Enterprise Manager or Query Analyzer. You can run any script using the harness introduced in Chapter 3, including ones that overwrite stored procedures, add tables, drop tables, and so on. Just as you can “toast” your database with manual tools, you can toast it (faster and more efficiently) with code generation, so you’ll want to use backups, test databases, scripts, caution, and other good habits.  


{mospagebreak title=Working with Information Schema Views}

SQL-92 databases, such as SQL Server and Oracle, provide information schema views as a standard way to expose information about database schemas. These views are normalized, meaning that there are many of them. Information schema views are the best way to access structural information about your database. Accessing system tables directly isn't a good idea because Microsoft might change the layout. The system stored procedure sp_help provides most of the information, but it's more difficult to work with, and it's specific to SQL Server. So, the best approach is to use the SQL-92 information schema views.

Understanding SQL-92 Terminology

You’ll find several features in information schemas named in ways quite different from common SQL Server terminology. Table 2-3 shows the information schema and the corresponding SQL Server terms.

Table 2-3. Translating Terminology Between Information Schema and SQL Server Terminology

INFORMATION_SCHEMA        SQL SERVER
Catalog                   Database
Schema                    Owner
Routine                   Stored procedure or function
Domain                    User-defined type
RowVersion                Timestamp
Timestamp                 DateTime data type

I’ll stick with SQL Server terminology because it sounds more familiar. The tool that extracts metadata from SQL-92 databases uses the underlying SQL-92 based table and column names to access the data, so you’ll need to be able to translate between the naming if you work with the internals of this tool.

Object names for tables, views, user-defined types, stored procedures, and so on are all defined via a three-part name in information schema views. This name consists of the catalog or database name, schema or owner name, and the object name itself. Most of you are likely to target one database and the dbo in your applications, so I’ll stick with simple object names. If you’re targeting multiple databases or multiple owners, you may need to make changes to the metadata extraction classes to avoid potentially ambiguous naming across different scopes.

Introducing the Specific Schemas

Information schemas are views, not stored procedures. This means you select from them rather than call them. Also, unlike most things you encounter in SQL Server, the information schema views have to be fully qualified, such as this:

SELECT * FROM INFORMATION_SCHEMA.SCHEMATA

SCHEMATA is one of the 20 or so information schema views provided in the SQL-92 standard. The SCHEMATA view returns the databases available to the user within the server instance. Table 2-4 summarizes what information each view provides. For the full information on these views and what each returns, refer to the "INFORMATION_SCHEMA" section in the TSQL help you can access through Query Analyzer (or MSDN).

Table 2-4. Individual Information Schema View Contents

SCHEMATA: Contains a row for each database in which the user has permissions.
TABLES: Contains a row for each table.
TABLE_PRIVILEGES: Contains a row for each table privilege.
TABLE_CONSTRAINTS: Contains a row for each table constraint. (See the section "Using Constraints" for a discussion of how constraints work.)
CONSTRAINT_TABLE_USAGE: Similar to TABLE_CONSTRAINTS with less information. (Not used in tool.)
COLUMNS: Contains a row for each column.
COLUMN_PRIVILEGES: Contains a row for each column privilege.
CONSTRAINT_COLUMN_USAGE: Contains a row for each column constraint.
COLUMN_DOMAIN_USAGE: Contains a row for each column that's defined as a user-defined type; this information is also contained in COLUMNS. (Not used in this metadata extraction.)
ROUTINES: Contains a row for each stored procedure or function.
PARAMETERS: Contains a row for each stored procedure or function parameter.
ROUTINE_COLUMNS: Contains a row for each column returned from a table-valued function. (Don't get excited—this just applies to table-valued functions. I'll show you later how to retrieve stored procedure recordset structures.)
CHECK_CONSTRAINTS: Contains a row for each accessible check constraint.
REFERENTIAL_CONSTRAINTS: Contains a row for each accessible referential constraint.
KEY_COLUMN_USAGE: Contains a row for each column used in an accessible key.
VIEWS: Contains a row for each accessible view.
VIEW_TABLE_USAGE: Contains a row for each accessible table used in a view. (Not used in tool.)
VIEW_COLUMN_USAGE: Contains a row for each column used in each view. (Not used in tool.)
DOMAINS: Contains a row for each accessible user-defined type.
DOMAIN_CONSTRAINTS: Contains a row for each constraint on an accessible user-defined type.
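As a small example of using one of these views, the following sketch pulls the column list for a single table through INFORMATION_SCHEMA.COLUMNS. The helper name is mine, not part of the downloadable tool:

Imports System.Data
Imports System.Data.SqlClient

Module ColumnSchemaExample
    ' Returns one row per column of the named table, in ordinal order.
    Function GetColumns(ByVal connectionString As String, ByVal tableName As String) As DataTable
        Dim sql As String = _
            "SELECT COLUMN_NAME, ORDINAL_POSITION, IS_NULLABLE, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH " & _
            "FROM INFORMATION_SCHEMA.COLUMNS " & _
            "WHERE TABLE_NAME = @TableName ORDER BY ORDINAL_POSITION"
        Dim cmd As New SqlCommand(sql, New SqlConnection(connectionString))
        cmd.Parameters.Add("@TableName", tableName)
        Dim adapter As New SqlDataAdapter(cmd)
        Dim columns As New DataTable
        adapter.Fill(columns)
        Return columns
    End Function
End Module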

In the SQL-92 data extraction tool, I include features you’re likely to use and don’t attempt to incorporate everything from the schema views. For example, I ignore aspects such as character set, collations, and so on. If you’re using these features, you’ll want to extend the tool with a derived class or change the source code.   


{mospagebreak title=Using Constraints}

Many of the information schema views are self-explanatory. However, the constraints are a bit confusing. Although column constraints are limited to check constraints, table-level constraints can be check, unique, primary key, or foreign key constraints. In all cases, the TABLE_CONSTRAINTS view contains a row with the full name of a constraint in the Catalog/Schema/Name format and the constraint type. The constraint's type indicates where to go looking for more information.

Retrieving Check Constraints

Check constraints are the easiest type of constraint to retrieve. The full name of the constraint in the TABLE_CONSTRAINTS view points to an entry in the CHECK_CONSTRAINTS view. This view supplies the check clause as text. That's simple enough (see Figure 2-4, A).

[Figure 2-4. Locating constraint details in the information schema views (panels A through D)]

Once your metadata includes the information from the check constraint of the SQL Server column, you can pass the constraint information out to a user interface tier to build its validation (such as a Web page that validates using JavaScript), to build middle tier validation, and/or build intelligent messages for your users on validation failures and other slick things.

Because you’re using the constraint in a different language than the TSQL that defines the constraint, you’ll have to parse to clean up the syntax. Dragons lurk in these waters. At the simplest, you’ll have to do some parsing to remove square brackets and adjust naming in a consistent manner, including replacing keywords and any spaces that snuck into your column names. This won’t be sufficient where SQL functions are used because .NET won’t understand these functions. It also won’t be enough if your properties don’t map directly to columns or the check is more complex. Problems generally get deeper as you look at UI-tier code where you also need to reference a middle-tier object within the constraint. It’s easy enough for you to see the intent in the following check constraint, but how do you turn this into UI validation code?

<dbs:CheckConstraint Clause="([BirthDate] < getdate())" />

It’s actually a two-headed dragon that breathes fire from these depths. Attempting to handle all possible constraints eats too much development time. Handling some, but not all constraints, can result in lost constraints, which is very, very bad. It’s especially bad because missing constraints are hard to catch in testing. Invalid constraints that raise compile time errors are just fine—good in fact. You still have a dragon, but he has such a bad case of the hiccups he can’t sneak up on you.

Looking ahead to what generation actually does may clarify how you work with check constraints and the types of problems you're likely to encounter. In the middle tier, you can just insert the text of the constraint once the square brackets are removed; at that point, name parsing is complete. This results in the following:

If Not (BirthDate < getdate()) Then
    ' Respond to Validation Failure
End If
This assumes BirthDate is a property of the current object, which is likely because you are performing this validation within a middle-tier object. The compiler raises an error on getdate() because .NET doesn’t include this function, forcing you to add this method to your class. It’s easy enough to include a standard block of wrapper functions corresponding to common SQL functions in all of your generated classes. Alternatively, you can transpose these functions into framework functions during metadata extraction, or you can prefix all SQL functions with a specific class name such as SQLUtility.getdate() and add the methods to this class.
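As a sketch of that last option, a class such as the following lets constraint text lifted from TSQL compile once it's prefixed. The class name SQLUtility comes from the previous paragraph; the members shown are illustrative:

Public Class SQLUtility
    ' Wrapper for the TSQL getdate() function.
    Public Shared Function getdate() As DateTime
        Return DateTime.Now
    End Function

    ' Wrapper for the TSQL len() function. Note that TSQL LEN ignores trailing
    ' spaces; trim here if you need to match that behavior exactly.
    Public Shared Function len(ByVal value As String) As Integer
        If value Is Nothing Then
            Return 0
        End If
        Return value.Length
    End Function
End Class

The generated check then becomes If Not (BirthDate < SQLUtility.getdate()) Then, which resolves the compiler error without further parsing.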

Compiler errors point you to the specific types of constraint validation used in your application. Be cautious to ensure that you never lose a constraint, even if you get something weird in the constraint. If something goes wrong, either output the original string (which you anticipate raising a compiler error) or output something that will explicitly fail to compile, such as the string Error Parsing Constraint.

TIP: Endeavor to parse constraints such that they're either fully complete and usable or guaranteed to cause a compiler error.

Retrieving Primary Key and Unique Key Constraints

It's only a little more convoluted to extract the unique and primary key constraints. Here, the constraint full name in the TABLE_CONSTRAINTS view corresponds to one or more rows in the KEY_COLUMN_USAGE view, as shown in Figure 2-4, B. The KEY_COLUMN_USAGE view includes the full constraint name, the full table name (which you already had), and the column name and ordinal position, which are the two new pieces of information you need. There may be multiple columns as part of the primary key, and thus multiple rows in the KEY_COLUMN_USAGE view are possible. The KEY_COLUMN_USAGE view contains the columns for all the constraint types that use key columns, including parent and child constraints as well as primary key and unique constraints.
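Here's a sketch of the kind of query involved, written as a constant you could feed to the extraction tool's data access code; the module and constant names are illustrative:

Module KeyConstraintQueries
    ' One row per key column of every primary key or unique constraint in the database.
    Public Const PrimaryAndUniqueKeySQL As String = _
        "SELECT tc.TABLE_NAME, tc.CONSTRAINT_NAME, tc.CONSTRAINT_TYPE, " & _
        "       kcu.COLUMN_NAME, kcu.ORDINAL_POSITION " & _
        "FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc " & _
        "     INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu " & _
        "       ON tc.CONSTRAINT_CATALOG = kcu.CONSTRAINT_CATALOG " & _
        "       AND tc.CONSTRAINT_SCHEMA = kcu.CONSTRAINT_SCHEMA " & _
        "       AND tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME " & _
        "WHERE tc.CONSTRAINT_TYPE IN ('PRIMARY KEY', 'UNIQUE') " & _
        "ORDER BY tc.TABLE_NAME, kcu.ORDINAL_POSITION"
End Module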

Retrieving Parent Relations

Relations between tables are called referential constraints and consist of a parent and child table. Child tables are also called foreign key tables, and parent tables are also called primary key tables. From the point of view of any single table, it can have any combination of parent and child relations.

The REFERENTIAL_CONSTRAINTS information schema view defines referential constraints. Each row in this view contains the full name of a parent table constraint and the full name of a child table constraint. The parent table is referred to in the view as the UNIQUE_CONSTRAINT and the child table simply as CONSTRAINT. You can retrieve both parent and child tables for any table from this view. In metadata, it's best to include the relation definition as part of both the parent and child tables' descriptions.

The foreign key rows of the TABLE_CONSTRAINTS view reference the CONSTRAINT in the REFERENTIAL_CONSTRAINTS view. Effectively, the current table is the child, or foreign key, table in the relation. The corresponding record in the REFERENTIAL_CONSTRAINTS view provides the full name of the parent constraint and table from the UNIQUE_CONSTRAINT fields of the view. With the full name of the parent constraint, you can grab the columns from KEY_COLUMN_USAGE in a manner similar to the primary and unique keys. This somewhat convoluted approach is illustrated in Figure 2-4, C.

TIP: It’s worth the trouble to add both parent and child relations to each table element in your metadata because it avoids convoluted XPath in your code templates. Convoluted XPath makes your code templates far less maintainable and far more sensitive to changes in metadata layout.

Retrieving Child Relations

Grabbing child relations is similar. Child relations are all those relations with the current table listed as the parent table, or UNIQUE_CONSTRAINT. To retrieve information on these constraints, you can search the REFERENTIAL_CONSTRAINTS view for all records with the current table as the UNIQUE_CONSTRAINT. Using the corresponding CONSTRAINT records, you can retrieve the child table names and retrieve their keys from KEY_COLUMN_USAGE. Figure 2-4, D, illustrates this relationship.
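A sketch of the corresponding query, again with illustrative names and ignoring the catalog and schema parts of the constraint names for brevity, looks like the following. Joining the result to KEY_COLUMN_USAGE, as in the previous sketch, gives you the key columns on each side:

Module RelationQueries
    ' One row per relation: the child (foreign key) constraint and the
    ' parent (primary key or unique) constraint it refers to.
    Public Const ReferentialConstraintSQL As String = _
        "SELECT rc.CONSTRAINT_NAME AS ChildConstraint, " & _
        "       child.TABLE_NAME AS ChildTable, " & _
        "       rc.UNIQUE_CONSTRAINT_NAME AS ParentConstraint, " & _
        "       parent.TABLE_NAME AS ParentTable " & _
        "FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS rc " & _
        "     INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS child " & _
        "       ON rc.CONSTRAINT_NAME = child.CONSTRAINT_NAME " & _
        "     INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS parent " & _
        "       ON rc.UNIQUE_CONSTRAINT_NAME = parent.CONSTRAINT_NAME"
End Module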


{mospagebreak title=Modifying Mappings}

Sometimes the information in the database schema isn’t exactly the information you want to use. Maybe you need to use a different column name or a narrower type in the user interface. There are several ways to provide mapping. You can programmatically modify the structure, or you can use freeform data to override existing attribute values and rely on the merge mechanisms allowing later XML files to override data in earlier files. Chapter 6 shows a simplified ORM solution targeted at indirect mappings. If you’re just changing a caption or type, overriding data with freeform metadata may be sufficient.

Extracting Stored Procedure Schemas

You can extract the stored procedure and parameter information from the ROUTINES and PARAMETERS information schema views. What’s much harder to figure out is the intent of each stored procedure and what recordsets the stored procedure returns. The returned recordsets are important because these are often used to create business objects. Stored procedures are preferable to creating SQL statements as strings in your .NET code—also called dynamic or inline SQL. Inline SQL has significant problems, including security vulnerabilities, nonoptimal performance, and limited opportunities for reuse.

Determining Stored Procedure Intent

Knowing the intent of each stored procedure tells you how you’ll use it in code generation. Will it define a business object because it’s a select stored procedure? Will it be part of an updating strategy? Should it be totally ignored during code generation? You probably have some combination of nine kinds of stored procedures in your database as follows:

  • Simple select stored procedures that retrieve a single root record

  • Set-based select stored procedures to retrieve multiple records

  • Insert stored procedures

  • Update stored procedures

  • Delete stored procedures

  • Process stored procedures

  • Administrative stored procedures

  • Reporting stored procedures (select stored procedures used differently)

  • Miscellaneous stored procedures

Although there isn’t any difference in how you call them, I think it’s helpful to differentiate between simple select procedures that return a single record and procedures that return multiple records. Note that this distinction is on the number of root records returned, not on the number of recordsets. Simple select stored procedures will often return associated child collections. Depending on the amount of preprocessing that has to be done, you can sometimes gain considerable performance benefits returning these multiple recordsets together. I’ll differentiate these two types of stored procedures by calling them select and setselect stored procedures.

You’re probably familiar with retrieve, insert, update, and delete stored procedures. You may also have process stored procedures that make changes to the database or retrieve scattered information that can’t easily be considered a select and don’t map to underlying tables. Most databases have some administrative stored procedures that are used for importing data, updating structures such as maintaining an internal calendar, and so on. Code generation can generally ignore these utility and internal stored procedures. Reporting stored procedures aren’t much different from select stored procedures, except that they’re more likely to be denormalized and will almost always be read-only on the client. Few databases survive without a few stored procedures that are unique and strange to the history of that database. These are sometimes temporary or intended as temporary, and no one was quite sure they could delete them.

TIP Use your initials as a prefix for temporary stored procedures to clarify who is responsible for either deleting them or marking them as a permanent part of your database.

To generate code effectively, you have to know into which category each stored procedure falls. You can do this with either extended properties or an approach called careful naming discussed later in this chapter. Careful naming formalizes the stored procedure’s purpose within its name, allowing you to grab it during metadata extraction. You might be able to parse the body of simple stored procedures to guess what the stored procedure does, but I advise against that approach because it gets too ugly too quickly with nontrivial stored procedures.  


{mospagebreak title=Retrieving Stored Procedure Recordsets}

It’s tricky to retrieve the structure of recordsets returned by stored procedures. It’s not available from information schemas or from the system tables. To retrieve this information, you have to actually run the stored procedure and look at the output. If you look at Microsoft SQL Profiler, you’ll see that Visual Studio uses this approach to come up with the return values for Server Explorer.

SQL Server includes a SET FMTONLY ON switch, instructing it to return empty recordsets with no data. This works in most cases, primarily failing if you have temporary tables and a few other scenarios. It's wise to also set the database to read-only prior to calling the stored procedure, or to start and roll back a transaction, to minimize side effects from running these stored procedures.

The remaining problem is what parameters you're going to use when calling the stored procedure. In the SQL-92 metadata extraction tool, I solve this by passing default values for the type, such as zero for numeric values. For this to work, obviously your procedure can't blow up when default values are passed. I had one DBA include a consistently named optional parameter in all stored procedures that indicated whether I was extracting metadata. This Boolean parameter worked well, only requiring that one way or another the stored procedure return empty recordsets matching the runtime structure. The DBA could figure out internally the best way to do this. A similar approach would be to use a null value for one or more key parameters to indicate you're in metadata extraction mode. If you can't use these approaches, you'll have to work out a scheme for storing the appropriate parameter values. You could store these in a separate file, store them in extended properties, or parse them from the opening comments of the stored procedure itself. Regardless of how you do it, this will be hard to maintain, so where possible limit the number of parameters that need values assigned for code generation.

Some stored procedures vary the structure of the recordsets they create based on parameter values. From a code generation point of view, not having predictable columns is disastrous. My first choice for a solution is to rewrite these stored procedures, perhaps having the stored procedures return empty data rather than omit columns when there are no values. My second choice is to create wrapper stored procedures for all the different variations. This way your code generation can access only the wrapper stored procedures, each of which returns consistent recordsets. If neither of these solutions works and your stored procedures need to return a variable structure, you’ll have to determine how that variability will affect your business objects.

In some cases, such as pivot tables, the recordsets produced by stored procedures are inherently variable. These stored procedures aren't going to provide you with meaningful recordset metadata you can use to generate business objects. Your goal will be just to avoid building misleading objects and to develop other strategies, such as untyped DataSets, for this kind of data.

Understanding ADO.NET Structures and Logical Tables

Running the stored procedures results in a set of ADO.NET DataTables. If you run the stored procedure with the MissingSchemaAction property set to AddWithKey, you’ll get a recordset that includes the column names, sizes, and types. You can access this information through normal ADO.NET techniques.
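Here's a rough sketch of that round trip; the helper name is mine, the stored procedure is assumed to run successfully with no parameters (or with the default values discussed earlier), and error handling is minimal:

Imports System.Data
Imports System.Data.SqlClient

Module ProcSchemaExample
    ' Returns the empty, schema-only recordsets produced by the stored procedure.
    Function GetProcSchema(ByVal connectionString As String, ByVal procName As String) As DataSet
        Dim cn As New SqlConnection(connectionString)
        cn.Open()
        Try
            ' Ask SQL Server for structure only, no data.
            Dim fmtOnlyOn As New SqlCommand("SET FMTONLY ON", cn)
            fmtOnlyOn.ExecuteNonQuery()

            Dim cmd As New SqlCommand(procName, cn)
            cmd.CommandType = CommandType.StoredProcedure
            ' Add default parameter values here if the procedure requires them.

            Dim adapter As New SqlDataAdapter(cmd)
            adapter.MissingSchemaAction = MissingSchemaAction.AddWithKey
            Dim schemaSet As New DataSet
            adapter.Fill(schemaSet)
            Return schemaSet
        Finally
            Dim fmtOnlyOff As New SqlCommand("SET FMTONLY OFF", cn)
            fmtOnlyOff.ExecuteNonQuery()
            cn.Close()
        End Try
    End Function
End Module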

That’s nice information, but these returned columns almost always map to an underlying table and column. Previous sections showed how to extract even richer information from the underlying columns. To connect this information with the stored procedure recordset, you’ll have to determine the mapping between the stored procedure and its underlying tables.

Each returned recordset is a logical table. Logical tables often map directly to physical database tables. They won't map directly if the logical table consists of joined tables.

Inferring the Mappings

The mapping between the logical and physical tables can be explicit through extended properties or freeform data. But that’s often not needed because the mappings can be inferred. These inference rules will almost always get you the right column. When they don’t, you can override with manual mapping in freeform metadata. As part of evaluating your database for metadata collection, evaluate where you have potentially ambiguous fields.

Select stored procedures that return a single recordset mapped to a single table aren't too hard. Because I love well-organized databases with names I can predict, I like the careful naming approach discussed later in the "Introducing Careful Naming" section to determine the physical table for selects that return a single recordset. For example, the stored procedure that retrieves data from the Customer table might be named acc_Customer_Select. Alternatively, you can use extended properties or freeform XML. You can apply extended properties to stored procedures and their parameters but not to returned recordsets or columns.

Things get a lot trickier when the stored procedure returns multiple recordsets or when a recordset represents a table join. In the first case, you don't know much about each recordset, and in the second case, individual columns in the recordset are mapped to different tables. To map multiple recordsets to their underlying tables, you can extend careful naming to columns by having a column name that's the name of the table followed by ID, and using that column name to determine the physical table. This is probably how you already name your primary keys, so it's an easy rule to implement; however, you also have to place this column in a predictable location, usually the first occurrence of a field ending with ID. You'll also have to ensure that the recordset column names are the same as the underlying table's column names. This solution works if there's a single table mapped to each stored procedure recordset. Alternatively, you could store a comma-delimited list of table names as a stored procedure extended property.

When a recordset is composed of joined underlying tables, you can gather a set of tentative tables by grabbing any column whose name is a primary key of another table. You can then walk these tables looking for each column name. Another approach is to prefix all of the column names with the corresponding table name and an underscore. This is the least ambiguous approach. It can be tedious to maintain by hand, but it's a breeze if you're generating the stored procedures. Yank these prefixing table names off the column names before exposing the data in a business object, which is also easy to do with code generation. Regardless of your overall approach, you need to use the concatenated table name approach when you have two columns that have the same name in different underlying tables with different metadata.
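A sketch of the simplest of these inference rules, finding the first column that ends in ID and treating the rest of its name as the logical table, might look like this (the function name is illustrative):

Imports System.Data

Module TableInference
    ' Returns the inferred table name, or an empty string if no ID column is found.
    Function InferTableName(ByVal recordset As DataTable) As String
        For Each col As DataColumn In recordset.Columns
            If col.ColumnName.EndsWith("ID") AndAlso col.ColumnName.Length > 2 Then
                ' CustomerID becomes Customer; combine with your singular/plural handling as needed.
                Return col.ColumnName.Substring(0, col.ColumnName.Length - 2)
            End If
        Next
        Return String.Empty
    End Function
End Module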

Working with SQL Server–Specific Metadata

Most of the information in your SQL Server database structure is available through the SQL-92 information schemas. Some of the information, specifically the autoincrementing identity columns and extended properties, is a SQL Server–specific extension and isn't included in the SQL-92 standard or the information schemas. In the metadata extraction tool, these values are retrieved in the derived SQLExtractMetadata class.


{mospagebreak title=Retrieving Identity Columns}

Identity columns are SQL Server columns that auto-increment to ensure that each value is unique. These are often used to guarantee a unique primary key for tables. The type of each identity column is one of the integer types. Each identity column also has a start or seed value and an increment value. These default to int, 1, and 1. This metadata extraction tool just extracts which columns are identity columns from the type name returned by sp_help for the column.

Retrieving Extended Properties

The SQL function that returns extended properties can be a bit nonintuitive to use. Ultimately, it’s giving you a recordset of name-value pairs. To get these for anything in the database, it uses a rather convoluted concept of levels. SQL Server refers to these three levels as level0, level1, and level2 objects, but it’s easier to think of them as owner, object, and item. Objects can be tables, stored procedures, and so on, and items can be columns, parameters, and so on. For each level there’s a type and a name. You have to specify both the type and the name for levels above what you’re requesting. For the level you’re requesting, you can just specify the type to retrieve all extended properties of that type. If the first parameter is specified, it indicates the specific property to be returned. Thus, you’d retrieve the Caption parameter of the CompanyName column using the following:

SELECT * FROM ::fn_listextendedproperty('Caption', 'User', 'dbo', 'Table', 'Customers', 'Column', 'CompanyName')

As an example of retrieving higher-level data based on partial parameters, the following returns all of the extended properties for tables in the database:

SELECT * FROM ::fn_listextendedproperty(null, 'User',
                          'dbo', 'Table', null, null, null)

Note that I specified the user as 'dbo' at the highest level, but only the type at the current level ('Table').
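A sketch of pulling those table-level extended properties into a DataTable from .NET follows; the module and function names are illustrative, and the query is the same call shown above:

Imports System.Data
Imports System.Data.SqlClient

Module ExtendedPropertyReader
    ' One row per extended property defined on any table owned by dbo.
    Function GetTableProperties(ByVal connectionString As String) As DataTable
        Dim sql As String = _
            "SELECT objname, name, value " & _
            "FROM ::fn_listextendedproperty(null, 'User', 'dbo', 'Table', null, null, null)"
        Dim adapter As New SqlDataAdapter(sql, connectionString)
        Dim properties As New DataTable
        adapter.Fill(properties)
        Return properties
    End Function
End Module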

Introducing Careful Naming

I settled on the phrase careful naming for this section because of my lack of comfort calling it anal naming, but it means about the same thing. Names within your database can carry the information you need to build metadata if you do the following:

  • Accept the premise that a name carries meaning.

  • Allow that meaning to be overloaded with detailed rules.

  • Commit to being very careful, almost fanatical, in your naming.

NOTE: Careful naming is most important for stored procedures.

Do your stored procedures already have names that indicate what they’re doing and what they’re doing it to? Most databases I’ve seen are awfully close to careful naming, except for two important aspects. They don’t use explicit clause delimiters, and they always seem to include a handful of exceptions. These exceptions really mess things up, and careful naming is really a commitment to follow a set of rules absolutely. It’s a 100 percent thing. If you rely on it for determining intent during metadata extraction and then do something in an unexpected way, you’ll generally have an extra or missing object, method, or property.

If careful naming doesn't sound like a good fit for your workgroup, another option is to store the task and identifying information for each stored procedure in extended properties of the stored procedures or in external freeform XML that can be merged with your metadata. The benefit of careful naming is that everyone can see the intent quickly and find what they want even after stored procedures proliferate in your database. Chapter 7 shows how to use code generation to build stored procedures, and this makes it easy to follow careful naming rules. However, your DBA may already be quite good at making and following detailed rules.

Implementing Naming Delimiters

If you change your stored procedure naming to a careful naming scheme, the most important feature you’ll introduce is delimiters. Humans do very well with Pascal or camel casing because we gather meaning even when rules are broken.

Capitalization is a hard way to delimit meaning to a computer because humans are imperfect in applying it. You have no problem understanding the intent of these samples:

nwdCustomersSelect ' nonambiguous
nwdOrderDetailsSelectSpecialCase ' potentially ambiguous

versus the intent of these:

nwd_Customers_Select ' nonambiguous
nwd_OrderDetails_Select_SpecialCase ' never ambiguous

If you really hate underscores, you can work with casing. You’ll struggle with ensuring clarity to both the computer and humans, especially with multipart names such as OrderDetails. You’ll also need to avoid table names that contain your selected keywords such as Insert, Select, Update, Delete, or Process.

Implementing Naming Rules

The specific naming rules you use aren’t essential. They could be different from mine, as long as you can articulate and follow them. The key things you’re looking for are the intent (category) of the stored procedure, the subset of your database where it’s applicable if you have a large database, and specifics that allow you to make a meaningful unique name for your middle-tier classes.

The naming rules I use for stored procedures consist of four portions, or clauses separated by underscores. Careful naming relies on being able to non-ambiguously separate the clauses of the name—a task made easy by the underscore delimiters:

< subset abbrev >_< logical table name >_< type >_< specifier >

This format breaks down into clauses as follows:

subset abbrev: Designates major subsets of the database functionality, such as acc for accounting. These abbreviations should be three or possibly four characters. You can use a prefix such as utl (for utility) or prv (for private) to indicate those stored procedures that shouldn’t be exposed in metadata. This is a common way to organize categories of stored procedures, and you can use this information to organize your business objects into multiple assemblies if desired.

logical table name: Set of columns that generally map to a single business class and one or more physical database tables. You can also think of a logical table as any unique combination of columns. A single physical table might map to several logical tables, each containing a different subset of columns. You'll also use different logical tables when you have joins producing different recordset structures. Each logical table needs a unique name—the table name of the physical table returned when there's a one-to-one mapping.

type: Type of the stored procedure. This will be Select, SetSelect, Insert, Update, Delete, Process, Report, or Internal. Report stored procedures are generally Select stored procedures but are used differently (or not used) in metadata.

specifier: Additional information that also makes the stored procedure name unique. Depending on how you build your stored procedures and how complex your database is, there’s a good chance the other rules won’t be sufficient to generate unique naming. For example, you’ll frequently have specific variations of select stored procedures with different search criteria such as this:

nwd_Customers_Select_ByAccountNumber

or this:

nwd_Customers_Select_ByNameLastFirst

This last clause, the specifier, is optional and will probably appear in some, but not all, of your stored procedure names.

If you like to suffix your stored procedures with a consistent suffix such as _sp, you can also do that. The metadata extraction tools ignore this suffix.
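As a sketch of how metadata extraction can decompose such a name, the following splits on the underscore delimiters after stripping an optional _sp suffix. The type and member names are illustrative, and it assumes the individual clauses themselves contain no underscores (another reason to avoid table names with embedded spaces, which become underscores during extraction):

Module ProcNameParser
    Structure ProcNameParts
        Public SubsetAbbrev As String
        Public LogicalTable As String
        Public ProcType As String
        Public Specifier As String
    End Structure

    ' Assumes at least the subset, logical table, and type clauses are present.
    Function ParseProcName(ByVal procName As String) As ProcNameParts
        Dim parts As ProcNameParts
        ' Strip an optional _sp suffix before splitting on the underscore delimiters.
        If procName.EndsWith("_sp") Then
            procName = procName.Substring(0, procName.Length - 3)
        End If
        Dim clauses As String() = procName.Split("_"c)
        parts.SubsetAbbrev = clauses(0)
        parts.LogicalTable = clauses(1)
        parts.ProcType = clauses(2)
        If clauses.Length > 3 Then
            parts.Specifier = clauses(3)
        End If
        Return parts
    End Function
End Module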

If you put these rules together, the intent of each of the stored procedures in your database will be clear, and you can avoid the additional hassle of maintaining information about the intent of your stored procedures. If you aren’t doing this now and have a significant database with stored procedures calling other stored procedures or other applications using your database, changing your stored procedure naming is a nearly impossible job. In that case, you probably want to maintain this information in freeform data or extended properties.   


{mospagebreak title=Creating Freeform Metadata}

Freeform metadata lives in XML files that contain anything you want to include for generating code. You'll merge this information into other sections of data. Merging into existing sections is important because you need to access metadata without regard to its source. For example, if you need the caption for a column, you don't care whether it was entered via an extended property or freeform metadata, and you certainly don't want to rewrite your code generation templates just because you changed how you entered the captions. Thus, you don't want to base the structure or organization of your metadata file on where information is coming from because it may change over the project life cycle. You accomplish this by using parallel structures in each metadata file and performing a merge that combines attributes within elements themselves—what I call an intimate merge. The "Merging Metadata" section discusses the mechanics of an intimate merge.

Creating Column Details

Because the major purpose of freeform metadata is to merge with other metadata, its design is rather easy when it's merged into an existing section. Freeform metadata files contain sections that mimic the structure of the metadata section into which you intend to merge them.

For example, the freeform metadata for adding captions, descriptions, and regular expression (regex) validation to database columns looks like the following:

<?xml version="1.0" encoding="UTF-8"?>
<dbs:Databases xmlns:dbs="http://kadgen/DatabaseStructure">
  <dbs:Database Name="Northwind">
    <dbs:Tables>
      <dbs:Table Name="Customers">
        <dbs:TableColumns>
          <dbs:TableColumn Name="CustomerID"
             Caption="ID"
             RegEx="AlphaNumericPunc">
        </dbs:TableColumn>
        <dbs:TableColumn Name="CompanyName"
           Caption="Company"
           Desc="Company Name to appear on shipping labels"
           RegEx="AlphaNumericPunc">
        </dbs:TableColumn>

This allows a nonambiguous merge into the main DataStructure element and its descendants. Note that you can enter the same information using extended properties. I suggest for the sake of others that you adopt one approach consistently or at least be able to articulate what’s where.

Defining Stored Procedure Intent

As shown earlier, the intent of each stored procedure is important in understanding how it’ll be used in building your application. Each business object knows how to select its data from the database, update existing data, insert data for new instances, and delete data if appropriate. Mapping the intent of each stored procedure allows the business or data access objects to know what stored procedure to use for each operation. You can store this information in the procedure name using careful naming or enter it using extended properties. Like the column information, you can alternatively enter it via freeform metadata. If you have a naming system that works on many but not all of your stored procedures, the freeform approach works particularly well in overriding a few exceptions:

<dbs:Databases xmlns:dbs="http://kadgen/DatabaseStructure">
  <dbs:Database Name="Northwind">
    <dbs:StoredProcs>
      <dbs:StoredProc Name="nwd_Customers_Select"
                                       ProcType="Select" />
    </dbs:StoredProcs>
  </dbs:Database>
</dbs:Databases>

Providing UI Information

In addition to being able to build business objects, you’ll also see how to build user interfaces in Chapters 9 and 10. Some aspects of your user interface don’t have a good payback for code generation because there’s a lot of variation and because existing visual tools make manual layout of user interfaces easy. Other user interface elements—including utility forms, prototypes, and supporting elements such as menus—can be created via code generation.

Metadata samples in earlier sections were easy to design because they parallel existing data structures. The following example defines a role-authorized menu structure and is a different scenario because you have free rein in how you design the structure. You want to design metadata looking forward to the code generation process. If you're designing a menu structure, when you look forward to code generation, you might think, "Ah, I'll be creating a main menu that will contain nested menu items. Each item may have an authorization role associated with it, and each role has the potential to run a different method when the user clicks the menu item." Articulating the goal leads to this:

<?xml version="1.0" encoding="UTF-8"?>
<ffu:UIStructure xmlns:ffu="http://kadgen/FreeFormForUI.xsd">
  <ffu:WinApplication Name="Main">
    <ffu:MainMenu>
      <ffu:Menu Name="MainFile" Caption="File">
        <ffu:Menu Name="FileOpen" Caption="Open">
          <ffu:MenuAllow Role="All" TypeName="" MethodName="FileOpen"/>
        </ffu:Menu>
        <ffu:Menu Name="FileNew" Caption="New">
          <ffu:MenuAllow Role="Managers" TypeName="" MethodName="MgrsFileNew"/>
          <ffu:MenuAllow Role="Admins" TypeName="" MethodName="FileNew"/>
        </ffu:Menu>
      </ffu:Menu>
    </ffu:MainMenu>
  </ffu:WinApplication>
  <ffu:WebApplication>
  </ffu:WebApplication>
</ffu:UIStructure>

If you recall earlier discussions about metadata not including dynamic data that might be changed at runtime, you might challenge this example by asking whether the permissions change at runtime. In this particular mythical application, the permissions associated with each role are set in the design with only the members of each role changing at runtime.

NOTE:  In Appendix A, I suggest you stick namespaces on only the root node within your XML files. The exception to this rule is when child nodes have a distinct and self-contained purpose. In freeform metadata, each child node beneath the root corresponds to a distinct subset of data, and it's appropriate to include namespaces and their prefixes on the child nodes.

Managing Freeform Metadata Files

Consider how you want to manage your freeform metadata files. You’ll probably want to keep these XML files in a single centralized directory, and you’ll certainly want to keep them under source control.

Several considerations guide how you split up the freeform information into multiple freeform metadata files. You'll see in the section "Merging Metadata" that the last file merged overrides any previous information if they collide. The order of merging files is important, and you can't have some parts of a single freeform metadata file merge first and other parts of the same file merge last. Once you've split up files on this basis, consider splitting based on who will edit the files if your group has clear roles and different people will be editing different freeform metadata files—this makes it easier to manage source control. You can also split up metadata based on the major metadata header. If you aren't clear at this point how to split up your freeform metadata files, then you can leave them as one file. The only reason to split it up is to make it easier to manage, and all other things being equal, one file is easier to manage than several files, particularly during maintenance.

CAUTION: From a strict technical perspective, it isn’t difficult to reorganize your freeform metadata files late in the project. From a strategic perspective, I discourage this. Because data from the last file wins on any collisions, you could have one or more very small changes, from a misspelled caption to more serious problems, arise because metadata changed as a result of the merge order changing. Also, it’s important to use source control on these files, and late reorganization makes it difficult to maintain a long-term history.   


{mospagebreak title=Using Skip Attributes}

One of the common things I hear when I talk to people who have experimented with code generation is that they found not everything should be treated the same way during code generation. Often, certain tables, columns, stored procedures, and so on should be skipped during generation.

You may be tempted to have a narrow, easy solution that one way or another marks metadata items to be skipped. That could just mean adding a Skip attribute to appropriate nodes. That doesn't work because skipping often depends on the specific context. For example, in a stateful Windows.Forms application, exposing the TimeStamp is just confusing to later programmers, but the value may be passed to the browser and back in a stateless data flow pattern in ASP.NET. Allowing your metadata to become dependent on context greatly diminishes the value of metadata as an independent unit of information about your application. So I encourage you to develop a small set of attributes that specify the purpose of each element and avoid the Skip attribute altogether.

Table 2-5 gives you some ideas; I doubt you’ll want to use all of these, but you can select the ones that make sense for your project.

Table 2-5. Suggested Attributes to Manage Skipping Items During Code Generation

Obsolete: No longer used by any operations.
Private: Not exposed by objects in any context.
StatefulPrivate: Only exposed in a stateful environment.
StatelessPrivate: Only exposed in a stateless environment.
Large: Ignored for tasks that don't use large objects such as images.
ReadOnly: Lets you provide a different class for read-only tables.
ReportsOnly: Allows you to provide a different class design for reports.

TIP: Don’t use generic skip attributes, but instead categorize elements by adding extra attributes and allow templates to skip as appropriate.

Using an XSD

Freeform metadata isn’t really a free-for-all. You don’t have a lot of flexibility in the structure because the merge process and code output will rely on the specific structure you create. XSD is the perfect tool for defining the structure, or schema, of your freeform metadata. This allows you to validate your XML before you use it and makes editing easier because Visual Studio .NET uses XSDs for IntelliSense, autocompletion, and warnings from the background compiler.

Validation is especially important when you're using XPath, which you'll use in all three types of code generation. As you'll see in Appendix A, XSLT trusts you completely and will raise no errors if something is missing, misspelled, in the wrong place, and so on. Important information might fail to be used. You can alleviate this problem by using XSD to validate any manually entered or edited data.
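Here's a sketch of validating a freeform metadata file against its XSD before merging, using the .NET 1.x XmlValidatingReader. The file paths are placeholders, and the namespace should match whatever you declared in your freeform files:

Imports System.Xml
Imports System.Xml.Schema

Module FreeFormValidation
    Sub ValidateFreeForm(ByVal xmlPath As String, ByVal xsdPath As String)
        Dim reader As New XmlTextReader(xmlPath)
        Dim validator As New XmlValidatingReader(reader)
        validator.ValidationType = ValidationType.Schema
        ' Placeholder namespace; use the namespace from your own freeform files.
        validator.Schemas.Add("http://kadgen/DatabaseStructure", xsdPath)
        AddHandler validator.ValidationEventHandler, AddressOf OnValidationError
        ' Read the whole document; problems are reported through the handler.
        While validator.Read()
        End While
        validator.Close()
    End Sub

    Private Sub OnValidationError(ByVal sender As Object, ByVal e As ValidationEventArgs)
        Console.WriteLine("Validation problem: " & e.Message)
    End Sub
End Module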

NOTE:  Hooking up XSDs for use by Visual Studio is covered in Appendix A. You can infer an XSD from an XML file using XSD.EXE directly or right-clicking the XML file in Visual Studio and clicking Create Schema. (See Footnote 13.) You can build a master XML file that contains all the elements and attributes you plan to use as a basis for your schema inference, or you can create a schema off the most complete file you happen to have and maintain the XSD by hand. If you find the XSD format intimidating, the approach of inferring your XSD from a master XML file works well. Just don’t use both approaches because you’ll wreck the edited XSD.

(Footnote 13. If your XML document runs into problems with the XSD.EXE generator, you might want to download the InferSchema tool from GotDotNet.com.)  

Merging Metadata

Merging metadata into a monolithic metadata file combines identical elements for seamless use in code generation. The merge process is complicated by one additional twist: the organization of the metadata (where things are found) should be independent of the metadata source. This means information from different sources can't just be appended to the same file but must be seamlessly intermixed in an intimate merge.

In Chapter 3 you’ll see how to combine multiple processes, including merge processes, into one script. You’ll see how to use a script to determine the order in which files are merged. Doing this at the last minute allows you to handle the monolithic file as a transient and leave it out of source control. This is really helpful when you have several people doing independent code generation on different parts of your application or closely related applications. Even though the file is transient, you need a copy of the most recent version for debugging, so you’ll save it to disk.

There are two types of metadata files: those whose root element corresponds to a section in the output metadata file and those whose root element carries the attribute FreeForm="true". This small difference reflects a broader difference in how the files come to exist. Although it isn't technically required, files corresponding to a single metadata section header are generally created programmatically, while files whose root node wraps one or more child nodes corresponding to section headers are generally created manually as freeform metadata files. Including the high-level root, even when you're editing a single section, allows you to add additional sections with minimal disruption. Combining several section headers in a single freeform file helps keep files from getting too small and scattered.

The merge is cumulative. It adds new child elements where they're missing and new attributes to existing elements. If the same attribute appears for the same element in multiple merge source files, the last one wins and overwrites the previous values. The expectation is that you'll merge manual freeform metadata last and allow it to overwrite any values extracted programmatically. This allows you to smoothly override values, but be sure you debug against the monolithic metadata file.

CAUTION — Because each metadata file can overwrite values in other metadata files as they’re merged, use a copy of the monolithic file to check input during debugging.

The merge process is short and recursive, so I’ll walk through the code. The MergeFreeForm class has a single public method that accepts an XML output document and a filename. It loads the file and checks the FreeForm attribute. If it doesn’t exist or doesn’t have a value of true, then the root is treated as the metadata section header. If the attribute is true, then the second-level nodes are treated as the section headers. This is the only difference between the two types of source files:

Public Class MergeFreeForm
    Public Shared Sub Merge( _
            ByVal outDoc As Xml.XmlDocument, _
            ByVal fileName As String)
        Dim mergeDoc As New Xml.XmlDocument
        Dim rootNode As Xml.XmlNode
        mergeDoc.Load(fileName)
        rootNode = mergeDoc.ChildNodes(1)
        If Tools.GetAttributeOrEmpty(rootNode, "FreeForm") = "true" Then
            ' If it's a freeform file, regardless of the root element name,
            ' attempt to merge all child nodes of root
            For Each node As Xml.XmlNode In rootNode.ChildNodes
                MergeNode(outDoc.ChildNodes(1), node, Nothing)
            Next
        Else
            MergeNode(outDoc.ChildNodes(1), rootNode, Nothing)
        End If
    End Sub

The MergeNode method is recursive. It searches the target file for an element matching the source node’s element name and a matching value of the Name attribute (if the Name attribute is present). This step is simple and unambiguous because it assumes precisely the same structure in the source and target node. If the SelectSingleNode method doesn’t find the node already in the target, the node is added. If SelectSingleNode finds the node, MergeNode checks the target node for the presence of each attribute from the source node. If the attribute doesn’t exist yet, it’s created. If the attribute exists, its value is updated:

    Private Shared Sub MergeNode( _
            ByVal outParent As Xml.XmlNode, _
            ByVal node As Xml.XmlNode, _
            ByVal nameAttrib As Xml.XmlAttribute)
        Dim nsmgr As New Xml.XmlNamespaceManager(outParent.OwnerDocument.NameTable)
        Dim predicate As String = ""
        nsmgr.AddNamespace(node.Prefix, node.NamespaceURI)
        If Not nameAttrib Is Nothing Then
            predicate = "[@Name='" & nameAttrib.Value & "']"
        End If
        Dim testNode As Xml.XmlNode = outParent.SelectSingleNode( _
            node.Name & predicate, nsmgr)
        If testNode Is Nothing Then
            outParent.AppendChild(outParent.OwnerDocument.ImportNode(node, True))
        Else
            ' Node exists, add attributes, then children
            For Each attrib As Xml.XmlAttribute In node.Attributes
                If testNode.Attributes(attrib.Name) Is Nothing Then
                    testNode.Attributes.Append(xmlHelpers.NewAttribute( _
                        testNode.OwnerDocument, attrib.Name, attrib.Value))
                Else
                    testNode.Attributes(attrib.Name).Value = attrib.Value
                End If
            Next

CAUTION — Attribute values are overwritten. The last value wins.

Finally, MergeNode recursively calls itself for all child nodes. If you aren’t familiar with recursive algorithms, you might have to look at this a couple of times, but it’s a nice example of a simple, effective recursive algorithm. You’ll see recursive algorithms frequently in this book because they’re so handy when you’re working with nested data such as XML documents:

            For Each childNode As Xml.XmlNode In node.ChildNodes
                MergeNode(testNode, childNode, childNode.Attributes("Name"))
            Next
        End If
    End Sub
End Class

Namespace usage might be a little confusing here. I'm assuming you'll prefix everything, including your freeform data. I anticipate you may use different namespaces for different sections of your metadata, so the namespace manager is created for the current node. Namespace prefixes are discussed in Appendix A. The key factor here is that the XPath expression's prefix comes from the node you're searching for, that is, from the source document you're merging.

It doesn’t matter whether this prefix matches the target prefix as long as the namespaces pointed to by both prefixes match. This code will work just fine even if you vary the prefix you’re using for a specific namespace. But don’t do that! A major problem will arise if you add any new elements. These new elements will have a prefix from the source and won’t be understood in the context of the target document. You could work on solving this problem, but I can’t imagine a reason you’d use different prefixes for the same namespace in different source documents. It’d be really hard for a maintenance programmer to keep up with the mental calisthenics of switching from one prefix to another.

TIP: Use consistent namespace prefixes.
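
Putting the pieces together, a calling sequence might look like the following minimal sketch. The file names are hypothetical, and in practice the merge order comes from the generation script described in Chapter 3; the point is simply that the manual freeform file is merged last and the monolithic result is saved to disk for debugging:

' Sketch only: the file names are placeholders, not files from the sample code.
Dim monolith As New Xml.XmlDocument
monolith.Load("ExtractedFromSqlServer.xml")          ' programmatically extracted metadata
MergeFreeForm.Merge(monolith, "OrmSections.xml")     ' another extracted or derived source
MergeFreeForm.Merge(monolith, "ManualFreeForm.xml")  ' manual freeform overrides merged last
monolith.Save("MonolithicMetadata.xml")              ' keep a copy on disk for debugging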

Summary

Extracting metadata is the process of gathering information from metadata sources and organizing it as an XML file that's friendly to code generation. Available metadata sources include the following:

  • Schema definitions such as XSD

  • Databases such as SQL Server

  • Web Services via WSDLs

  • Manually entered freeform metadata

  • External sources, such as mainframes, Excel, and so on

  • Existing applications and source code

  • Design tools such as UML

If you’re working with XSD, it’s helpful to convert the XSD to friendly XML prior to code generation.

To extract from SQL Server, Oracle, or other SQL-92-compliant databases, use the SQL-92 information schema views. Extracting most metadata from databases is straightforward, but stored procedure intent, recordset naming, and the mapping between recordset columns and the underlying data columns are more difficult. Going to the trouble of extracting this information provides richer input for code generation. You can get this information via careful naming and inference, extended properties, or manually entered freeform data.

You can use freeform metadata to include any additional metadata; it can be entered manually or derived programmatically. In the intimate merge described in this chapter, files merged later can overwrite values from previously merged files, which allows you to incorporate manual overrides late in the merge process.

Additional Reading

To find more information about the topics covered, try the following resources:

  • In addition to covering Web Services, Building XML Web Services for the Microsoft .NET Platform by Scott Short (Microsoft Press, 2002) contains some concise sections on other X-Stuff subjects, including XSD.

  • Professional UML with Visual Studio .NET by Andrew Filev, Tony Loton, Kevin McNeish, Ben Schoellmann, John Slater, and Chaur G. Wu (Wrox, 2002) discusses using UML with Visual Studio and discusses Visio’s limited code generation capabilities.

  
