For software developers who desire the easiest way to get a job done well, one-click code generation is like water travelling around rock. This chapter provides a preliminary look at a tool, the harness, to manage the multistep process of code generation. (From the book Code Generation in Microsoft .NET by Kathleen Dollard, Apress, 2004, ISBN: 1590591372.)
Outputting Code Principle #3: You, or someone unfamiliar with the project, can regenerate your code precisely as a one-click process, now or at any point in the future.
Once you've prepared your metadata, the next step is to build templates and perform code generation. Your initial development and long-term maintenance both benefit from repeatable one-click code generation. You benefit from operational simplicity because you can largely forget about code generation details and template content during development. A maintenance programmer benefits from operational consistency because they can repeat your actions with little effort. The keys to this repeatability are placing as much code as practical in templates and using a tool to perform the multiple steps of code generation.
Managing the complexity of sophisticated code generation is one of the key solutions this book provides. You can manage this complexity by combining the following three approaches:
Splitting code generation into five distinct steps with five corresponding principles (covered in Chapter 1)
Using a tool to manage the multistep process of code generation (covered in this chapter)
Building templates that reflect best practices (covered in Chapters 6–10)
NOTE: Building templates to hold the structure of your code is trivial for trivial cases. When it looks complex, keep in mind that I’m unwilling to relegate code generation to simple or trivial cases. I believe your code generation techniques should be able to handle any coding problem you choose to address and produce the code you want. My job is to tell you the possibilities; you get to tune in the level of complexity that makes sense for your project by using as much or as little as you want.
This chapter provides a preliminary look at a tool, the harness, to manage the multistep process of code generation, and Chapter 5 and Appendix C go on to cover this tool from different perspectives. Specifically, Chapter 5 contains more detail on using the harness, and Appendix C discusses its code.
Once you’ve seen how you’ll perform code generation via the tool, you can focus on how you create individual templates. The bulk of this chapter explores template creation, covering aspects of brute force, XSL Transformations (XSLT) templates, and the CodeDOM in additional detail.
Understanding the Elements of the Code Generation Harness
One of the great parts of software development is that it thrives on a basic human trait—laziness. This isn’t irresponsibility, carelessness, or a lack of personal hygiene. It’s looking for the easiest way to get a job done well. You can work hard to cut corners in a responsible manner, and your source code often thrives as a result. Code reuse is a classic example, but object-oriented programming, code generation, refactoring, and other techniques benefit from the same trait.
You’ll perform a large number of steps in code generation. You’re likely to have the following:
Several extraction, morphing, and merging steps to prepare your metadata, as discussed in Chapters 2 and 6
Five templates to create stored procedure scripts (Create, Retrieve, Update, Delete, and SetSelect)
An installation step for each stored procedure script you generate
At least four middle and data tier templates (generated objects, generated collections, editable objects, and editable collections)
Up to ten separate User Interface (UI) templates
Rather than individually running these steps, you can build on your responsible laziness and create a one-click approach to code generation that easily accomplishes all of these steps. The frightening alternative is code generation that follows a two-page checklist. To transition from that two-page checklist to one-click processing, you need a code generation harness that processes a script consisting of individual instructions called directives.
Avoiding Too Many Steps
Running a script lets you weave dozens of code generation steps together and run them with a single mouse-click. This lets you precisely repeat the generation process with a single click. Code generation must be repeatable. You’ll want to easily repeat it next week, and someone marginally familiar with the language and application may need to repeat it in three or four years to incorporate a database or business rule change. If they can’t do that, they’ll cut corners and hack around your solid design. Now, the real nightmare: A programmer changes a generated file directly when trying to get a job finished. Eight months later, someone else regenerates and introduces potentially catastrophic bugs in a part of the application no one expected to change—so it wasn’t even in the test plan! Code generation doesn’t have to be like that.
If you’ve played with the ADO.NET strongly typed DataSet, you’ve experienced the opposite of one-click generation—the stepwise process of generating code that can turn into a two-page checklist. First, you drag the tables or stored procedures onto the design surface. This generates an XML Schema Definition (XSD). You manually add relations and might adjust name mappings or add annotations. These changes modify the XSD. Then you select Generate Dataset from the context menu, and code generation creates the strongly typed DataSet. You have to do this for every DataSet—most of the steps each time you regenerate!
Forget that! Your goal is one-click generation. You put a lot into building your application. The scripted harness preserves your important choreographed steps through rounds of even not-so-bright maintenance programmers. Listen to your laziness gene and ensure that you’re using a one-click approach. That’s the only way to build stable code generation. Then ensure that the harness and everything it accesses are a permanent part of your project. Let your script and templates change for new projects, but split them off as a semistatic part of each project. Change them only if needed because of a conscious decision and as part of the project’s evolutionary cycles, backed by testing. It’s not just convenience; it’s essential. Planning for the long term includes intelligent one-click code generation based on a script.
The script you’ll use for one-click code generation is an Extensible Markup Language (XML) file that contains a series of directive elements. Each directive is an XML element that describes a single, atomic step run by the code generation script. You can create and edit the script file through the harness or through an XML editor. To support editing in either manner, the XML script used by the harness follows an XSD schema that you can extend.
NOTE: Appendix A shows the XSD schema used by the harness script as an example. This lets you look at its details, even if you aren’t already familiar with XSD.
Each directive includes many attributes that specify different aspects of processing one atomic step. Child elements of the directive logically group attributes based on what they relate to. For instance, there’s a set of attributes related to processes that produce multiple files. The MultiPass child element contains selection and naming criteria that can be used by XSLT generation, brute-force generation, CodeDOM generation, and SQL scripts. This grouping of attributes within child elements allows abstraction of the child element definition for use by multiple directives, significantly easing the maintenance burden for the XSD. For example, the basic information required of all directives is contained in a child element named Standard that contains attributes such as a human-friendly Name. It also includes a Checked attribute indicating whether the directive is checked in the TreeView of the harness, where all checked elements are processed.
Groups of directives are called sections. Sections let you group processing based on any convenient criteria. You might group directives by the generation step they correspond to or by the processing performed by different people, or you might group directives to provide alternate scripts for testing. You can turn processing on and off for entire sections when you run the generation harness. This is an example of a single section with a single directive and three child elements:
In this sample, the directive is CreateMetadata. The CreateMetadata directive and the Section each contain Standard child elements that contain the name and checked status. The checked status indicates whether the item appears checked in the TreeView and whether that step will run as part of code generation. The CreateMetadata directive contains two additional child elements, each of which contains a logical group of information required for processing. There are about ten directives for full-featured code generation, as well as several groupings of attributes in child elements. Chapter 5 discusses specifics of each directive and child element.
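To make the role of the checked status concrete, here is a minimal Python sketch of how a script runner might pick the directives to process. It is not the harness's code. The root element name GenerationScript, the directive names in the Name attributes, the "true"/"false" values, and the rule that an unchecked section suppresses its directives are all assumptions made for illustration; only the Section, CreateMetadata, and Standard elements and the Name and Checked attributes come from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical script: the root element, names, and "true"/"false"
# values are assumptions, not the harness's actual format.
SCRIPT = """\
<GenerationScript>
  <Section>
    <Standard Name="Metadata" Checked="true"/>
    <CreateMetadata>
      <Standard Name="Extract Northwind" Checked="true"/>
    </CreateMetadata>
    <CreateMetadata>
      <Standard Name="Extract Pubs" Checked="false"/>
    </CreateMetadata>
  </Section>
  <Section>
    <Standard Name="Testing" Checked="false"/>
    <CreateMetadata>
      <Standard Name="Extract test data" Checked="true"/>
    </CreateMetadata>
  </Section>
</GenerationScript>
"""


def is_checked(element):
    """Return True when the element's Standard child is marked as checked."""
    standard = element.find("Standard")
    return standard is not None and standard.get("Checked") == "true"


def checked_directives(script_xml):
    """List (directive, name) pairs that would run, in document order."""
    root = ET.fromstring(script_xml)
    runnable = []
    for section in root.findall("Section"):
        if not is_checked(section):
            continue  # assumption: an unchecked section disables its directives
        for child in section:
            if child.tag != "Standard" and is_checked(child):
                runnable.append((child.tag, child.find("Standard").get("Name")))
    return runnable


print(checked_directives(SCRIPT))
```

Keeping the name and checked status in one shared child element means the selection logic never needs to know which kind of directive it is looking at.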
Using the Code Generation Tool
The code generation tool does only two things. It puts a friendly face on editing XML, and it performs code generation by executing your script. To perform code generation, the tool executes code based on the XML directive elements that make up your harness script. This code may be in an external assembly you reference in a script directive.
The tool’s XML editing feature is generic and based on the XSD. The attributes of the XML harness script become controls on the harness’s form, and the child elements become grouping boxes.
NOTE: I don’t promise it’ll edit any XML file you throw at it, but it’ll edit XML you need for scripting code generation directives and may edit other XML files if you have XSDs for them. It’s an object-oriented design, so if you have a similar need, you can customize this form through inheritance. The harness user interface also uses some cool dynamic techniques you can walk through in Appendix C.
You start the tool by double-clicking CodeGenerationHarness.exe in the Tools subdirectory. (footnote 1) Selecting File -> Open lets you open a script such as Harness3.xml in the Chapter 3 subdirectory. When you open this file, you’ll see a TreeView containing sections and directives. As you select different nodes, that node will appear for editing in the right pane. Figure 3-1 shows the user interface for this tool. From the File menu you’ll be able to do familiar actions such as save and create new files. New files are created based on the XSD. The standard XSD for code generation is called KADDrivingMetada.xsd and is in the XFiles subdirectory of the book’s code directory.
Figure 3-1. The code generation harness with a sample XML file open to an XSLTGeneration directive
When you click the Run button, all the currently checked directives are processed. After processing is complete, the Result tab contains any errors or warnings that occurred during processing.
Footnote 1. Feel free to create a shortcut and put it on your Desktop or Start menu. The downside of XCopy deployment is that this doesn’t happen automatically.
NOTE: If you modify XSLT files and save the updates, code generation will use the updated files the next time it runs, regardless of whether the harness has remained open. However, you won’t be able to compile assemblies that you reference as processes until you close the harness. This happens because the assemblies remain loaded. You have to stop and restart the harness when you make changes to brute-force or CodeDOM templates.
An overview of harness features will help answer questions that may arise as you explore template details later in this chapter. For example, in the later discussion on using template parameters, you might wonder how context-sensitive parameters are passed from the harness. Information on harness features will also let you use the generation harness to test the templates in this chapter. This overview is simplified, and Chapter 5 offers additional information on these features.
Working with Abstract File Paths
The generation harness uses abstractions for file paths and names. This allows the same script to be used to start another project by changing only path definitions. Many of the file paths appear multiple times within the script, and this also provides you with a straightforward way to update all occurrences quickly. Square brackets surround abstract FilePath filenames, such as the BasePath in this code:
The FilePath elements of each harness script contain the definitions for the abstract file paths used by that script. You can nest abstract paths and use multiple abstract paths in a sequence.
NOTE: File paths are relative to the harness script file location. This allows users with different file configurations to use the same harness, as long as the project directory has the same organization.
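The expansion of abstract paths can be sketched in a few lines of Python. This is an illustration of the idea rather than the harness's implementation: the names TemplatePath and OutputPath, the directory values, and the depth limit used to catch circular definitions are hypothetical. Only the square-bracket syntax, the BasePath name, nesting, sequencing, and resolution relative to the script location come from the text.

```python
import posixpath
import re

# Matches one abstract path token such as [BasePath].
TOKEN = re.compile(r"\[([^\[\]]+)\]")


def expand(path, definitions, depth=0):
    """Replace every [Name] token, expanding nested definitions recursively."""
    if depth > 10:
        raise ValueError("abstract paths appear to be circular")

    def substitute(match):
        name = match.group(1)
        if name not in definitions:
            raise KeyError(f"no FilePath definition for [{name}]")
        return expand(definitions[name], definitions, depth + 1)

    return TOKEN.sub(substitute, path)


def resolve(path, definitions, script_dir):
    """Expand a path, then make it relative to the script file's directory."""
    return posixpath.normpath(posixpath.join(script_dir, expand(path, definitions)))


# Hypothetical definitions; TemplatePath and OutputPath nest BasePath.
FILE_PATHS = {
    "BasePath": "../Project",
    "TemplatePath": "[BasePath]/Templates",
    "OutputPath": "[BasePath]/Generated",
}

print(resolve("[TemplatePath]/SimpleDataContainer.xslt", FILE_PATHS, "/work/Scripts"))
```

Starting another project then means editing the single BasePath definition; every path that nests it follows along.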
Providing Parameters
The harness retrieves the parameters accepted by templates during the generation process. It does this via reflection for brute-force and CodeDOM templates and by running an internal stylesheet against XSLT templates.
NOTE: To retrieve parameters from an XSLT stylesheet, the stylesheet is opened as the input XML document, and an internally stored XSLT stylesheet is run against the template stylesheet. The internal XSLT retrieves all <xsl:param…> elements that are children of the <xsl:stylesheet…> (or <xsl:transform…>) element in the template stylesheet.
CAUTION: Parameter names are case sensitive.
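The harness discovers XSLT parameters by running an internal stylesheet against the template. The same result can be sketched in Python by treating the template as ordinary XML and reading the xsl:param elements that are direct children of the root. This swaps in a different technique (a direct tree query instead of an XSLT transform), and the sample stylesheet, including the Helper template and its localOnly parameter, is hypothetical.

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical template stylesheet used only for this illustration.
STYLESHEET = """\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="Name"/>
  <xsl:param name="fileName"/>
  <xsl:param name="genDateTime"/>
  <xsl:template name="Helper">
    <xsl:param name="localOnly"/>
  </xsl:template>
</xsl:stylesheet>
"""


def top_level_params(stylesheet_text):
    """Return the names of xsl:param elements directly under the root."""
    root = ET.fromstring(stylesheet_text)
    if root.tag not in (f"{{{XSL}}}stylesheet", f"{{{XSL}}}transform"):
        raise ValueError("not an XSLT stylesheet")
    # findall with a bare tag matches direct children only, so parameters
    # declared inside individual templates are ignored.
    return [param.get("name") for param in root.findall(f"{{{XSL}}}param")]


print(top_level_params(STYLESHEET))
```

The names come back exactly as written in the stylesheet, which is why a difference in case between a parameter and its source is enough to break the match.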
The harness itself provides three special parameters. If your template accepts any of these parameters, the harness will pass the values. The three special parameters are as follows:
fileName passes the name of the output file. This would generally be used in a header comment.
genDateTime passes the date and time at which generation runs. You could use this in a header comment, but because the value changes on every run, files under source control would be checked in every time you run code generation, so this isn’t recommended.
nodeSelect passes the current node of the file. This isn’t valid for XSLT processing, but it’s important in brute-force and CodeDOM generation to determine the XML input file context for the code generation (which node you’re processing).
In addition to these special parameters, the harness checks the node it’s processing in the input XML metadata for attributes matching the names of any additional parameters your template accepts. You’ll usually pass something identifying the current node, such as the name of the current table or object.
CAUTION: The match between parameter names and the attribute name in the XML input node is case sensitive.
TIP: Pass identifying values as parameters that allow the template to find the node to process in the metadata. If the SelectFile is the metadata file (the normal case), the template will be processing this node. It’s generally easier to retrieve additional values from the metadata than pass them as parameters. (The “Looking at the Harness Script” and “Introducing the Target and Metadata Files” sections explain this in more detail.)
CAUTION: No error occurs if attributes matching your parameters aren’t found.
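The matching rules in this section can be summarized in a short Python sketch. It assumes a function signature and data shapes of its own (a list of accepted parameter names and an ElementTree node), so treat it as a model of the behavior described, not as the harness's API. The Table node and its Owner attribute are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_arguments(accepted, node, output_file, generated_at):
    """Choose a value for each parameter a template accepts.

    Special parameters come from the generator; everything else is looked
    up, case sensitively, among the attributes of the current node.
    """
    special = {
        "fileName": output_file,
        "genDateTime": generated_at,
        "nodeSelect": node,  # only meaningful for brute-force and CodeDOM templates
    }
    arguments = {}
    for name in accepted:
        if name in special:
            arguments[name] = special[name]
        elif name in node.attrib:
            arguments[name] = node.attrib[name]
        # A parameter with no matching attribute is skipped without an error.
    return arguments


# Hypothetical metadata node and parameter list.
node = ET.fromstring('<Table Name="Orders" Owner="dbo"/>')
arguments = build_arguments(
    ["fileName", "Name", "owner", "nodeSelect"],
    node,
    "Orders.vb",
    "2004-01-01T00:00:00",
)
print(sorted(arguments))
```

Here the owner parameter receives no value because the attribute is spelled Owner, and nothing reports the mismatch. That silent miss is worth remembering when a template parameter unexpectedly arrives empty.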
Overwriting Files
Managing file overwriting is a critical part of code generation. It’s important to overwrite the correct files without overwriting the wrong files, particularly those you’ve modified by hand. Overwriting rules are defined in the harness script, so you don’t have to make any decisions during generation unless something goes wrong. In Chapter 5, you’ll see how to manage file overwriting through the OutputGenType attribute of the OutputRules child element. This allows you to specify whether to overwrite files and log errors if the harness encounters manually edited files. This feature ensures you can meet the goal of providing protection for manually edited files.
To provide this level of control, the harness needs to know whether a programmer has manually edited each file. You might think about using a date/time stamp, but in addition to the ugliness of tracking it across time zones, it isn’t reliable, especially because source control frequently changes the file time stamps. A better solution is to place a hash marker within the file header comments. A hash marker is a numeric value that has an extremely high probability of uniquely representing the underlying value—in this case representing the file’s text. The initial hash corresponds to the original file. If a hash calculated later doesn’t match, then someone has edited the file. Chapter 5 explains hash markers in more detail.
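As a rough illustration of the hash marker idea, the following Python sketch stamps a generated file with a hash of its body and later checks whether the body still matches. The marker format (a VB comment beginning with GeneratedHash) and the choice of SHA-256 are assumptions for this example; the harness's actual marker and hash algorithm are covered in Chapter 5 and may differ.

```python
import hashlib

# Hypothetical marker format: a VB comment on the first line of the file.
PREFIX = "' GeneratedHash: "


def _digest(body):
    """Hash the file body; SHA-256 is an assumption, not the harness's choice."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def stamp(body):
    """Prepend a marker line holding the hash of the generated body."""
    return f"{PREFIX}{_digest(body)}\n{body}"


def was_edited(text):
    """Return True if the marker is missing, damaged, or no longer matches."""
    header, _, body = text.partition("\n")
    if not header.startswith(PREFIX):
        return True
    return header[len(PREFIX):] != _digest(body)


generated = stamp("Public Class Order\nEnd Class\n")
print(was_edited(generated))                              # untouched output
print(was_edited(generated.replace("Order", "Orders")))   # hand-edited output
```

Unlike a time stamp, the check depends only on the file's text, so copying the file, changing time zones, or cycling it through source control doesn't change the answer.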
The hash marker is the weird sequence of characters at the top of a file; if you edit or remove them or the surrounding comments and markers, you’ll get code generation errors. This is an example of a hash marker:
This book is the start of your code generation adventures, not the sum total of them. There’s no way I can predict what you’ll need to do; there isn’t even a way for you to predict what you’ll need to do. The provided directives do everything I thought of; I trust your creativity, so I designed the harness with extensibility in mind. You can incorporate new templates using all the current harness capabilities (generating output from templates, collecting metadata, running processes, running SQL scripts, and so on) without extending the harness. If you need more capabilities, you can extend it. I provided this feature so you could incorporate every step of code generation as part of your one-click code process, even if I didn’t provide the directive. To create new directives, you’ll need to do the following:
Design the XML for the directive (what attributes does it need?)
Create a .NET assembly that performs the task in a static (Shared) method
Specify the assembly, type, and method name in the XSD along with any other information you defined
Add this directive’s definition to the XSD
Create a new entry in your XML harness script for the new item
That’s it. If your method doesn’t run, you may need to look up (in the .NET Help) how to attach a debugger to an executing process to debug your code. The problem will probably be with the parameters passed. In designing the schema for your extension, you can use any existing child element—all of which are defined via XSD-named complex types. (Appendix A further discusses named complex types and simple types and how to restrict entries to a list.) You can also create new named complex types. Combo box entries are simple types restricted to a list. You can add new items to existing lists or create new lists.
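The heart of this extension mechanism is calling a static method that is known only by name at run time. The harness does this with .NET reflection; the Python sketch below is an analogy that uses importlib and getattr in place of assembly, type, and method lookup. The run_directive function and its parameters are hypothetical, and the example deliberately calls a static method from Python's standard library rather than a real directive.

```python
import importlib


def run_directive(module_name, type_name, method_name, *arguments):
    """Look up a static method by module, type, and method name, then call it.

    The three names play the roles of the assembly, type, and method name
    that a directive would carry as attributes.
    """
    module = importlib.import_module(module_name)
    target_type = getattr(module, type_name)
    method = getattr(target_type, method_name)
    if not callable(method):
        raise TypeError(f"{type_name}.{method_name} is not callable")
    return method(*arguments)


# str.maketrans is a static method in the standard library, used here
# only as a stand-in for a directive's processing method.
print(run_directive("builtins", "str", "maketrans", "a", "b"))
```

Because the call is resolved from strings, a misspelled name or a mismatched argument list fails only when the directive runs, which is consistent with the advice above to look first at the parameters passed.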
CAUTION: Changes to your XSD won’t affect your editing in Visual Studio unless the updated XSD is in the project directory or in the magic directory [Program Files]\ Microsoft Visual Studio .NET 2003\ Common7\Packages\schemas\xml and you restart Visual Studio. Appendix A has more information about doing this.
Dynamic Techniques
Does it seem odd that a key tool in a book on code generation doesn’t use code generation but instead uses its wild and crazy sister, data-driven dynamic techniques? Code generation just wasn’t the easiest way to build the code generation harness. A few reasons why dynamic techniques are a better fit for this tool are as follows:
The application is simple, just displaying the attributes for XML nodes and cycling through these nodes for processing.
Performance isn’t an issue.
A sophisticated user interface isn’t needed.
The interface can be inferred from available information (the XSD).
The structure of the user interface is under external control (the XSD, not the harness).
The task of the application is fixed, so there’s no reason, other than XSD changes, to compile and deploy the application.
Maintenance programmers already need to understand reflection because of the way templates run.
NOTE: If you’re interested in how I used dynamic techniques to build this tool, refer to Appendix C. It walks through the code, explaining the dynamic techniques.
Looking at the Harness Script
You’ll use scripts containing directive elements when you run the harness. I’ll discuss details of scripts and directives in Chapter 5, but a look at the XML illustrates how the XML directives control code generation. The XML corresponding to the directive displayed in Figure 3-1 is as follows:
The Standard element contains the information common to all directives. The OutputRules contains overwriting information. In this case, code generation creates a hash and overwrites any existing file, unless a human edited the file.
The MultiPass child element contains information about how multiple output files are created based on a single directive. The SelectNamespace and SelectNSPrefix support the SelectXPath in selecting nodes. The SelectFile attribute tells the harness which XML file to use in selecting nodes. Code generation creates files in the OutputDir using the OutputFilePattern. This pattern is XML-like, so it uses angle brackets to define replacement tokens in the filename. However, you can’t directly output the less-than sign (<) via XSLT, so this attribute uses the escaped versions &lt; and &gt;. Examples of the resulting filenames are Customers.vb and Project.vb.
The XSLTFiles element contains information about the XSLT transform. This includes the input file and the XSLT transform filename. Brute-force and CodeDOM sections of the script use identical Standard, OutputRules, and MultiPass child elements. Instead of XSLTFiles, the brute-force generation uses the Process directive. With the Standard, OutputRules, and MultiPass child elements represented by ellipses, the brute-force directive is as follows:
NOTE: The Process directive uses the AssemblyFileName attribute if the class is in an external assembly and uses the AssemblyName attribute in the less common case where the class is part of the harness.
Although these directives may initially seem complex, they provide a great deal of consistency between different directives, which makes maintenance easier. Even if you’re sticking with one generation mechanism (recommended), you’ll use multiple directives because you’ll create metadata, merge metadata, transform metadata, generate code, run SQL scripts, and so on.
Now that you’ve seen an overview of the harness you’ll use to run code generation, you’ll take a close look at the templates that output code. Regardless of the mechanism you’re using, I call what holds your code pattern a template. When you’re using brute-force or CodeDOM generation, the template is a .NET method; when you’re using XSLT code generation, the template is an XSLT stylesheet.
The following sections look at how you work with the templates in each of the three techniques for generating code. I approach each technique differently because the core mechanism differs. In XSLT code generation, I leave the XSLT basics for Appendix A and explore how to take sample code and convert it to an XSLT template. The section “Exploring Details of Brute-Force Generation” is slim because the beauty of this technique is that it’s simple and uses techniques that are probably already familiar to you. Working with the CodeDOM is complex, and although that section only skims the surface, it’s long and intricate and won’t be relevant to all readers. There’s just so little information on using the CodeDOM available today that the section seemed necessary for those readers who will use, make a decision on using, or simply want to understand the CodeDOM. Appendix D has additional details on issues you may encounter if you work with the CodeDOM.
NOTE: You may find creative ways to combine these approaches, such as your own tokenizing scheme. I keep the approaches separate to show the fundamentals of each. You can pull from each if you’ve devised a hybrid approach. I’m not a fan of tokenizing schemes because they don’t have the power of XSLT, so they’re hard to use in complex code scenarios.
Introducing the Target and Metadata Files
Metadata files are an XML description of your application’s data. Target files are sample source code files that look like what you want to generate. You’ll build templates that output code files that match your target file using XML metadata as input. You’ll compare the output with the target as an initial test of your templates.
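Comparing generated output with the target file is easy to automate. The Python sketch below uses the standard difflib module to report differences; the function name and the sample strings are hypothetical, and any diff tool would serve the same purpose.

```python
import difflib


def diff_against_target(generated, target):
    """Return unified diff lines; an empty list means the output matches."""
    return list(
        difflib.unified_diff(
            target.splitlines(),
            generated.splitlines(),
            fromfile="target",
            tofile="generated",
            lineterm="",
        )
    )


# Hypothetical target and template output for illustration.
target_text = "Public Class Order\nEnd Class\n"
generated_text = "Public Class Orders\nEnd Class\n"

for line in diff_against_target(generated_text, target_text):
    print(line)
```

An empty result is only an initial check. It shows that the template reproduces the one table the target was written for, not that it behaves correctly across the rest of the metadata.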
For reference, Listing 3-1 shows part of the XML input file used in this chapter’s XSLT and brute-force code generation examples. This XML describes a single data table and is part of a larger metadata file containing descriptions for each table, stored procedure, view, function, and user-defined data type in the database. I removed privilege information and replaced the details of most columns with ellipses to shorten the listing. The Table element and each TableColumn element contain many attributes. These attributes provide details about the table or column you’ll use during code generation.
Listing 3-1. Part of the XML Input File Used in This Chapter
Listing 3-2 shows portions of the target code file generated from the metadata shown in Listing 3-1. Ellipses indicate where I clipped redundant sections. I purposely showed only two columns. You can see the rest in the download or mentally extrapolate.
NOTE: This sample contains both a class for the collection and the individual objects in the same file to illustrate more code generation techniques. You can include related classes in the same file, or you can create a separate file for each class when you generate code as shown in Chapter 8. This sample is deliberately different from the example in Chapter 8. This gives you template examples that create a greater variety of output code.
Listing 3-2. The Target File Used to Create Templates for All Three Code Generation Types
Option Strict On
Option Explicit On

Imports System
Imports KADGEN
Imports System.Data

#Region "Description"
' Orders.vb
#End Region

Public Class OrderCollection
    Inherits CollectionBase

#Region "Constructors"
    Protected Sub New()
        MyBase.New("OrderCollection")
    End Sub
#End Region

#Region "Public and Friend Properties, Methods and Events"
    Public Overloads Sub Fill( _
                ByVal OrderID As System.Int32, _
                ByVal UserID As Int32)
        OrderDataAccessor.Fill(Me, OrderID, UserID)
    End Sub
#End Region

End Class

Public Class Order
    Inherits RowBase

#Region "Class Level Declarations"
    Protected mCollection As OrderCollection
    Private Shared mNextPrimaryKey As int32 = -1

    Private mOrderID As System.Int32
    Private mCustomerID As System.String
    ...
#End Region

#Region "Constructors"
    Friend Sub New(ByVal OrderCollection As OrderCollection)
        MyBase.new()
        mCollection = OrderCollection
    End Sub
#End Region

#Region "Base Class Implementation"
    Friend Sub SetNewPrimaryKey()
        OrderID = mNextPrimaryKey
        mNextPrimaryKey -= 1
    End Sub
#End Region

#Region "Field access properties"

    Public Function OrderIDColumnInfo As ColumnInfo
        Dim columnInfo As New ColumnInfo
        columnInfo.FieldName = "OrderID"
        columnInfo.FieldType = gettype(System.Int32)
        columnInfo.SQLType = "int"
        columnInfo.Caption = ""
        columnInfo.Desc = ""
    End Function

    Public Property OrderID As System.Int32
        Get
            Return mOrderID
        End Get
        Set(ByVal Value As System.Int32)
            mOrderID = Value
        End Set
    End Property

    Public Function CustomerIDColumnInfo As ColumnInfo
        Dim columnInfo As New ColumnInfo
        columnInfo.FieldName = "CustomerID"
        columnInfo.FieldType = gettype(System.String)
        columnInfo.SQLType = "nchar"
        columnInfo.Caption = ""
        columnInfo.Desc = ""
    End Function

    Public Property CustomerID As System.String
        Get
            Return mCustomerID
        End Get
        Set(ByVal Value As System.String)
            mCustomerID = Value
        End Set
    End Property
    ...
#End Region
End Class
The following sections show how to create this file via XSLT and brute-force code generation. These examples will show how to output Visual Basic .NET (VB .NET) code. The Apress Web site includes parallel examples that output the code in C# files. Chapters 4 and 6 have examples outputting Structured Query Language (SQL) stored procedures.
XSLT can output any type of text, including code in any language. You saw an introduction to how this works in Chapter 1, and this example dives deeper, including showing template organization and conditional code segments.
NOTE: XSLT code generation relies on XSLT, so if you’re unfamiliar with this technology, you’ll want to review Appendix A.
Creating a Class
The best way to learn about XSLT code generation is to convert some sample code into an XSLT template. There’s a custom version of Order.vb in the XSLTExample folder of the Chapter 3 code file. I’ll convert this sample file into a working template and test it across the tables of the Northwind metadata. You may want to fire up Visual Studio and walk through this process as you read the text.
Open a new stylesheet and name it SimpleDataContainer.xslt. Save it in the Chapter 3/Test directory, which already exists if you unzipped the code; otherwise, create the folder first. If you have updated the default stylesheet as described in Appendix A, you’re ready to go. If you didn’t, you’ll need to add the xsl prefix to the namespace, add the preserve space element, add the entry-level template, and update the stylesheet to reflect the xsl prefix. The metadata file created with the metadata extraction tool (discussed in Chapter 2) uses the dbs prefix, so you’ll also need to add this namespace and prefix. Before you start, your stylesheet should contain the following (bolded items represent changes from the .NET default):
I’m using a supporting stylesheet imported as a separate file. This stylesheet contains utility templates that provide code reuse within XSLT. The processor has to find this utility template. You’ll need to modify this path if your directory layout differs from that in the download. I used relative paths to simplify moving your project.
XSLT provides both xsl:import and xsl:include mechanisms for adding XSLT files. They have different behaviors when template names conflict. You want to avoid conflicting template names to keep others sane, even if XSLT is flexible on the issue. Import has behavior that’s more predictable if you accidentally create a conflict, which is why I use it, but xsl:import elements have to come before any other child elements of the xsl:stylesheet element.
Specifying XSLT Parameters
This template, along with many of the templates you’ll build, takes three parameters: the output filename, the date/time of file generation, and the name of what you’re generating. The earlier “Providing Parameters” section explains how the code generation harness supplies parameter values.
Within the XSLT template, you retrieve parameters using this:
You can access these parameters from any template within the stylesheet, including templates in the XSLT file accessed using xsl:import.
Creating the Entry-Level Template
The purpose of the entry-level template is to start processing the stylesheet. All other templates in the stylesheet should have either a name or a mode to clarify their purpose. The entry-level template is brief and just shifts the context within the XML metadata file to the node you’re processing. In this entry-level template, the template processes table nodes with a Name attribute that matches the Name parameter passed:
The xsl:apply-templates directive indicates that you want to process any template matching the select criteria. The XSLT engine processes every matching node with every template applicable to the dbs:Table node that also has a mode of BuildClasses (there’s only one). If the XPath statement in this code doesn’t make sense to you, review the XPath section of Appendix A.
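The entry-level template described here might look roughly like the following sketch. The dbs namespace URI is a placeholder, the //dbs:Table path is my assumption about where table nodes sit in the metadata, and the BuildClasses template is only a stand-in so the sketch runs on its own.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:dbs="urn:example:dbs-metadata">
   <xsl:output method="text"/>
   <xsl:param name="Name"/>

   <!-- Entry-level template: shift the context from the document root
        to the one table whose Name attribute matches the parameter -->
   <xsl:template match="/">
      <xsl:apply-templates select="//dbs:Table[@Name=$Name]"
            mode="BuildClasses"/>
   </xsl:template>

   <!-- Stand-in for the high-level processing template -->
   <xsl:template match="dbs:Table" mode="BuildClasses">
      <xsl:text>' Table: </xsl:text>
      <xsl:value-of select="@Name"/>
      <xsl:text>&#xD;&#xA;</xsl:text>
   </xsl:template>
</xsl:stylesheet>
```

If no table matches the Name parameter, the select yields an empty node set and nothing is output, which is worth remembering when a generated file comes out empty.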
Creating the High-Level Processing Template
When doing code generation, it’s easiest to have the entry-level template process (or run) a single high-level processing template. The high-level processing template lays out the structure of the output file. I came up with the terms entry-level template and high-level processing template to make it easier to describe the purpose of these templates. You’re unlikely to find them used in other documentation.
The high-level processing templates make extensive use of additional templates. Templates act as subroutines if called using the xsl:call-template directive. When called using the xsl:apply-templates directive, they provide what’s ultimately a looping operation. In XSLT, the subroutine style templates called using the xsl:call-template directive are called named templates. The looping style templates called using the xsl:apply-templates directive are called match templates. It makes no difference whether these templates exist in this XSLT file or the files accessed via xsl:import or xsl:include (see footnote 2) because these additional XSLT files become part of the main stylesheet during processing.
Footnote 2. It makes no difference whether you access supporting stylesheets via xsl:import or xsl:include unless there’s a conflict. You can check the MSDN Help for these xsl elements for more information about how conflicts are resolved when you use them.
Just as in procedural programming, you want to know what each template does, so keep the focus of each template as narrow as practical and describe what the template accomplishes in its name or its mode. The high-level processing template, started by the entry-level template, looks like this:
<xsl:template match="dbs:Table" mode="BuildClasses">
<xsl:call-template name="FileOpen">
   <xsl:with-param name="imports" select="'KADGEN, System.Data'" />
</xsl:call-template>

Public Class <xsl:value-of select="@SingularName"/>Collection
   Inherits CollectionBase
<xsl:call-template name="CollectionConstructors" />
<xsl:call-template name="PublicAndFriend" />
End Class

Public Class <xsl:value-of select="@SingularName" />
   Inherits RowBase
<xsl:call-template name="ClassLevelDeclarations" />
<xsl:call-template name="Constructors" />
<xsl:call-template name="BaseClassImplementation" />
<xsl:call-template name="FieldAccessProperties" />
End Class
</xsl:template>
The structure of the output file emerges in the high-level processing template, which outputs a header and two classes: a collection class and a row class. Although these classes are more sophisticated than the example in Chapter 1, they’re still simplified to keep the focus on the template structure. Chapter 8 offers a complete middle-tier template.
The first thing output to the file is a standard block defined in the FileOpen template. Although I’m not including the contents of this template, its name gives a good idea of what it accomplishes—outputting the header comments, option statements, imports, and so on that appear at the top of the output file. Within the code of this file, the collection class inherits from the CollectionBase class and contains the output of two named templates. The row class inherits from a different base class and calls four named templates.
Building the XSLT Template on Your Own
You can build this template yourself by copying the entire target file (Orders.vb) into your new template. Replace the header with a call to a template named FileOpen that already exists in Support.xslt. If you want to build Support.xslt as well, copy the header into a template in your shell Support.xslt. You can clean that up later. Then place an xsl:call-template directive for each region in the high-level processing template. Give these templates names that match your region name, with spaces removed (you can either use capitalization, as I did, or use underscores). This is the structure of your high-level processing template. You hardly had to think to do it, and any two people within your workgroup would’ve created identical files.
Now create a named xsl:template for each xsl:call-template and copy the contents of the region into the corresponding template. Now you’re ready to use xsl:value-of to replace values with ones retrieved from the XML input file and use xsl:if, xsl:choose, xsl:for-each, and xsl:apply-templates to provide logic to your templates.
Regions are important when you’re debugging XSLT and brute-force code templates. The compiler will find a problem in your output (or you’ll determine where a change needs to be made), and you have to trace this back into your template file. By far the easiest way to do this is using regions to organize both your source code and templates.
Once you’ve got named templates corresponding to each region in your output code, scan the high-level processing template for any values that wouldn’t be the same for all output files. These are the items you need to replace using the xsl:value-of directive. In this case, there are two, both part of the class names. The original lines are as follows:
Public Class OrderCollection
...
Public Class Order
You can replace whole words or any part of words. Whether a name is singular or plural needs special attention. The metadata provides attributes for Name, OriginalName, SingularName, and PluralName. Although that may seem like a lot of names, it’s really the only way to ensure you’re getting the right name at the right location. (Table 2-2 shows the different names available for each table.) In this case, you want the singular name in both places:
Public Class <xsl:value-of select="@SingularName"/>Collection
...
Public Class <xsl:value-of select="@SingularName" />
This chapter is from Code Generation in Microsoft .NET by Kathleen Dollard (Apress, 2004, ISBN: 1590591372). Check it out at your favorite bookstore today.
Named templates can contain simple or complex output. To create the first named template, create a new xsl:template within your stylesheet and copy the code for the first region into it. It’ll look like this:
<xsl:template name="CollectionConstructors">
#Region "Constructors"
   Protected Sub New()
      MyBase.New("OrderCollection")
   End Sub
#End Region
</xsl:template>
Look for anything that wouldn’t be valid in another output file—all the stuff that changes. The only changing value here is the text OrderCollection. You can replace this with an xsl:value-of directive supplying the SingularName attribute. The resulting template is as follows:
<xsl:template name="CollectionConstructors">
#Region "Constructors"
   Protected Sub New()
      MyBase.New("<xsl:value-of select="@SingularName" />Collection")
   End Sub
#End Region
</xsl:template>
Named templates aren’t always this simple to create. When you copy in the Public and Friend… region of the Orders.vb file, you have this:
<xsl:template name="PublicAndFriend">
#Region "Public and Friend Properties, Methods and Events"
   Public Overloads Sub Fill( _
               ByVal OrderID As System.Int32, _
               ByVal UserID As Int32)
      OrderDataAccessor.Fill(Me, OrderID, UserID)
   End Sub
#End Region
</xsl:template>
You often need to know the intent of the various aspects of your sample file. In this case, OrderID is the primary key of the table, and the application passes a UserID to all Fill methods. This is complicated by the fact that Northwind doesn’t contain a primary key for all tables.
NOTE: I didn’t see the problem with missing primary keys in my crystal ball. I created the template without worrying about this and found compiler errors when there weren’t any primary keys.
TIP: Code generation is easier if you use a single-column primary key on all tables.
Variables can simplify processing, but XSLT variables are weird critters. You assign a value when you create them, and then you can never change this value. That doesn’t sound like a variable in any language I’ve used, but it’s consistent with some basic principles of XSLT (discussed further in Appendix A). This PublicAndFriend template uses variables of two different types. The first is a string that contains the name of the primary key. An XPath statement retrieves the Name attribute of the PKField element found under the PrimaryKey child of the table’s TableConstraints element. The context for this XSLT is the table node:
<xsl:template name="PublicAndFriend">
#Region "Public and Friend Properties, Methods and Events"<xsl:text/>
   <xsl:variable name="primarykeyname"
         select="dbs:TableConstraints/dbs:PrimaryKey/dbs:PKField/@Name" />
To create a corresponding method declaration, you need the type, as well as the name, of the primary key. To find this, you can use the name of the primary key, now held in the $primarykeyname variable, to retrieve the matching node within the column definitions (see the XML metadata in Listing 3-1). This code assigns a single node to the $primarykey variable. This node is the TableColumn node corresponding to the primary key (context is the table node):
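One way the lookup described here could be written is sketched in this small stand-alone stylesheet. The element and attribute names come from the chapter's own XPath expressions; the namespace URI, the ShowKey mode, and the comment line it emits are hypothetical, and the [1] predicate is my way of honoring the single-column assumption.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:dbs="urn:example:dbs-metadata">
   <xsl:output method="text"/>

   <xsl:template match="/">
      <xsl:apply-templates select="//dbs:Table" mode="ShowKey"/>
   </xsl:template>

   <!-- Context here is the table node -->
   <xsl:template match="dbs:Table" mode="ShowKey">
      <xsl:variable name="primarykeyname"
            select="dbs:TableConstraints/dbs:PrimaryKey/dbs:PKField/@Name"/>
      <!-- Node variable: the column whose Name matches the key name.
           [1] keeps only the first match and ignores extra key columns. -->
      <xsl:variable name="primarykey"
            select="dbs:TableColumns/dbs:TableColumn[@Name=$primarykeyname][1]"/>
      <!-- An empty node set tests as false, so keyless tables are skipped -->
      <xsl:if test="$primarykey">
         <xsl:text>' Key: </xsl:text>
         <xsl:value-of select="$primarykey/@Name"/>
         <xsl:text> As </xsl:text>
         <xsl:value-of select="$primarykey/@NETType"/>
         <xsl:text>&#xD;&#xA;</xsl:text>
      </xsl:if>
   </xsl:template>
</xsl:stylesheet>
```

Holding the node rather than just the name means any other attribute of the column stays one step away, for example $primarykey/@SQLType.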
NOTE: This template assumes you’ll use single-column keys and ignores any additional keys. You’ll learn later how to deal with complex primary keys.
TIP: Use the column node when you need any details on the column, such as its type.
The table may have no primary key, and in that case, the class needs a Fill method with only the UserID parameter. The directives between xsl:if and /xsl:if execute only if there are any primary key nodes. When this block executes, output from this block includes the name and type of the primary key:
   Public Overloads Sub Fill( _<xsl:if test="$primarykey">
               ByVal <xsl:value-of select="$primarykey/@Name"/> As <xsl:text/>
               <xsl:value-of select="$primarykey/@NETType"/>, _</xsl:if>
               ByVal UserID As Int32)
NOTE: When testing a variable containing a set of nodes, the test results in true if there are any nodes in the set.
Outputting the call to the Fill method presents similar issues and solutions. The SingularName attribute of the table node makes up part of the name of the data access layer class. The xsl:if directive is again used to include the primary key field name as a parameter only if it exists:
      <xsl:value-of select="@SingularName"/>DataAccessor.Fill(Me<xsl:text/>
            <xsl:if test="$primarykey">, <xsl:value-of select="$primarykey/@Name"/>
            </xsl:if>, UserID)
   End Sub
#End Region
</xsl:template>
Managing Whitespace
This template also provides good examples of whitespace management. Unfortunately, you have to think about whitespace when you write XSLT templates. You might have thought I was careless (or downright dumb) in the layout of the XSLT directives in the previous section, but this layout provides the correct whitespace in the output.
Whitespace that’s output is called significant whitespace. Whitespace is significant, meaning that it’s output, when the whitespace is adjacent to any text that’s output. When any two XSLT directives are adjacent to each other with only whitespace between, the processor assumes that intervening whitespace was for your convenience in reading the template and ignores it. That’s called insignificant whitespace. Put another way, only whitespace that isn’t between XSLT directives is output.
NOTE: The meaning of whitespace is different in the generated VB .NET and C# code. In VB .NET, the type of whitespace often has meaning, particularly the Carriage Return/Line Feed (CRLF) character at the end of each line, the spaces before line continuation characters, and so on. C# has few places where the type of whitespace has semantic meaning for the compiler. However, programmers reading the code will appreciate your efforts at whitespace management in either language.
Because of whitespace behavior, if you write a method declaration such as this:
   Public Overloads Sub Fill( _
            <xsl:if test="$primarykey">
            ByVal <xsl:value-of select="$primarykey/@Name"/> As System.Int32, </xsl:if>
then two new lines (two CRLF pairs) will be output, one after the underscore and the other before the ByVal. The result would be this:
   Public Overloads Sub Fill( _

            ByVal OrderID As System.Int32
with the blank line causing a VB .NET compiler error. Take another look at the template lines if you don’t see why this problem would occur.
Sometimes you want to break a line in the template at a location illegal in the output, such as this example. I could just toss in another line-continuation underscore, but that would output ugly code, and I want output to be easy to read, without artifacts of code generation. In the earlier example, I swallowed my desire to put the xsl:if test on a separate line. Alternatively, I could’ve tossed in an empty xsl:text directive, which is simply a meaningless XSLT tag in this context. The xsl:text directive can appear at either the end or beginning of the next line, depending on which is needed to wrap the offending whitespace between two XSLT directives. In this case, you can put it either after the underscore or before the ByVal, but not both:
   Public Overloads Sub Fill( _<xsl:text/>
            <xsl:if test="$primarykey">
            ByVal <xsl:value-of select="$primarykey/@Name"/> As System.Int32</xsl:if>
TIP: Whether you move directives onto the ends of lines, as in the earlier example, or use xsl:text at the start of lines is sometimes a matter of style. But in other cases, conditional and looping constructs dictate where you place the xsl:text element.
The XSLT processor generally ignores whitespace within directives. I have the extra demands of making text readable on this page because you can’t scroll the page if the text gets too wide. This results in extra xsl:text elements in the book’s code, but it also gives me a chance to clarify where you can wrap within directives. You can generally split anywhere in a directive other than in a quoted string or within an element or attribute name, such as:
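As an illustration of my own (not the book's listing), the stylesheet here wraps one directive at points where a break is safe. The Name parameter and its default value of 'Orders' exist only to make the sketch runnable.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>
   <xsl:param name="Name" select="'Orders'"/>

   <xsl:template match="/">
      <!-- Safe: the breaks fall after the element name, after the
           attribute, and before the closing delimiter -->
      <xsl:value-of
            select="$Name"
      />
      <!-- Not safe: a break inside the element name (value- / of)
           or inside the attribute name (sel / ect) -->
   </xsl:template>
</xsl:stylesheet>
```

The whitespace around the wrapped directive sits between XSLT elements and comments only, so none of it reaches the output.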
Sometimes you’ll need to output whitespace between directives. You might need to add tabs, spaces, or new line characters. These need to be escaped character sequences placed within an xsl:text directive. Appendix A shows examples of adding whitespace using escape characters. Support.xslt also contains a template that outputs new line characters.
XSLT whitespace is a pain. But as you become familiar with it, you’ll find it workable, even if your XSLT templates will never have the grace of alignment you expect from your generated files.
TIP: Solve whitespace issues so your generated code is logically aligned. This will save lots of problems in reading your code later.
Using Looping, Conditional Variables, and Other Fun Stuff
The ClassLevelDeclarations named template illustrates a couple of new things. The code target starts with this:
#Region "Class Level Declarations"
   Protected mCollection As OrderCollection
   Private Shared mNextPrimaryKey As Int32 = -1
   Private mOrderID As System.Int32
This goes on with a full list of fields that correspond to the columns in the table.
The first three lines of the template that outputs this code use techniques you’ve already seen. You can output the list of the fields using an xsl:for-each directive. The xsl:for-each directive loops through each node in the specified set and outputs the class-level declarations:
<xsl:template name="ClassLevelDeclarations">
#Region "Class Level Declarations"
   Protected mCollection As <xsl:value-of select='@SingularName'/>Collection
   Private Shared mNextPrimaryKey As Int32 = -1
   <xsl:for-each select="dbs:TableColumns/*">
      <xsl:if test="string-length(@NETType)>0">
   Private m<xsl:value-of select="@Name" />
         <xsl:text/> As <xsl:value-of select="@NETType" />
      </xsl:if>
   </xsl:for-each>
#End Region
</xsl:template>
NOTE: In every case where an xsl:for-each directive works, an xsl:apply-templates with a mode also works, and vice versa. In your favorite programming language, you gained confidence in your proficiency at separating parts into subroutines. Similarly, you’ll quickly become confident in determining whether to use an xsl:for-each or an xsl:apply-templates. If you avoid xsl:apply-templates, you’ll build difficult-to-read XSLT templates with deeply nested loops. But too many templates containing only one or two lines can also be difficult to read. As a rough guideline, if the match template would contain fewer than three lines, or the calling template would be reduced to fewer than about five lines, I use xsl:for-each. Otherwise, I use xsl:apply-templates.
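To make the equivalence concrete, this sketch moves the body of the field-declaration loop into a match template with a mode. The FieldDeclaration mode name, the namespace URI, and the small driver template are hypothetical, and I use explicit xsl:text elements instead of layout-based whitespace so the sketch does not depend on indentation.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:dbs="urn:example:dbs-metadata">
   <xsl:output method="text"/>

   <!-- Minimal driver so the sketch runs on its own -->
   <xsl:template match="/">
      <xsl:for-each select="//dbs:Table">
         <xsl:call-template name="ClassLevelDeclarations"/>
      </xsl:for-each>
   </xsl:template>

   <xsl:template name="ClassLevelDeclarations">
      <xsl:text>#Region "Class Level Declarations"&#xD;&#xA;</xsl:text>
      <!-- Replaces the xsl:for-each: one template run per column -->
      <xsl:apply-templates select="dbs:TableColumns/dbs:TableColumn"
            mode="FieldDeclaration"/>
      <xsl:text>#End Region&#xD;&#xA;</xsl:text>
   </xsl:template>

   <!-- The former loop body; context is the current column -->
   <xsl:template match="dbs:TableColumn" mode="FieldDeclaration">
      <xsl:if test="string-length(@NETType) &gt; 0">
         <xsl:text>   Private m</xsl:text>
         <xsl:value-of select="@Name"/>
         <xsl:text> As </xsl:text>
         <xsl:value-of select="@NETType"/>
         <xsl:text>&#xD;&#xA;</xsl:text>
      </xsl:if>
   </xsl:template>
</xsl:stylesheet>
```

With a body this short the chapter's guideline would favor xsl:for-each; the split pays off once the per-column output grows.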
The xsl:if handles the case where the metadata doesn’t include a NETType attribute. The NETType attribute is derived from the SQL type during metadata extraction, and the extraction process doesn’t yet support some SQL types such as image and varbinary types that exist in the Northwind database. It doesn’t support them because I don’t anticipate displaying employee pictures. Because the metadata contains empty strings for these unsupported types, the generated source code would result in .NET compiler errors. Again, my crystal ball didn’t tell me this. I ran the template, found compiler errors, and tracked them back to this problem.
Match templates provide a mechanism to segment your XSLT stylesheet and simultaneously loop through a set of nodes. This is very handy. Using match templates instead of xsl:for-each directives will not only show the world that you’re a competent XSLT jock but also will result in cleaner, more readable stylesheets.
The Orders.vb file contains a columnNameColumnInfo method and a columnName property for each column in the table. This function and property are shown near the end of Listing 3-2. The columnNameColumnInfo method returns an object containing information about the column. This type of information is frequently helpful to UI programmers using the data container class. The columnName property wraps the actual data.
Building a match template is much like building a named template. The key differences are that named templates run only once each time they’re called and don’t change the context within the XML metadata input file. Match templates, on the other hand, may run zero, one, or many times each time they’re called. Match templates execute in the context of the selected node—the current member of the node list. This is similar to the xsl:for-each directive’s behavior.
To build this template, copy the method and property blocks for any one of the columns from the target file into the template and identify the changeable items. These changeable items are retrieved from the column’s information in the metadata file, which is the current context when the output file is generated.
This template handles the problem with missing .NET types a little differently. Rather than skipping the code, it outputs explanatory comments within an xsl:choose directive if the NETType attribute is missing or empty:
<xsl:template match="dbs:TableColumn" mode="ColumnMethods" >
<xsl:choose>
<xsl:when test="string-length(@NETType)=0">
   ' TODO: Column <xsl:value-of select="@Name"/> is not included because it uses
   '       a SQLType (<xsl:value-of select="@SQLType"/>) that is not yet supported
</xsl:when>
<xsl:otherwise>
   Public Function <xsl:value-of select="@Name"/>
            <xsl:text/>ColumnInfo As ColumnInfo
      Dim columnInfo As New ColumnInfo
      columnInfo.FieldName = "<xsl:value-of select="@Name"/>"
      columnInfo.FieldType = GetType(<xsl:value-of select="@NETType"/>)
      columnInfo.SQLType = "<xsl:value-of select="@SQLType"/>"
      columnInfo.Caption = "<xsl:value-of select="@Caption"/>"
      columnInfo.Desc = "<xsl:value-of select="@Desc"/>"
      Return columnInfo
   End Function

   Public Property <xsl:value-of select="@Name"/> As <xsl:text/>
            <xsl:value-of select="@NETType"/><xsl:call-template name="NewLine"/>
      Get
         Return m<xsl:value-of select="@Name"/><xsl:call-template name="NewLine"/>
      End Get
      Set(ByVal Value As <xsl:value-of select="@NETType"/>)
         m<xsl:value-of select="@Name"/> = Value
      End Set
   End Property
</xsl:otherwise>
</xsl:choose>
</xsl:template>
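The chapter does not show the caller of this match template, so here is a hedged sketch of how a FieldAccessProperties named template might invoke it and how the context shifts. The region name, the namespace URI, the driver, and the one-line stand-in for ColumnMethods are all my assumptions.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:dbs="urn:example:dbs-metadata">
   <xsl:output method="text"/>

   <!-- Minimal driver so the sketch runs on its own -->
   <xsl:template match="/">
      <xsl:for-each select="//dbs:Table">
         <xsl:call-template name="FieldAccessProperties"/>
      </xsl:for-each>
   </xsl:template>

   <!-- Named template: runs once, context stays on the table -->
   <xsl:template name="FieldAccessProperties">
      <xsl:text>#Region "Field Access Properties"&#xD;&#xA;</xsl:text>
      <xsl:apply-templates select="dbs:TableColumns/dbs:TableColumn"
            mode="ColumnMethods"/>
      <xsl:text>#End Region&#xD;&#xA;</xsl:text>
   </xsl:template>

   <!-- Match template stand-in: runs once per column, and the context
        is that column, so the table is two levels up -->
   <xsl:template match="dbs:TableColumn" mode="ColumnMethods">
      <xsl:text>   ' </xsl:text>
      <xsl:value-of select="../../@Name"/>
      <xsl:text>.</xsl:text>
      <xsl:value-of select="@Name"/>
      <xsl:text>&#xD;&#xA;</xsl:text>
   </xsl:template>
</xsl:stylesheet>
```

A table with no columns simply produces an empty region, because the match template runs zero times.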
NOTE: The XPath expressions used by the xsl:value-of directive allow you to access any piece of information in the XML metadata file, even if it’s nowhere close to the node you’re currently processing. If you manage to find a problem you can’t solve with XPath, Microsoft’s implementation of XSLT supports both script and calling back to .NET objects. The sky is the limit of what you can do with XSLT, but it won’t always be easy.
Supporting stylesheets allow you to isolate generic templates that you can reuse in different XSLT stylesheets. For example, the SimpleDataContainer.xslt stylesheet uses the NewLine template in the Support.xslt file. Using this template doesn’t save much typing, but it keeps you (and later programmers) from having to remember the hexadecimal values of the carriage-return and line-feed characters:
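A NewLine template of this kind can be as small as the sketch here. The template name matches the one the chapter calls; the body and the demonstration template around it are my guess at a reasonable implementation, not a copy of Support.xslt.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>

   <!-- Emits one CRLF pair: carriage return (hex D), line feed (hex A) -->
   <xsl:template name="NewLine">
      <xsl:text>&#xD;&#xA;</xsl:text>
   </xsl:template>

   <!-- Demonstration: two comment lines separated by the template -->
   <xsl:template match="/">
      <xsl:text>' first line</xsl:text>
      <xsl:call-template name="NewLine"/>
      <xsl:text>' second line</xsl:text>
      <xsl:call-template name="NewLine"/>
   </xsl:template>
</xsl:stylesheet>
```

Wrapping the character references in xsl:text matters: outside xsl:text they would form a whitespace-only text node, which the processor strips from the stylesheet.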
As another example, you’ll frequently need to control commas in lists. For example, you might need to output this:
{0, 1, 2}
You can control the comma at the beginning or the end of the list, but the CommaIfNotFirst template is theoretically more efficient than CommaIfNotLast:
<xsl:template name="CommaIfNotFirst">
<xsl:if test="position()!=1">, </xsl:if>
</xsl:template>
<xsl:template name="CommaIfNotLast">
<xsl:if test="position()!=last()">, </xsl:if>
</xsl:template>
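To show one of these templates in use, this sketch builds a brace-delimited list of zero-based column indexes. The namespace URI and the //dbs:TableColumn path are assumptions; any node set would do. It works because xsl:call-template leaves the current node and node list unchanged, so position() inside the named template still refers to the loop.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:dbs="urn:example:dbs-metadata">
   <xsl:output method="text"/>

   <xsl:template name="CommaIfNotFirst">
      <xsl:if test="position()!=1">, </xsl:if>
   </xsl:template>

   <xsl:template match="/">
      <xsl:text>{</xsl:text>
      <xsl:for-each select="//dbs:TableColumn">
         <!-- The separator goes before every item except the first -->
         <xsl:call-template name="CommaIfNotFirst"/>
         <xsl:value-of select="position() - 1"/>
      </xsl:for-each>
      <xsl:text>}</xsl:text>
   </xsl:template>
</xsl:stylesheet>
```

For metadata holding exactly three dbs:TableColumn elements, this produces {0, 1, 2}.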
NOTE: XPath has a number of useful functions. If you have trouble finding references to them in the .NET Help, look up the “xslt, reference” section in the index. This will get you into the fairly good XSLT documentation. Appendix A also discusses XPath functions.
The SimpleDataContainer.xslt stylesheet uses the FileOpen template in Support.xslt to supply a consistent header at the top of the output files, which you can adjust to your liking. When you make a change, the change will appear in all stylesheets importing or including Support.xslt. This template needs to run as part of a stylesheet that has the fileName stylesheet parameter or variable. If not run with this available, you’ll get XSLT errors. If you aren’t sure this parameter will be available, explicitly pass it as a template parameter because not passing a template parameter isn’t considered an error. Accessing a variable that doesn’t exist, however, is an XSLT error.
The FileOpen template outputs standard Option Strict, Option Explicit, and Imports System statements into your output file’s header. The imports parameter passes additional imports as a comma-delimited string. The named template RecursiveImports breaks this string down and outputs the Imports statement. The template then outputs a comment block containing the filename:
<xsl:template name="FileOpen">
   <xsl:param name="imports" />
Option Strict On
Option Explicit On

Imports System
<xsl:call-template name="RecursiveImports">
   <xsl:with-param name="imports" select="normalize-space($imports)"/>
</xsl:call-template>

#Region "Description"
' <xsl:call-template name="StripPath">
      <xsl:with-param name="fname" select="$fileName" />
   </xsl:call-template>
#End Region
</xsl:template>
TIP: You could include the generation date and time, but that would cause source control to always see the file as new and unnecessarily mark the file as changed. Because of this complication, I suggest you don’t include the date and time in the header.
Using Recursive Templates
How do you break down a comma-delimited string if you can’t reassign the values of variables? You use a recursive template that creates a new variable every time you call it. This is an advanced aspect of XSLT that can be handy at times. String manipulation such as the previous two samples is a convenient place to use recursive templates. The RecursiveImports template outputs an Imports statement for each namespace in a comma-delimited list:
TIP: I included this example so you have a reference of when you need recursion. Don’t worry if you get a little lost in it—you don’t need to use recursion often.
The imports parameter contains the comma-delimited string. The template creates a new variable named remaining that contains everything after the first comma. An xsl:choose directive tests whether the remaining variable contains anything. The remaining variable would be empty if the imports parameter ended with a comma or didn’t contain a comma. If the remaining variable isn’t zero length, the template outputs an Imports statement with the contents before the comma. The normalize-space XPath function is basically the same as the .NET String class’s Trim method. After outputting the Imports statement, the template calls itself, passing the contents of the $remaining variable as the imports parameter. This effectively calls the RecursiveImports for every item in the comma-delimited string, with the imports parameter chopped off at the first comma each time. The last time it’s called, the imports parameter contains no comma, the xsl:otherwise directive is processed, and the template simply outputs the Imports statement.
The trick to creating recursive templates (or using recursion in any language) is to provide a clear end to the recursion. In this case, you chop off string elements until there’s no comma left. If you fail to provide this clear end point, the recursion is endless, or unbounded.
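The description above can be turned into the following sketch. It follows the chapter's outline (a remaining variable, an xsl:choose, a self-call), but it is my reconstruction rather than the listing from Support.xslt; the ImportsLine helper and the guards against empty items and a trailing comma are my additions.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>

   <xsl:template name="RecursiveImports">
      <xsl:param name="imports"/>
      <!-- Everything after the first comma; empty when no comma is left -->
      <xsl:variable name="remaining" select="substring-after($imports, ',')"/>
      <xsl:choose>
         <xsl:when test="string-length($remaining) &gt; 0">
            <xsl:call-template name="ImportsLine">
               <xsl:with-param name="ns"
                     select="substring-before($imports, ',')"/>
            </xsl:call-template>
            <!-- $remaining is always shorter than $imports, so this ends -->
            <xsl:call-template name="RecursiveImports">
               <xsl:with-param name="imports" select="$remaining"/>
            </xsl:call-template>
         </xsl:when>
         <xsl:otherwise>
            <!-- Last item; translate() drops a trailing comma, if any -->
            <xsl:call-template name="ImportsLine">
               <xsl:with-param name="ns" select="translate($imports, ',', '')"/>
            </xsl:call-template>
         </xsl:otherwise>
      </xsl:choose>
   </xsl:template>

   <!-- Hypothetical helper: writes one Imports line, skipping empty items -->
   <xsl:template name="ImportsLine">
      <xsl:param name="ns"/>
      <xsl:if test="normalize-space($ns)">
         <xsl:text>Imports </xsl:text>
         <xsl:value-of select="normalize-space($ns)"/>
         <xsl:text>&#xD;&#xA;</xsl:text>
      </xsl:if>
   </xsl:template>

   <!-- Demonstration with the string used by the high-level template -->
   <xsl:template match="/">
      <xsl:call-template name="RecursiveImports">
         <xsl:with-param name="imports" select="'KADGEN, System.Data'"/>
      </xsl:call-template>
   </xsl:template>
</xsl:stylesheet>
```

Run against any input document, the demonstration writes two lines: Imports KADGEN and Imports System.Data.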
TIP: When processing XSLT in .NET, if you encounter endless recursion, your application will appear to freeze up (the action will take significantly too long). If the debugger is running, you can break using Ctrl+Break and end the process.
You can use the same approach to strip a path off a full filename, resulting in just the filename. In the file header, the name of the file is useful. The directory where you generate it may not be its permanent location, so it isn’t included. The StripPath template is otherwise similar to the RecursiveImports template:
NOTE: &#92; (or &#x5C; in hexadecimal) is the escape sequence for the backslash character. You have to escape the backslash in at least some cases.
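A StripPath template along these lines might look like the sketch here. The fname parameter name comes from the FileOpen listing; the body, the demonstration path, and the choice of contains() as the end test are my assumptions.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>

   <xsl:template name="StripPath">
      <xsl:param name="fname"/>
      <xsl:choose>
         <!-- &#x5C; is the backslash; keep chopping while one remains -->
         <xsl:when test="contains($fname, '&#x5C;')">
            <xsl:call-template name="StripPath">
               <xsl:with-param name="fname"
                     select="substring-after($fname, '&#x5C;')"/>
            </xsl:call-template>
         </xsl:when>
         <!-- No backslash left: what remains is the bare filename -->
         <xsl:otherwise>
            <xsl:value-of select="$fname"/>
         </xsl:otherwise>
      </xsl:choose>
   </xsl:template>

   <!-- Demonstration with a made-up path -->
   <xsl:template match="/">
      <xsl:call-template name="StripPath">
         <xsl:with-param name="fname"
               select="'C:&#x5C;Generated&#x5C;Orders.vb'"/>
      </xsl:call-template>
   </xsl:template>
</xsl:stylesheet>
```

The demonstration outputs Orders.vb. Unlike RecursiveImports, nothing is written on the way down; only the final call, the one that finds no backslash, produces output.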
Capping Off XSLT Code Generation
The previous sections took XSLT code generation out of the theoretical to show how you can use it in the real world with advanced XSLT techniques. To make XSLT code generation easy, you’ll want to use the following:
A single entry-level template accessing the root
A single high-level processing template called in the context you’re processing
A named template (xsl:call-template) for each region in the output after you organize your sample code into regions
Additional nested named and match templates (xsl:apply-templates) as regions become longer and more complex
xsl:value-of for inserting values
xsl:if when a conditional block has only one option and xsl:choose when there are multiple conditional blocks
Match templates for any looping elements unless the output is nearly trivial, in which case you should use an xsl:for-each
A separate supporting stylesheet to contain reusable utility templates
You’ll see more samples of XSLT code generation throughout the book. I use it for examples because it’s more concise than the other code generation mechanisms.
Brute-force code generation is the simplest way to output code. It just spits out code to an open stream or writer. Because there are many variations on streams, there are different approaches to doing brute-force code generation. The approach I’ll demonstrate uses the same metadata input file used for XSLT code generation. Instead of transforming it with XSLT, the metadata is loaded into the XML Document Object Model (DOM) where you can manipulate it for code generation.
TIP: Brute-force generation doesn’t impose a high degree of order on your code generation—it’s easy to be sloppy. Moving from manual development to the more abstract approach of code generation can be a big step, so don’t make it more difficult with obscure or inconsistent techniques.
NOTE: I’ll outline one approach in detail here. If you’re using a variation of brute force, use this section for comparison.
All three of the code generation mechanisms rely on streams when run through the code generation harness. This allows hashing tools to access the output stream and insert a hash marker so you can later tell if the file has been edited manually. Other than hashing, XSLT and CodeDOM code generation don’t use readers and writers. They use other .NET tools for their processing. Brute-force code generation outputs code directly to a stream, so you’ll have to be familiar with writers. Readers aren’t important in this context because you’re writing to a stream, not reading from one. See the “Understanding File Streams” sidebar for more information on working with streams.
Understanding File Streams
For an understanding of streams, maybe a good place to start is Help for the Stream class. The description of the class is that it “provides a generic view of a sequence of bytes.” (see footnote 3)
That reminds me of a joke: A pilot lost in a plane low on fuel with limited visibility shouts to a guy on the roof of a building to ask, “Where am I?” The guy on the building says, “You’re in a plane.” The pilot flies around and lands at the airport. His terrified passenger asks how he found the airport. “That answer was absolutely correct and not terribly helpful, so I figured that was Redmond.” (see footnote 4)
Footnote 3. .NET 2003 Help: Description of the Stream class.
Footnote 4. I like the joke, but the Microsoft documentation actually does a decent job of covering many parts of the complex .NET Framework.
Anyway, further exploration of streams shows this description of streams to be absolutely correct, even if it doesn’t seem terribly helpful at first glance. Data (all the data you work with) is ultimately a bunch of bytes. These bytes might be in a file, an explicit memory buffer, a string, your console, and so on. The actual storage of the bytes is called the backing store, and regardless of which backing store you use, it’s still just a bunch of bytes.
Streams are pipes into the backing store. Streams are named in terms of the backing store they lead to, such as FileStream or MemoryStream. Streams are pipes, so they don’t themselves hold anything—although in some cases, such as the MemoryStream, the backing store is somewhat hidden behind the stream. Exactly what you stick into a stream depends on the encoding it supports. Like real-world pipes, a stream can generally transfer information in either direction.
.NET also provides fittings for the end of the pipes called readers and writers. These fittings are specific to either reading or writing, like a spigot with a backflow fitting. Readers and writers don’t depend on where the data is going; in other words, they don’t care what’s on the other end of the pipe. Readers and writers provide special features you can use to get data pushed into or pulled out of the stream. They provide specialized syntax for the type of data you’re converting to or from bytes. This can be as simple as Unicode encoding or as complex as an interface that understands HTML (such as the System.Web.UI.HtmlTextWriter). For XSLT and CodeDOM generation, the stream is just returned from the process and passed on to the hash tools.
NOTE: There’s more on using hashing code output in Chapter 5.
The hash tools create a reader and a writer to work with the stream. Because it’s inserting something into the stream, it reads from one stream and outputs to another. This is analogous to adding dye to a stream. You’d run water from the tap into some sort of a mixing bucket, and then you’d pour it into another pipe leading to what you were doing with the dyed liquid. You wouldn’t try to stuff the dyed liquid back into the pipe from which the clear liquid came. The full ApplyHash routine is as follows:
Public Shared Function ApplyHash( _
      ByVal inStream As IO.Stream, _
      ByVal commentText As String, _
      ByVal commentStart As String, _
      ByVal commentEnd As String) _
      As IO.Stream
   Dim s As String
   Dim reader As New IO.StreamReader(inStream)
   Dim writer As New IO.StreamWriter(New IO.MemoryStream)
   Dim hashstring As String
   Dim fullHeaderMarker As String = commentStart & HeaderMarker & commentEnd
   inStream.Seek(0, IO.SeekOrigin.Begin)
   s = StripHeader(reader.ReadToEnd, fullHeaderMarker)
   hashstring = CreateHash(s)
   writer.WriteLine(fullHeaderMarker)
   writer.WriteLine(commentStart & commentEnd)
   writer.WriteLine(commentStart & commentText & commentEnd)
   writer.WriteLine(commentStart & commentEnd)
   writer.WriteLine(commentStart & HashMarker & hashstring & HashMarker & _
         commentEnd)
   writer.WriteLine(fullHeaderMarker)
   writer.Write(s)
   writer.Flush()
   writer.BaseStream.Seek(0, IO.SeekOrigin.Begin)
   Return writer.BaseStream
End Function
This function also shows how you can access the underlying stream for a reader or writer and how you can use the Seek method to reset the position within the stream.
Initially streams might seem nonintuitive, but they provide a granular approach to byte processing. In the ApplyHash routine, it makes no difference what type of stream the data comes from. Another version of this routine could take the output stream as a parameter, making the task of applying the hash independent of the backing store for the data coming in or passed back out.
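As a rough sketch of that idea, the following routine takes both the input and the output stream as parameters, so the caller decides what the backing stores are. PrependHeader and its header text are hypothetical names invented for this illustration; they aren't part of the hash tools.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module StreamSketch
    ' Hypothetical helper (not part of the hash tools): writes a header line
    ' to outStream, then copies all the text from inStream after it.
    ' Neither stream is closed, so the caller keeps control of both.
    Public Sub PrependHeader(ByVal inStream As Stream, _
                             ByVal outStream As Stream, _
                             ByVal header As String)
        Dim reader As New StreamReader(inStream)
        Dim writer As New StreamWriter(outStream)
        inStream.Seek(0, SeekOrigin.Begin)
        writer.WriteLine(header)
        writer.Write(reader.ReadToEnd())
        ' Flush so everything is in the output stream before returning
        writer.Flush()
    End Sub

    Sub Main()
        ' Both backing stores happen to be memory here; a FileStream
        ' could be passed for either one without changing PrependHeader.
        Dim source As New MemoryStream(Encoding.UTF8.GetBytes("Public Class Sample"))
        Dim target As New MemoryStream()
        PrependHeader(source, target, "' Generated file")
        target.Seek(0, SeekOrigin.Begin)
        Console.WriteLine(New StreamReader(target).ReadToEnd())
    End Sub
End Module
```

Because the routine only sees the Stream base class, swapping the backing store requires no change to the routine itself.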
Using the IndentedTextWriter
Although there are numerous writers within .NET that you could use for brute-force code generation, the best one is snuggled inside the CodeDom.Compiler namespace. The IndentedTextWriter manages indenting spaces for you. You can easily increase and decrease the indent of the output code using the Indent property:
Dim stream As New IO.MemoryStream
Dim inwriter As New CodeDom.Compiler.IndentedTextWriter( _
      New IO.StreamWriter(stream))
inwriter.Indent += 1
You can also skip indenting with the WriteLineNoTabs method:
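As a minimal sketch (the class, field, and region names are made up for illustration), the following writes region directives at the left margin with WriteLineNoTabs while the surrounding lines keep the current indent:

```vbnet
Imports System
Imports System.CodeDom.Compiler
Imports System.IO

Module IndentSketch
    Sub Main()
        ' A StringWriter stands in for the stream-backed writer used elsewhere
        Dim inwriter As New IndentedTextWriter(New StringWriter())
        inwriter.WriteLine("Public Class Customer")
        inwriter.Indent += 1
        ' Written at the current indent level
        inwriter.WriteLine("Private m_name As String")
        ' Written without any indent, regardless of the Indent property
        inwriter.WriteLineNoTabs("#Region ""Properties""")
        inwriter.WriteLine("' properties go here")
        inwriter.WriteLineNoTabs("#End Region")
        inwriter.Indent -= 1
        inwriter.WriteLine("End Class")
        inwriter.Flush()
        Console.WriteLine(inwriter.InnerWriter.ToString())
    End Sub
End Module
```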
TIP: You’ll be outputting a lot of double quotes; creating a constant for them with a short name such as DQ will save some typing.
NOTE: When you concatenate to strings as part of your code generation, consider using a StringBuilder rather than just concatenating pieces onto a string. .NET strings have a lot of cool qualities, but they require copying on every assignment. If you’re doing more than about six assignments that concatenate (multiple concatenations within a single assignment have no performance hit), you’ll gain performance if you append using a string builder. This performance advantage becomes significant with more than a few dozen assignments.
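A short sketch of the StringBuilder approach follows. The column names and the generated field declarations are invented for the example; only the Append pattern is the point.

```vbnet
Imports System
Imports System.Text

Module BuilderSketch
    Sub Main()
        ' Hypothetical column names; real templates would read these from metadata
        Dim columns() As String = {"CustomerID", "Name", "City"}
        Dim sb As New StringBuilder()
        For Each column As String In columns
            ' Each Append adds to the builder's buffer instead of
            ' creating a new string on every assignment
            sb.Append("Private m_").Append(column).Append(" As String")
            sb.Append(Environment.NewLine)
        Next
        Console.Write(sb.ToString())
    End Sub
End Module
```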
Creating a class using brute-force code generation has similarities with the XSLT approach to code generation. In both cases, you can use regions to organize both your output file and your template, and you can supply a different method to output each region. You start with an entry-level method that’s the rough equivalent to the XSLT entry-level template.
Creating the Entry-Level Method
The entry-level method is specified in the Process attribute of the directive in the harness script file. The harness calls this method and expects it to generate a stream containing the output code. The entry-level method has to be a shared (C# static) method. In this case, the entry-level method is GetStream.
The GetStream method first creates a MemoryStream for output, then an IndentedTextWriter, and some other variables:
Public Shared Function GetStream( _
      ByVal Name As String, _
      ByVal fileName As String, _
      ByVal genDateTime As String, _
      ByVal nodeSelect As Xml.XmlNode) _
      As IO.Stream
   Dim stream As New IO.MemoryStream
   Dim inwriter As New CodeDom.Compiler.IndentedTextWriter( _
         New IO.StreamWriter(stream))
   Dim nodeList As Xml.XmlNodeList
   Dim nodeColumn As Xml.XmlNode
GetAttributeOrEmpty is a generic utility method that returns the attribute specified or an empty string if the attribute isn’t present in the XML DOM element. You’ll need a namespace manager when you reference a namespace (via a prefix) in the input XML. The namespace would generally be referenced with a prefix in arguments passed to the SelectSingleNode or SelectNodes methods of the XML DOM. A utility method in the Tools class creates the namespace manager:
   Dim singularName As String = Utility.Tools.GetAttributeOrEmpty(nodeSelect, _
         "SingularName")
   Dim nsmgr As Xml.XmlNamespaceManager = _
         Utility.Tools.BuildNameSpaceManager( _
         nodeSelect.OwnerDocument, "dbs", False)
CAUTION: Namespaces can be hard to handle, especially in .NET, which is unfriendly to the default namespace (as discussed in Appendix A).
The GetStream method then calls the FileOpen method contained in a separate support file. Placing this method in a central location allows reuse by other brute-force templates. Outputting an empty string using the WriteLine method produces a blank line:
Another supporting method, WriteLineAndIndent, outputs a line of text and then indents the writer. Changeable information is inserted either as a variable or as a direct lookup of information in the metadata XML. The following outputs the class declaration using a variable (singularName is declared previously):
Similar to the XSLT code generation, it’s a lot easier to debug if you call a method for each region. The collection class contains two regions so the high-level method calls two other methods to output the core of the class:
The WriteLineAndOutdent method is the counterpart of the WriteLineAndIndent method: it decreases the indent by one and then outputs the specified text.
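The two helpers might look something like the following sketch. This is a guess at their shape based only on the descriptions above; the real Support class ships with the book's download and may differ.

```vbnet
Imports System
Imports System.CodeDom.Compiler
Imports System.IO

' Assumed shape of the helpers, reconstructed from their descriptions
Public Class Support
    ' Outputs the text, then indents so following lines sit one level deeper
    Public Shared Sub WriteLineAndIndent( _
          ByVal inWriter As IndentedTextWriter, _
          ByVal text As String)
        inWriter.WriteLine(text)
        inWriter.Indent += 1
    End Sub

    ' Decreases the indent by one, then outputs the text at the outer level
    Public Shared Sub WriteLineAndOutdent( _
          ByVal inWriter As IndentedTextWriter, _
          ByVal text As String)
        inWriter.Indent -= 1
        inWriter.WriteLine(text)
    End Sub
End Class

Module SupportSketch
    Sub Main()
        Dim inwriter As New IndentedTextWriter(New StringWriter())
        Support.WriteLineAndIndent(inwriter, "Public Class Customer")
        inwriter.WriteLine("' members go here")
        Support.WriteLineAndOutdent(inwriter, "End Class")
        inwriter.Flush()
        Console.WriteLine(inwriter.InnerWriter.ToString())
    End Sub
End Module
```

Pairing the two calls around a block keeps the opening and closing lines at the same indent level without tracking the Indent property by hand.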
Outputting the row class is much like outputting the collection class. The next code uses a For Each construct to loop through the columns and calls a method for each column:
   inwriter.WriteLine("")
   inwriter.WriteLine("")
   Support.WriteLineAndIndent(inwriter, "Public Class " & singularName)
   inwriter.WriteLine("Inherits RowBase")
   ClassLevelDeclarations(inwriter, nsmgr, nodeSelect)
   Constructors(inwriter, nsmgr, nodeSelect)
   BaseClassImplementation(inwriter, nsmgr, nodeSelect)
   Support.MakeRegion(inwriter, "Field access properties")
   nodeList = nodeSelect.SelectNodes("dbs:TableColumns/*", nsmgr)
   For Each nodeColumn In nodeList
      ColumnMethods(inwriter, nsmgr, nodeColumn)
   Next
   Support.EndRegion(inwriter)
   ' Outdent before closing the class so End Class lines up with Public Class
   Support.WriteLineAndOutdent(inwriter, "End Class")
Flushing the stream after you’ve finished output ensures that the stream has emptied any buffers and all information is safely in the stream before the writer goes out of scope. Not all writers need to be flushed, but it’s a great habit to be in because it’s essential with many of them:
   inwriter.Flush()
   Return stream
End Function
Looking at a Sample Region
I’ll skip over the Constructors region method because it doesn’t present anything new and instead will walk through one of the more complex region methods. Just like the XSLT templates, the PublicAndFriend region outputs the primary key only if it exists. Passing the namespace manager parameter means you don’t have to re-create it if methods need it for any XML processing:
Private Shared Sub PublicAndFriend( _
      ByVal inWriter As CodeDom.Compiler.IndentedTextWriter, _
      ByVal nsmgr As Xml.XmlNamespaceManager, _
      ByVal node As Xml.XmlNode)
   Support.MakeRegion(inWriter, _
         "Public and Friend Properties, Methods and Events")
The SelectSingleNode statement retrieves the primary key field element from within the TableConstraints element and stores it in the nodeTemp variable. If this node is found, the value of its Name attribute is assigned to the primaryKeyName variable:
   Dim nodeTemp As Xml.XmlNode = node.SelectSingleNode( _
         "dbs:TableConstraints/dbs:PrimaryKey/dbs:PKField", nsmgr)
   Dim primaryKeyName As String = ""
   Dim primaryKey As Xml.XmlNode
   If Not nodeTemp Is Nothing Then
      primaryKeyName = Utility.Tools.GetAttributeOrEmpty(nodeTemp, "Name")
   End If
This XPath expression in the next SelectSingleNode uses the primaryKeyName to retrieve the TableColumn node with a Name attribute matching the primaryKeyName:
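A lookup of this kind can be sketched as follows. The namespace URI, the dbs:TableColumn element name, and the sample XML are assumptions made so the example can run on its own; the actual metadata schema may use different names.

```vbnet
Imports System
Imports System.Xml

Module PrimaryKeyLookupSketch
    Sub Main()
        ' Stand-in metadata; the namespace URI and element names are assumed
        Dim doc As New XmlDocument()
        doc.LoadXml( _
            "<dbs:Table xmlns:dbs='http://example.com/dbs'>" & _
            "<dbs:TableColumns>" & _
            "<dbs:TableColumn Name='CustomerID' NETType='System.Int32'/>" & _
            "<dbs:TableColumn Name='City' NETType='System.String'/>" & _
            "</dbs:TableColumns>" & _
            "</dbs:Table>")
        Dim nsmgr As New XmlNamespaceManager(doc.NameTable)
        nsmgr.AddNamespace("dbs", "http://example.com/dbs")

        Dim primaryKeyName As String = "CustomerID"
        ' The predicate [@Name='...'] selects the column whose Name attribute
        ' matches the primary key name
        Dim primaryKey As XmlNode = doc.DocumentElement.SelectSingleNode( _
            "dbs:TableColumns/dbs:TableColumn[@Name='" & primaryKeyName & "']", _
            nsmgr)

        If Not primaryKey Is Nothing Then
            Console.WriteLine(primaryKey.Attributes("NETType").Value)
        End If
    End Sub
End Module
```

Building the XPath by string concatenation is fine for trusted metadata such as column names, but a name containing a single quote would break the expression.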
Once you’ve got the primaryKey node, you can use it in a simple conditional to output a parameter for the primary key value only if there’s a primary key. You can use the Write method to output a partial line without a CRLF:
   If Not primaryKey Is Nothing Then
      inWriter.Write("ByVal " & primaryKeyName & " As ")
      inWriter.WriteLine(Utility.Tools.GetAttributeOrEmpty(primaryKey, _
            "NETType") & ", _")
   End If
The remainder of the function reuses the same techniques you’ve already seen:
   inWriter.WriteLine("ByVal UserID As Int32)")
   inWriter.Indent -= 4
   inWriter.Write(Utility.Tools.GetAttributeOrEmpty(node, "SingularName") & _
         "DataAccessor.Fill(Me")
   If Not primaryKey Is Nothing Then
      inWriter.Write(", " & primaryKeyName)
   End If
   inWriter.WriteLine(", UserID)")
   Support.WriteLineAndOutdent(inWriter, "End Sub")
   Support.EndRegion(inWriter)
End Sub
The FileOpen method outputs the same file header as the XSLT code generation template. You can see additional supporting methods and the template code that outputs the remainder of the sample in the download from the Web site. The fileName and genDateTime are explicitly passed as parameters. The method uses the framework’s Split method to break the import parameter at commas so it can output separate Imports statements, and it uses another framework method, IO.Path.GetFileName, to retrieve the filename from the full path:
Public Shared Sub FileOpen( _
      ByVal inWriter As CodeDom.Compiler.IndentedTextWriter, _
      ByVal import As String, _
      ByVal fileName As String, _
      ByVal genDateTime As String)
   inWriter.WriteLine("Option Strict On")
   inWriter.WriteLine("Option Explicit On")
   inWriter.WriteLine("")
   inWriter.WriteLine("Imports System")
   For Each s As String In import.Split(","c)
      inWriter.WriteLine("Imports " & s)
   Next
   MakeRegion(inWriter, "Description")
   inWriter.WriteLine("'")
   inWriter.WriteLine("'" & IO.Path.GetFileName(fileName))
   inWriter.WriteLine("' Last Genned on Date: " & genDateTime)
   inWriter.WriteLine("'")
   inWriter.WriteLine("'")
   EndRegion(inWriter)
End Sub
Winding Up Brute-Force Code Generation
That’s it for brute-force code generation. You can extend these ideas to build any code you want. The key to success is breaking the process down into discrete steps as follows:
Break code outputting away from metadata extraction.
Split outputting into individual template classes that each output a single file pattern.
Within each template class, use an entry-level method.
Use a different method for each region.
Use additional methods as regions become longer and more complex.
Use supporting methods for reuse wherever possible.
Exploring Details of the CodeDOM
CodeDOM generation is interesting for a number of reasons. It’s a full abstraction of the nature of programming languages. In other words, it breaks code down into the parts of speech that make up the grammar of your programs. When you write code to this abstraction, a .NET or third-party provider (see footnote 5) can build output code in any target language. It’s also cool that Microsoft provided something developed for its own use. Visual Studio uses CodeDOM generation for the strongly typed DataSet, the class created for each ASPX file, the CodeDomSerializer, and a few other purposes. If you look at the problems that arise in outputting code for entirely different languages, it’s an amazing piece of software in itself.
NOTE: The CodeDomSerializer also uses the CodeDOM. So, if you’re working with CodeDomSerializer, everything in this section applies, and you may find this tutorial helpful. I’m not discussing details of the CodeDomSerializer because it isn’t related to application-wide code generation. Instead, it allows you to control the code produced by Visual Studio when it creates the region labeled Windows Form Designer generated code in your form. If you’re interested in using this, you can find articles on the Web that discuss how to tie it to your class using attributes and other unique aspects of convincing Visual Studio to generate your code correctly. Search for CodeDomSerializer at MSDN or on Google to find more information.
The CodeDOM has the narrow purpose of outputting code in any language that has a language provider, and Microsoft supplies providers for C#, VB .NET, and J#. (see footnote 6) Unfortunately, the effort you have to go through to achieve this flexibility— the real cost to you—is high. For almost all applications, settle on a language and use one of the other mechanisms for code generation.
Footnote 5. There aren’t any third-party providers yet, but the CodeDOM is designed for new languages to be incorporated into it.
Footnote 6. If you’re generating code for J#, please read the “CodeDOM, For Java Language” section in the .NET Help. Among a few other details, you have to reference the language-specific compiler.
NOTE: You might not want to flip past this section just yet. If you’ve never chopped up code to see it in the perspective of its atomic grammar units—the adverbs, nouns, and adjectives that make up the grammar of common code—it’s a useful exercise to explore the CodeDOM. You’ll get past differences such as VB .NET’s declaring variables with the name followed by the type versus C# declaring the type followed by the name (Dim iVar As System.Int32 versus System.Int32 iVar). You’ll also discover some of the things your language does for you that exceed interpreting syntax because those things don’t work well via the CodeDOM. Universities often require a compiler course for Computer Science (CS) graduates so they understand the underlying programming “parts of speech” and how they’re put together different ways in different languages. Maybe it’ll help you find a deeper Zen connection with your code. On the other hand, there’s a chance that you’ll never use the information in the rest of this chapter.
In this section, you’ll find out just how ugly it can be to output via the CodeDOM. If you’re thinking about using the CodeDOM for code generation, you’ll want to be realistic about how much work it takes to use the CodeDOM and the limitations on what you can accomplish. You’ll spend considerably more than twice as much time writing and maintaining the code, compared to the other code generation mechanisms. You’ll see a payoff only when you need to maintain absolute symmetry between your C# and VB .NET output or when maintaining parallel testing is impractical. If that’s the case, you’ll want to use the CodeDOM, and you’ll find a real shortage of information about it. Because of this void, this is a core tutorial to understanding the underlying technology. This basic understanding will open up the CodeDOM Help to you, which in most cases isn’t half bad once you know what you need. I won’t walk through what it takes to create a full class, but you’ll see enough basics to work through building one if you decide to continue with the CodeDOM.
NOTE: CodeDOM creates plain-vanilla code based on common syntax. Each compiler does a few back flips to make your life as a developer easier. The CodeDOM supports basic syntax, not these extra features. For example, the CodeDOM doesn’t support the Handles clause of VB .NET or its supporting WithEvents. Appendix D covers VB .NET and C# features that you can’t do in the CodeDOM. Much of the missing stuff is specific to one of the compilers and not supported by the other. If you need something other than this plain-vanilla code, you’ll have to extend the CodeDOM, which isn’t for the faint of heart (or you’ll need to use snippets, which are a really nasty construct).
NOTE: CodeDOM is a write-only technology. If you want to read the structure of existing code, you’ll need to parse the code or explore the automation model of Visual Studio .NET. Named the FileCodeModel, the Visual Studio automation model resides in the EnvDTE namespace. For C#, the FileCodeModel is read/write, but it’s read-only for VB .NET (the current trade-off for some cool VB-specific development environment features).
Like the other code generation approaches, there’s no certainty that the code you output will compile. Although the nature of the CodeDOM means certain types of gross syntax errors aren’t going to happen, the CodeDOM leaves you plenty of room to output code that won’t compile in one or more languages or won’t do what you intended when it compiles.
CAUTION: The CodeDOM will allow you to do dumb things without any complaint. Because of the variety of invalid code it’ll happily generate, test your output by compiling in multiple compilers. It’s not a bad idea to do a code review of a few output files with any of the code generation techniques, but this is especially important with the CodeDOM.
Using an Object Hierarchy
At the root of the object model is the CodeDOM graph. The CodeDOM graph is a tree of objects that describes elements of your code in abstract terms. The graph is contained in an object called a CodeCompileUnit, which describes your code in terms of its elemental parts—expressions, statements, members, and so on. Figure 3-2 shows the tree approach of the CodeDOM graph. Each CodeDOM graph outputs one file, but it can contain multiple classes. Using multiple CodeDOM providers, you can create output for this code in multiple languages.
Figure 3-2. The CodeDOM is a tree of code elements.
NOTE: It’s a little confusing that the terms CodeDOM graph and CodeCompileUnit are so closely related. To be precise, the CodeDOM graph is the abstract hierarchy of code elements, and the CodeCompileUnit is the specific class whose objects contain this hierarchy. But in practice, you can probably treat these terms as being synonymous.
NOTE: The CodeDOM parallels reflection in that they both work with abstractions of your code. But the CodeDOM is primarily a mechanism for creating and outputting code, and reflection is primarily a mechanism for retrieving information about an existing assembly. Reflection is also assembly focused, but the CodeDOM is organized in terms of .NET namespaces—with a collection of namespaces making up the CodeDOM graph.
Figure 3-3 shows key elements of the CodeDOM. Namespaces contain type declarations, types contain members, and so on. I excluded a lot of information to keep this figure simple. The interesting part of the CodeDOM starts with the CodeMemberMethod. The CodeEntryPointMethod, CodeConstructor, and CodeTypeConstructor derive from CodeMemberMethod. The CodeConstructor is the instance constructor, and the CodeTypeConstructor is the Shared or static constructor for the type. These four methods, along with the closely parallel CodeMemberProperty, contain Statements. At the statement level, you begin to break your code down to a molecular level that may be unfamiliar to you. Looking at a human language analogy, the statement is the sentence, the method is the paragraph, the class is the chapter, and the assembly is the book or volume. Applications can consist of many volumes.
Figure 3-3. An object model of part of the CodeDOM.
Types in .NET is a general term referring to classes, enums, data types, structures, and interfaces. Type declarations in the CodeDOM graph are represented as objects of the CodeTypeDeclaration class. Use the IsEnum, IsStruct, and IsInterface properties to indicate the category if it isn’t a class (which is the default).
CAUTION: CodeTypeMember has an Attributes property for values of the MemberAttributes enumeration. These values are appropriate for members and aren’t intended for use on types. The CodeTypeDeclaration class also contains a TypeAttributes property for values of the TypeAttributes enumeration that are appropriate for types. Be sure you use only TypeAttributes for the CodeTypeDeclaration to avoid later compiler errors. It may seem odd that the CodeTypeDeclaration has both Attributes and TypeAttributes, but within the inheritance hierarchy, CodeTypeDeclaration derives from CodeTypeMember. This inheritance design is essential to let you nest types as members within other types but leads to this strange artifact.
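As a small sketch of these properties, the following declares a class, a structure, and an interface and sets type-level attributes through TypeAttributes rather than Attributes. The type names (Book, Point, IShape) are invented for the example.

```vbnet
Imports System
Imports System.CodeDom
Imports System.Reflection

Module TypeDeclarationSketch
    Sub Main()
        Dim nSpace As New CodeNamespace("CodeDOMTest")

        ' A class is the default category; mark it sealed (NotInheritable)
        ' using TypeAttributes, not the member-oriented Attributes property
        Dim book As New CodeTypeDeclaration("Book")
        book.TypeAttributes = TypeAttributes.Public Or TypeAttributes.Sealed

        ' Flip the category with the Is... properties
        Dim point As New CodeTypeDeclaration("Point")
        point.IsStruct = True

        Dim shape As New CodeTypeDeclaration("IShape")
        shape.IsInterface = True

        nSpace.Types.Add(book)
        nSpace.Types.Add(point)
        nSpace.Types.Add(shape)

        For Each t As CodeTypeDeclaration In nSpace.Types
            Console.WriteLine(t.Name & ": IsClass=" & t.IsClass.ToString() & _
                ", IsStruct=" & t.IsStruct.ToString() & _
                ", IsInterface=" & t.IsInterface.ToString())
        Next
    End Sub
End Module
```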
Understanding Members
Types contain members. For example, your .NET code might have a Book type (class or structure) that has a Price property and a SetInventory method as two of its members. When you’re building a CodeDOM graph, you specify the kind of member to create by the class you instantiate. These classes are shown in Figure 3-3, and all derive directly or indirectly from the CodeTypeMember. To keep the figure focused on the overall design, Figure 3-3 doesn’t show that CodeEntryPointMethod, CodeConstructor, and CodeTypeConstructor actually derive from CodeMemberMethod. CodeMemberMethod derives from CodeTypeMember. (See footnote 7)
Footnote 7. All of the objects contained in the CodeCompileUnit ultimately derive from CodeObject.
CAUTION: The CodeDOM is all about outputting syntax. It isn’t about correct code. So, in building the CodeDOM graph, you can reference objects and methods that don’t exist. You have to output your code and let the compiler find out where you messed up. For example, a constructor isn’t valid on an interface, but the CodeDOM won’t complain if you provide one. However, the compiler will raise errors.
Understanding Statements
Members contain statements. The statement in code corresponds to the sentence in human language. This is the core of meaning. And just like human sentences, code statements perform different tasks. (See footnote 8.) The CodeDOM includes many different types of statements that perform different tasks. Looking at Figure 3-4, you’ll see 14 different statement types. There’s a comment statement, five statements to control program flow, three for exception handling, two for attaching and removing event handlers, one for variable declarations, one for assignments, and one for using expressions as statements. All statements reside in methods.
Footnote 8. In human languages, sentences can be commands, questions, informational statements, opinions, quotations, and so on.
NOTE: You often have declarations at the top of .NET classes. These may appear to be statements, but they’re actually fields. Fields are members of the class, and statements can only appear within method bodies.
Understanding Expressions
Statements are made up of expressions. Expressions break the meaning of code down to the next level. The expression is similar to a clause in a human language. I can say “Understanding Expressions,” and I haven’t written a complete sentence, but I’ve placed words together in a way that conveys meaning. Within the CodeDOM, an expression may reference a specific variable, invoke a method, access language operators such as casts or GetType (typeof in C#), and so on. Figure 3-4 illustrates the range of expression types available.
Figure 3-4. Statements and expressions supported by the CodeDOM.
In human language, a clause might stand alone and be grammatically correct in some contexts. For example, I used “Understanding Expressions” as the main title of this section. The same thing occurs within .NET. An example of an expression that can stand alone is invoking a method that doesn’t return a value—calling a Sub in VB. The expression needs a container, and you can use the CodeExpressionStatement class as a container for these simple expressions.
NOTE: Where the CodeDOM expects an expression, it’ll accept any of the expression types, but the output code may not compile.
Most statements use expressions. The statements that don’t use expressions are gray in Figure 3-4. In addition to their use in statements, expressions often use other expressions. For example, an expression could call a method that took an argument that was itself the return value of another method:
obj.MethodA(obj.MethodB(j))
The code to create this line in the CodeDOM is a bit intimidating, but it illustrates how expressions work within the CodeDOM. A CodeExpressionStatement holds the expression. The expression is a CodeMethodInvokeExpression that invokes MethodA on the variable obj. The single parameter passed is another CodeMethodInvokeExpression that invokes MethodB on the variable obj. The code is as follows:
entry.Statements.Add( _
   New CodeExpressionStatement( _
      New CodeMethodInvokeExpression( _
         New CodeMethodReferenceExpression( _
            New CodeVariableReferenceExpression("obj"), _
            "MethodA"), _
         New CodeMethodInvokeExpression( _
            New CodeMethodReferenceExpression( _
               New CodeVariableReferenceExpression("obj"), _
               "MethodB"), _
            New CodeVariableReferenceExpression("j")))))
NOTE: A compiler error will occur when you compile the output code if j or obj aren’t declared and/or initialized.
To show how to build CodeDOM templates, I’ll start with CodeDOM code that creates literals in your output code. Then I’ll show how to create the structure of the CodeDOM graph and finally how to hook them together with CodeDOM code that creates method and class objects that later create your application code. The code in this section is cumulative. For example, I declare the CodeCompileUnit here and use it throughout later sections of the chapter. The compile unit will contain the CodeDOM graph that holds the rest of the code specifics. You declare the compile unit that contains the CodeDOM graph with this:
Dim compileUnit As New CodeDom.CodeCompileUnit
You enter expressions, such as primitive expressions, using the syntax of the language you’re using when you create the CodeDOM graph—in this case, VB .NET. The code you output depends on the CodeDOM provider you’re using.
Using CodePrimitiveExpressions
The most basic type of expression is a CodePrimitiveExpression. You can’t just say “Hello World” or 42 with the CodeDOM. You have to transform everything, including literals, into an object. A few examples of primitive expressions are as follows:
Dim exp As New CodePrimitiveExpression("Hello World")
Dim exp2 As New CodePrimitiveExpression(42)
Dim exp3 As New CodePrimitiveExpression(3)
Dim s As String = "Sam"
Dim exp4 As New CodePrimitiveExpression(s)
Dim exp5 As New CodePrimitiveExpression(True)
Dim exp6 As New CodePrimitiveExpression(Nothing)
Primitives are literals—a basic unit of programming. It’s impossible to imagine any nontrivial application that doesn’t use literals. Although it might initially look confusing, it’s perfectly legal to use a variable to define the value of a CodePrimitiveExpression, such as the definition of the literal "Sam".
The previous examples that create primitives for True and Nothing may be the most interesting. They illustrate the primitive expression entered in the syntax of the language you’re using to create the CodeDOM graph—in this case, VB .NET. When this value is output in the target language, it may not appear the same. Regardless of the language you use to create the CodeDOM template, if the target language is C#, the code provider outputs true (lowercase) and null. If the target language is VB, the code provider outputs True (mixed case) and Nothing.
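To see this for yourself, a sketch along the following lines sends the same two primitives through both providers. It assumes the GenerateCodeFromExpression method on CodeDomProvider, which arrived in .NET Framework 2.0 (earlier versions obtain a generator from the provider instead); on newer .NET runtimes the System.CodeDom package would also need to be referenced.

```vbnet
Imports System
Imports System.CodeDom
Imports System.CodeDom.Compiler
Imports System.IO

Module PrimitiveOutputSketch
    Sub Main()
        ' The same graph objects are handed to each provider
        Dim expressions() As CodeExpression = { _
            New CodePrimitiveExpression(True), _
            New CodePrimitiveExpression(Nothing)}
        Dim providers() As CodeDomProvider = { _
            New Microsoft.CSharp.CSharpCodeProvider(), _
            New Microsoft.VisualBasic.VBCodeProvider()}

        For Each provider As CodeDomProvider In providers
            Console.WriteLine(provider.GetType().Name & ":")
            For Each exp As CodeExpression In expressions
                Dim writer As New StringWriter()
                ' Each provider renders the literal in its own syntax
                provider.GenerateCodeFromExpression( _
                    exp, writer, New CodeGeneratorOptions())
                Console.WriteLine("   " & writer.ToString())
            Next
        Next
    End Sub
End Module
```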
Declaring Variables
Another basic building block is the local variable declaration, which the CodeDOM represents as a statement. To produce code that will compile, you’ll have to declare local variables before you use them, as either a variable or a parameter. Examples of variable declarations with and without initialization are as follows:
' Declare an integer variable named iSum
Dim decl As New CodeVariableDeclarationStatement( _
      GetType(System.Int32), "iSum")
' Declare an integer variable named iValue initialized to the expression Exp2
Dim decl2 As New CodeVariableDeclarationStatement( _
      "System.Int32", "iValue", exp2)
' Declare an object of the stream class named stream
Dim decl3 As New CodeVariableDeclarationStatement( _
      GetType(System.IO.Stream), "stream")
' Declare a string variable named fileName with the value Test.txt
Dim decl4 As New CodeVariableDeclarationStatement( _
      GetType(System.String), "fileName", _
      New CodePrimitiveExpression("Test.txt"))
Within the CodeVariableDeclarationStatement, you can declare the type of a variable either using the type (GetType is the VB equivalent of the C# typeof operator) or via a string. If you declare types using a string, you lose IntelliSense and compiler support in catching your typing mistakes. You can avoid using strings when you’re working with framework types, but sometimes you’re referencing unavailable types, such as referencing types in the code you’re generating, and you’ll have to use a string. If a third parameter is supplied, it’s used to initialize the variable.
You create references to variables using the CodeVariableReferenceExpression class shown later.
NOTE: The name CodeVariableReferenceExpression has nothing to do with whether the underlying variable represents a value type or a reference type. It’s just the CodeDOM way of referencing a variable.
Before using these declarations, take a step back and look at the structure of the method that builds the CodeDOM graph. The compile unit contains namespaces, namespaces contain types (a class in this case), and types contain members (a method in this case) (See footnote 9.):
Dim nSpace As New CodeNamespace("CodeDOMTest")
compileUnit.Namespaces.Add(nSpace)
' Create a class to hold code
Dim clsStartup As New CodeTypeDeclaration("Startup")
nSpace.Types.Add(clsStartup)
' To run as an executable, you'll need a method that's an entry point
Dim entry As New CodeEntryPointMethod
entry.Name = "Main"
clsStartup.Members.Add(entry)
If you don’t want a namespace declared in your output, just use an empty string for the namespace name. You always have to include the namespace object, but if it doesn’t have an explicit name, it isn’t output.
Once you’ve created an object to represent the method that the CodeDOMProvider will later output, you can stuff code into it. Code consists of statements. You create an object representing each statement and add it to the method’s Statements collection. Statements will be output in the order they appear in the Statements collection (See footnote 10):
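A minimal, self-contained sketch of adding statements in order follows. It re-creates two of the declarations locally so it can run on its own; in the cumulative sample you would add the declaration objects created in the “Declaring Variables” section to the existing entry method instead.

```vbnet
Imports System
Imports System.CodeDom

Module StatementOrderSketch
    Sub Main()
        Dim entry As New CodeEntryPointMethod()

        ' Local copies of two declarations, so the sketch stands alone
        Dim decl As New CodeVariableDeclarationStatement( _
            GetType(System.Int32), "iSum")
        Dim decl2 As New CodeVariableDeclarationStatement( _
            "System.Int32", "iValue", New CodePrimitiveExpression(42))

        ' The order of the Add calls is the order of the output statements
        entry.Statements.Add(decl)
        entry.Statements.Add(decl2)

        Console.WriteLine(entry.Statements.Count)
    End Sub
End Module
```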
Footnote 9. The code in this section is cumulative; compileUnit was declared earlier in the section “Building the CodeDOM Graph.”
Footnote 10. decl, decl2, decl3, and decl4 were declared earlier in the section “Declaring Variables.”
Outputting Statements
The next code creates a reference to the system console class and outputs one of the primitive expressions defined earlier. This is similar to code in the “Hello World” example in Chapter 1:
Dim rExpConsole As New CodeTypeReferenceExpression( _
      GetType(System.Console))
Dim stmt1 As New CodeExpressionStatement( _
      New CodeMethodInvokeExpression(rExpConsole, _
      "WriteLine", exp))
entry.Statements.Add(stmt1)
Similarly Named Classes Can Be Confusing
Working with types is sometimes confusing. Types are generally, but not always, classes. The word GetType is a keyword and a method, and different .NET Framework classes sometimes have similar names. Some of the classes with the most confusing names support the CodeDOM and the automation model of EnvDTE. To keep these straight, watch the namespace with which you’re working. For example:
CodeTypeReference: Inherits from CodeObject and contains a reference to a type. Use this within other CodeDOM objects to declare variables, parameters, and so on.
CodeTypeReferenceExpression: Inherits from CodeExpression. Use this to call static (Shared) methods.
CodeTypeRef: Part of the Visual Studio .NET automation model and not used in CodeDOM programming.
CodeTypeReference and CodeTypeReferenceExpression are in the System.CodeDom namespace. CodeTypeRef is in the EnvDTE namespace. Help lists CodeTypeRef and other members of the EnvDTE namespace as CodeTypeRef objects instead of the more common listing as a class entry.
An assignment consists of two main parts: what you’re assigning to and the value you’re assigning. For example, to create the assignment iSum = 0, you’d use this:
Dim stmt2 As New CodeAssignStatement( _
    New CodeVariableReferenceExpression("iSum"), _
    New CodePrimitiveExpression(0))
entry.Statements.Add(stmt2)
You’ll often assign the results of a binary expression. A binary expression is an expression with two operands such as addition. It consists of three parts—the left operand, the operator, and the right operand. In the next example, the left operand of the binary expression is a previously created primitive expression, and the right operand is a newly created primitive expression. The code output by this fragment assigns the result of 42 + 23 to the iSum variable:
' Named stmt2b because stmt2 is already declared in this cumulative sample
Dim stmt2b As New CodeAssignStatement( _
    New CodeVariableReferenceExpression("iSum"), _
    New CodeBinaryOperatorExpression(exp2, _
        CodeBinaryOperatorType.Add, _
        New CodePrimitiveExpression(23)))
entry.Statements.Add(stmt2b)
So far, the CodeDOM samples make up about 40 lines of template code. What does the resulting code output look like? It can output code in any language with a CodeDOM provider. Because the generating code is in VB .NET, I’ll show the output in C#. After I removed the CodeDOM’s standard file opening comments, the output is as follows:
namespace CodeDOMTest {
    using System;

    public class Startup {

        public static void Main() {
            int iSum;
            int iValue = 42;
            System.IO.Stream stream;
            string fileName = "Test.txt";
            System.Console.WriteLine("Hello World");
            iSum = 0;
            iSum = (42 + 23);
        }
    }
}
NOTE: While the declaration was defined using the system type (System.Int32), it is output using the language specific keyword— int in C# or Integer in VB .NET.
Wow! Not very impressive for 40 lines of code! But that’s life with the CodeDOM. In my experience, templates that use the CodeDOM contain at least three lines of complex code for every line generated.
Accessing Enums
Enums in .NET are created as special classes derived from System.Enum, with each enum value declared as a read-only field. You access enum values by treating them as fields:
Dim enumValue As New CodeFieldReferenceExpression( _
    New CodeTypeReferenceExpression( _
        GetType(System.IO.FileMode)), _
    "Create")
Often you’ll be referencing enums that aren’t part of the framework and aren’t available during generation. One way this can happen is to reference enums in the files you’re outputting. Because the class isn’t yet available, you’ll only be able to use a string to reference the enum. (See footnote 11.)
Footnote 11. The string approach always works, but it doesn’t offer strong typing benefits such as IntelliSense and compiler checks.
NOTE: To create a new enum in a class, add a CodeTypeDeclaration member with IsEnum set to True. Use CodeMemberField to declare individual enum values.
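As a minimal sketch of both ideas, the following declares an enum in the CodeDOM and then refers to one of its values by string, as you would for a type that doesn't exist yet at generation time. The enum name OrderStatus and its values are hypothetical, and the sketch uses CodeMemberField for the individual values.

```vb
Imports System
Imports System.CodeDom
Imports System.CodeDom.Compiler
Imports System.IO
Imports Microsoft.VisualBasic

Module EnumSketch
    ' Declares a hypothetical OrderStatus enum and returns its VB .NET source
    Public Function GenerateEnumDeclaration() As String
        Dim enumDecl As New CodeTypeDeclaration("OrderStatus")
        enumDecl.IsEnum = True
        ' Each enum value is added as a field of the enum type
        enumDecl.Members.Add(New CodeMemberField("OrderStatus", "Pending"))
        enumDecl.Members.Add(New CodeMemberField("OrderStatus", "Shipped"))

        Using provider As New VBCodeProvider()
            Using writer As New StringWriter()
                provider.GenerateCodeFromType( _
                    enumDecl, writer, New CodeGeneratorOptions())
                Return writer.ToString()
            End Using
        End Using
    End Function

    ' References an enum value by type name only, with no GetType call,
    ' so the enum does not need to exist while the generator runs
    Public Function GenerateEnumReference() As String
        Dim enumValue As New CodeFieldReferenceExpression( _
            New CodeTypeReferenceExpression("OrderStatus"), "Shipped")

        Using provider As New VBCodeProvider()
            Using writer As New StringWriter()
                provider.GenerateCodeFromExpression( _
                    enumValue, writer, New CodeGeneratorOptions())
                Return writer.ToString()
            End Using
        End Using
    End Function

    Sub Main()
        Console.WriteLine(GenerateEnumDeclaration())
        Console.WriteLine(GenerateEnumReference())
    End Sub
End Module
```

Because the type name is only a string, a misspelling here surfaces when the generated code is compiled, not when the template is compiled.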
Creating Objects
Objects are generally created and assigned to a variable, either as part of the variable declaration or as a separate statement in the output file. The next code initializes the previously created stream variable by assigning a new file stream to it:
Dim stmt3 As New CodeAssignStatement( _
    New CodeVariableReferenceExpression("stream"), _
    New CodeObjectCreateExpression( _
        GetType(System.IO.FileStream), _
        New CodeVariableReferenceExpression("fileName"), _
        enumValue))
entry.Statements.Add(stmt3)
Declaring Array Variables
You can declare arrays in two different ways. You can create a normal variable with an array type, or you can use an overload of the CodeTypeReference constructor. The second parameter of this overload is the rank of the array, which allows you to declare arrays with multiple dimensions:
' Declare an array
entry.Statements.Add( _
    New CodeVariableDeclarationStatement( _
        GetType(System.Int32()), "aInts"))
' Shows option for type declaration
entry.Statements.Add( _
    New CodeVariableDeclarationStatement( _
        New CodeTypeReference( _
            New CodeTypeReference(GetType(System.Int32)), 1), _
        "a2Ints"))
You might find examples using square brackets within strings to declare arrays, such as "System.Int32[]". Although understood by the CodeDOM, this isn’t the preferable way to declare arrays because you don’t get any strong typing benefits.
CAUTION: It’s often convenient to create variables in your CodeDOM code that hold references to expressions. Be cautious with these variables. You’ll often place one reference to an instance in the Statements collection (either directly or through another object) and retain another reference to the same instance. If you make changes, the previous entry in the Statements collection will also reflect those changes. You can create new objects, or you can clone the objects as you put them into the CodeDOM. This is just normal object-oriented behavior, but it’s different from the other two code generation mechanisms that output code as your template runs. The CodeDOM builds a complex object hierarchy and later outputs your code. If you forget this detail, you’ll make a mess of your output.
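The shared-reference behavior described in this caution can be reproduced with a minimal sketch. The variable names first and second are hypothetical; the point is that both assignments hold the same CodePrimitiveExpression instance.

```vb
Imports System
Imports System.CodeDom
Imports System.CodeDom.Compiler
Imports System.IO
Imports Microsoft.VisualBasic

Module SharedReferenceSketch
    ' Builds two assignments that share one expression instance and
    ' returns the VB .NET source generated for them
    Public Function GenerateAssignments() As String
        Dim statements As New CodeStatementCollection()
        Dim literal As New CodePrimitiveExpression(1)

        ' The statement holds a reference to literal, not a copy of it
        statements.Add(New CodeAssignStatement( _
            New CodeVariableReferenceExpression("first"), literal))

        ' Changing the retained instance also changes the statement above
        literal.Value = 2
        statements.Add(New CodeAssignStatement( _
            New CodeVariableReferenceExpression("second"), literal))

        Using provider As New VBCodeProvider()
            Using writer As New StringWriter()
                For Each stmt As CodeStatement In statements
                    provider.GenerateCodeFromStatement( _
                        stmt, writer, New CodeGeneratorOptions())
                Next
                Return writer.ToString()
            End Using
        End Using
    End Function

    Sub Main()
        ' Both assignments are written with the value 2
        Console.WriteLine(GenerateAssignments())
    End Sub
End Module
```

Creating a second CodePrimitiveExpression for the second assignment, instead of reusing literal, keeps the two statements independent.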
You create arrays that can be assigned to previously declared array variables using the CodeArrayCreateExpression:
entry.Statements.Add(New CodeAssignStatement( _
    New CodeVariableReferenceExpression("a2Ints"), _
    New CodeArrayCreateExpression("System.Int32", 10)))
Dim varAInts As New CodeVariableReferenceExpression("aInts")
entry.Statements.Add(New CodeAssignStatement( _
    varAInts, _
    New CodeArrayCreateExpression("System.Int32", _
        New CodePrimitiveExpression(0), _
        New CodePrimitiveExpression(1), _
        ' Additional assignments snipped
        New CodePrimitiveExpression(9))))
Assigning Array Values
Accessing individual values within the array requires a CodeArrayIndexerExpression. This creates output such as aInts(iValue) in VB .NET or aInts[iValue] in C#. The next example assigns the fourth (see footnote 12) element in the array to the previously declared variable named iValue:
entry.Statements.Add( _
    New CodeAssignStatement( _
        New CodeVariableReferenceExpression("iValue"), _
        New CodeArrayIndexerExpression( _
            varAInts, New CodePrimitiveExpression(3))))
Footnote 12. .NET arrays are zero based.
Okay, enough simple stuff. Let’s look at something a little more complex. What would it take to output the following code?
Dim i As Integer
aInts(i) = aInts(i + 1)
After declaring the variable i, create an expression variable pointing to i. This and the previously created expression variable pointing to the array save some typing:
entry.Statements.Add( _
    New CodeVariableDeclarationStatement( _
        "System.Int32", "i", _
        New CodePrimitiveExpression(0)))
Dim varI As New CodeVariableReferenceExpression("i")
Outputting the target code requires array indexers for both the left and right operands of the assign statement. The left operand uses the variable reference expression pointing to the array and the variable reference expression pointing to i. The right operand also uses the variable reference expression for the array. For the array index, it uses a binary operator expression and the addition operator. The left operand of this binary operator expression is the variable reference expression for i, and the right is a primitive expression for 1:
entry.Statements.Add( _
    New CodeAssignStatement( _
        New CodeArrayIndexerExpression( _
            varAInts, varI), _
        New CodeArrayIndexerExpression( _
            varAInts, _
            New CodeBinaryOperatorExpression( _
                varI, CodeBinaryOperatorType.Add, _
                New CodePrimitiveExpression(1)))))
This doesn’t output a particularly complex line of code. It outputs a short, rather ordinary line of code, and the CodeDOM template is beginning to feel convoluted. If you understand how this piece of code works, though, you’ll be able to extend it to more complex code statements on your own.
In addition to the structure of the CodeDOM, a couple of features cut across all the different layers of the CodeDOM. These include UserData, which passes information to the language-specific provider, and code snippets, which include literal code in your output.
Using UserData
UserData lets you enter information that the language-specific provider uses to alter details of how it outputs code. This is a great feature. Unfortunately, the current providers support few UserData items, and many that they do support are undocumented.
NOTE: Appendix D shows how to control Option Strict and Option Explicit in VB.NET output using UserData.
UserData appears at all levels of CodeDOM objects (compile unit, namespace, type, member, statement, and expression). In all cases, UserData is an IDictionary object allowing a string key and an object value. Only the language-specific provider that’s looking for a UserData entry will use it. Language compilers ignore any UserData items they don’t know about. If you extend CodeDOM providers, keep UserData in mind as the appropriate mechanism for the template programmer to communicate information to your specific provider.
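As a hedged illustration of the dictionary mechanics, the sketch below sets two entries on the compile unit before generating VB .NET output. The key names AllowLateBound and RequireVariableDeclaration are an assumption about what the VB .NET provider reads for Option Strict and Option Explicit; they aren't given in this chapter, so verify them against your provider version before relying on them.

```vb
Imports System
Imports System.CodeDom
Imports System.CodeDom.Compiler
Imports System.IO
Imports Microsoft.VisualBasic

Module UserDataSketch
    ' Generates an empty compile unit after placing two UserData entries on it
    Public Function GenerateWithOptions() As String
        Dim unit As New CodeCompileUnit()
        unit.Namespaces.Add(New CodeNamespace("UserDataTest"))

        ' Assumed keys: string key, object value, read only by the VB provider
        unit.UserData("AllowLateBound") = False            ' ask for Option Strict On
        unit.UserData("RequireVariableDeclaration") = True ' ask for Option Explicit On
        ' A key no provider looks for is simply ignored
        unit.UserData("MyOwnHint") = "ignored"

        Using provider As New VBCodeProvider()
            Using writer As New StringWriter()
                provider.GenerateCodeFromCompileUnit( _
                    unit, writer, New CodeGeneratorOptions())
                Return writer.ToString()
            End Using
        End Using
    End Function

    Sub Main()
        Console.WriteLine(GenerateWithOptions())
    End Sub
End Module
```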
Using Snippets
Literal snippets are evil. Okay, maybe they aren’t quite evil, but they’re terribly close. Snippets output language-specific code. The code contained in the snippet is output as literals—it isn’t interpreted by the CodeDOM, which is why it’s language specific.
Snippets may seem cool, but a short time after you do anything with literal snippets you’ll smack yourself on the head and do the Homer Simpson “D’oh!” I hope the reason you’re using the CodeDOM is to provide language-specific output from a language-neutral source. Snippets output literal code segments, and literal code segments are inherently language specific. If you’re doing language-specific stuff, you’re breaking language neutrality, so why use the CodeDOM? Snippets are easy to grab when you can’t figure out how to make the CodeDOM do something, and certainly that’s their purpose. But before resorting to them, exhaust other ways to accomplish what you need. You can insert snippets in numerous locations in your code, and there are a handful of different snippet classes to provide the correct type of snippet for different uses, such as CodeSnippetTypeMember, CodeSnippetStatement, and CodeSnippetExpression.
Although it’s ugly, passing a flag that indicates which language you’re currently outputting and inserting different snippets based on the language is possible, but it makes the CodeDOM graph itself language specific, so you have to regenerate the graph for each language.
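The language-flag approach might look like the following minimal sketch. The flag values "VB" and "C#", the function names, and the count variable are all hypothetical. The statements are rebuilt for each language because the snippet text differs.

```vb
Imports System
Imports System.CodeDom
Imports System.CodeDom.Compiler
Imports System.IO
Imports Microsoft.CSharp
Imports Microsoft.VisualBasic

Module SnippetSketch
    ' Rebuilds the statements for one target language; "VB" and "C#"
    ' are this sketch's own flag values
    Public Function BuildStatements(ByVal language As String) _
            As CodeStatementCollection
        Dim statements As New CodeStatementCollection()

        ' Language-neutral part: each provider writes its own syntax
        statements.Add(New CodeVariableDeclarationStatement( _
            GetType(System.Int32), "count", New CodePrimitiveExpression(0)))

        ' Language-specific part: the literal text is written as-is
        If language = "VB" Then
            statements.Add(New CodeSnippetStatement("count += 1"))
        Else
            statements.Add(New CodeSnippetStatement("count += 1;"))
        End If
        Return statements
    End Function

    ' Generates source for the statements built for the given language
    Public Function Generate(ByVal provider As CodeDomProvider, _
            ByVal language As String) As String
        Using writer As New StringWriter()
            For Each stmt As CodeStatement In BuildStatements(language)
                provider.GenerateCodeFromStatement( _
                    stmt, writer, New CodeGeneratorOptions())
            Next
            Return writer.ToString()
        End Using
    End Function

    Sub Main()
        Console.WriteLine(Generate(New VBCodeProvider(), "VB"))
        Console.WriteLine(Generate(New CSharpCodeProvider(), "C#"))
    End Sub
End Module
```

Passing the wrong flag for a provider still generates output without complaint, which is part of why snippets are risky: the mistake only appears when the output is compiled.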
NOTE: The current version of the code generation harness regenerates the CodeDOM graph, although my primary intention was to simplify the harness, not to make it easier to output snippets.
Understanding the Nature of Output
The CodeDOM is about the syntax it outputs and makes no attempt to see if the output is sensible. For example, the CodeDOM recognizes the difference between variables and arguments and provides CodeVariableReferenceExpression and CodeArgumentReferenceExpression. However, both C# and VB .NET happen to use the same syntax for both variables and arguments. Both the variable and the argument class output the same thing, the output will compile, and you won’t see an issue unless someone uses your template to create output in a language that differentiates between the variables and arguments. In this particular case, I can’t think of a language that differentiates how arguments and variables are referenced, so you could argue it doesn’t matter. But that’s really the issue I’m describing—the CodeDOM allows you to be as sloppy as you’d like, and your only feedback is later compiler errors or improperly behaving applications, which may occur down the road and undermine your vision of a single template for all output languages.
NOTE: There are numerous other cases where the CodeDOM and the compiler won’t choke on what you’re outputting, but the CodeDOM classes are improperly used. Although this may seem to make life easier, I think it actually makes it harder to work with the CodeDOM. You don’t know if it’s correct unless you carefully review your code by examining it manually. That’s tedious and error prone. The best you’ll be able to do is get to know all of the classes in the CodeDOM, particularly those in Figures 3-3 and 3-4. In some cases, such as switching the CodeVariableReferenceExpression and the CodeArgumentReferenceExpression, the output is identical and you could probably ignore the issues. In other cases, such as switching the CodeBinaryOperatorExpression and the CodeAssignStatement, the output may work in one language but not all .NET languages.
Using Keywords
Each CodeDOM language provider understands its own keywords and how to mark them when they’re used as variable names. C# escapes them using the at (@) sign, and VB .NET surrounds them with square brackets as in [New]. If you’d prefer to modify keywords, such as adding a leading underscore, you’ll have to do that yourself. Because the CodeDOM language provider handles escaping, you can use any names that match your style guidelines.
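A minimal sketch of both options follows: letting the provider escape a name in its own style, or applying a naming rule of your own when the provider reports that a name isn't usable as-is. The helper names and the underscore rule are hypothetical.

```vb
Imports System
Imports System.CodeDom.Compiler
Imports Microsoft.CSharp
Imports Microsoft.VisualBasic

Module KeywordSketch
    ' Lets the provider escape the name in its own style when needed
    Public Function ProviderEscaped(ByVal provider As CodeDomProvider, _
            ByVal name As String) As String
        Return provider.CreateEscapedIdentifier(name)
    End Function

    ' Hypothetical house style: add a leading underscore instead of escaping
    Public Function UnderscoreIfInvalid(ByVal provider As CodeDomProvider, _
            ByVal name As String) As String
        If provider.IsValidIdentifier(name) Then
            Return name
        End If
        Return "_" & name
    End Function

    Sub Main()
        Dim vb As New VBCodeProvider()
        Dim cs As New CSharpCodeProvider()

        Console.WriteLine(ProviderEscaped(vb, "New"))       ' [New]
        Console.WriteLine(ProviderEscaped(cs, "class"))     ' @class
        Console.WriteLine(UnderscoreIfInvalid(vb, "New"))   ' _New
        Console.WriteLine(UnderscoreIfInvalid(cs, "total")) ' total
    End Sub
End Module
```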
Using Parentheses
CodeBinaryOperatorExpression places parentheses around its output. You can control parentheses through the order in which you build up these expressions, but you can’t explicitly output parentheses unless you use snippets.
Testing in All Target Languages
It’s absolutely essential that you test in all your target languages. I’ve been unable to create something that executed differently in the two core languages once it compiled in both (with Option Strict On [see footnote 13]) other than the Shadows issue (discussed in Appendix D) and the scope issues on private interfaces. That doesn’t mean it isn’t possible; if you can create something, send it to me because it should be reported as a bug. This consistency says a great deal about how robust and stable the CodeDOM is, but it’s not surprising because Visual Studio uses the CodeDOM in multiple places.
Footnote 13. I suggest you don’t count on C# operator overloading and VB .NET implicit casts (used when you set Option Strict Off) to always produce the same results.
Although I haven’t created code that did different things once it compiled, I can create dozens of different ways to output code that’ll compile in one but not both languages—and that doesn’t even touch on warnings because C# catches unused variables and inaccessible code, and the VB .NET compiler doesn’t. I haven’t used the CodeDOM in J#. It’s essential that you test the compile in all target languages early and often to catch problems and adjust your habits. You also need to test the resulting code. Microsoft used the technology in a relatively narrow usage, and there aren’t many other people out there shaking it down.
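A first step toward that habit is to run one graph through every target provider in a single pass. The sketch below only generates source text for C# and VB .NET. Compiling each result and testing the compiled code are deliberately left out and would still need to follow. The module and function names are hypothetical.

```vb
Imports System
Imports System.CodeDom
Imports System.CodeDom.Compiler
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.CSharp
Imports Microsoft.VisualBasic

Module MultiLanguageSketch
    ' Builds a small language-neutral graph: one class with an entry point
    Public Function BuildUnit() As CodeCompileUnit
        Dim unit As New CodeCompileUnit()
        Dim nSpace As New CodeNamespace("CodeDOMTest")
        unit.Namespaces.Add(nSpace)
        Dim clsStartup As New CodeTypeDeclaration("Startup")
        nSpace.Types.Add(clsStartup)
        Dim entry As New CodeEntryPointMethod()
        clsStartup.Members.Add(entry)
        entry.Statements.Add(New CodeExpressionStatement( _
            New CodeMethodInvokeExpression( _
                New CodeTypeReferenceExpression(GetType(System.Console)), _
                "WriteLine", New CodePrimitiveExpression("Hello World"))))
        Return unit
    End Function

    ' Generates the same graph with each provider, keyed by language name
    Public Function GenerateAll(ByVal unit As CodeCompileUnit) _
            As Dictionary(Of String, String)
        Dim providers As New Dictionary(Of String, CodeDomProvider)()
        providers.Add("C#", New CSharpCodeProvider())
        providers.Add("VB", New VBCodeProvider())

        Dim results As New Dictionary(Of String, String)()
        For Each pair As KeyValuePair(Of String, CodeDomProvider) In providers
            Using writer As New StringWriter()
                pair.Value.GenerateCodeFromCompileUnit( _
                    unit, writer, New CodeGeneratorOptions())
                results.Add(pair.Key, writer.ToString())
            End Using
        Next
        Return results
    End Function

    Sub Main()
        For Each pair As KeyValuePair(Of String, String) In GenerateAll(BuildUnit())
            Console.WriteLine("--- " & pair.Key & " ---")
            Console.WriteLine(pair.Value)
        Next
    End Sub
End Module
```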
I’ve just fluttered over the top of the CodeDOM, focusing on what it means to work with code as an abstract language-independent entity. Although this may seem natural to you if you took a compiler course, it’s a different way of looking at code than day-to-day creation in one or a small number of languages. Appendix D covers additional issues, incompatibilities, and limitations of the CodeDOM.
Most of the tips for efficient use of brute-force generation apply to the CodeDOM. You can’t output regions without snippets, so you may want to replace regions with comments; however, you’ll still want segments of your code output by individual methods. To maintain language neutrality in the CodeDOM, do the following:
Avoid using snippets.
Test your ability to compile the output in all target languages early and often.
Test your component or application in all target languages.
If you can’t follow the first tip, provide alternate code for all target languages and re-create the CodeDOM graph for each language.
If nothing else, you can hand this second part of this chapter to your boss if she thinks you should use the CodeDOM. Comparing these three methods of code generation illustrates just how expensive CodeDOM is for developing applications.
Summary
One-click code generation allows anyone to precisely regenerate your code now and in the future. Reproducible one-click generation relies on a script. The script used in this chapter is a set of XML directives.
The keys to success for XSLT code generation are as follows:
A single entry-level template accessing the root.
A single high-level processing template called in the context you’re processing.
A named template for each region in the output after you organize your sample code into regions.
Adding nested named and match templates as regions become longer and more complex.
Using match templates for any looping elements unless the output is nearly trivial.
A separate supporting stylesheet to contain reusable utility templates.
The keys to success for brute-force code generation are as follows:
Break metadata extraction away from code outputting.
Split outputting into individual classes that each output a single file pattern.
Within each class, use an entry-level method.
Use a different method for each region.
Use additional methods as regions become longer and more complex.
Use supporting methods for reuse wherever possible.
The CodeDOM abstracts your code sufficiently to generate code in any one of several languages. This abstraction makes working with the CodeDOM extremely complex. Seriously evaluate whether the extra time involved in working with the CodeDOM is justified for your application.
Additional Reading
To find more information about the topics covered, try the following resources:
XSLT: Programmer’s Reference, Second Edition by Michael Kay (Wrox, 2001)
This chapter is from Code Generation in Microsoft .NET by Kathleen Dollard (Apress, 2004, ISBN: 1590591372). Check it out at your favorite bookstore today.