4.10 Using Zip, GZip, Tar, and BZip2 Archives in Your Software with SharpZipLib
Compression isn’t just for smashing up scads of text or binary data into archives; it’s also a critical element for efficient transport of data from one network endpoint to another. SharpZipLib, brought to you by the same folks at ic#code who created the SharpDevelop open source .NET IDE, can help you by compressing any data stream you’re working with.
Work with Zip, GZip, Tar, and BZip2 files via this library written entirely in C#
License type
Modified GPL; usable for commercial closed-source apps, but read the license
Online resources
Forum
Supported Frameworks
.NET 1.1, 2.0
Getting Started
SharpZipLib is available in several different distributions. You can download a compiled assembly complete with scripts for installing in your system’s GAC, or you can grab source code and build the assemblies yourself.
Because SharpZipLib’s source code was created in SharpDevelop, you’ll need that IDE to work with the source’s project files. NAnt build scripts are included with the source, so you have an option for building if you’re not currently using SharpDevelop.
SharpZipLib’s GZip and Zip compression routines use the same basic process: create a stream for the zip type, write some bytes to it, and then close it. An example for GZip’s simplest incarnation is shown in Example 4-4.
using (Stream s = new GZipOutputStream(File.Create(target))) { using (FileStream fs = File.OpenRead(source)) { byte[] buffer = new byte[fs.Length]; fs.Read(buffer, 0, (int) fs.Length); s.Write(buffer, 0, buffer.Length); } } }
Example 4-5 shows the simplicity of extracting a file from a GZip archive.
Example 4-5. Extracting a GZip file
using (Stream input = new GZipInputStream(File.OpenRead(source))) { using (FileStream output = File.Create(Path.GetFileNameWithoutExtension(source))) { int buffSize = 2048; byte[] outBuffer = new byte[2048]; while (true) { buffSize = input.Read(outBuffer, 0, buffSize); if (buffSize > 0) { output.Write(outBuffer, 0, buffSize); } else { break; } } } }
Working with BZip2 compression is even easier. Static methods give quick access to compression and decompression functionality. Compressing a file is a simple matter, as shown in Example 4-6.
Zip compression works much the same; however,ZipOutputStreamobjects can make use of theZipEntryclass.ZipEntryobjects work nicely when you want to pass in a list of files to add to an archive (say, a list of files a user has selected from a menu). Example 4-8 shows how to archive a list of files.
Example 4-8. Zipping multiple files
private void ZipMultipleCompress(string target, string[] fileNames, int compressionLevel) { using (ZipOutputStream zipOutStream = new ZipOutputStream(File.Create(target))) { zipOutStream.SetLevel(compressionLevel);
foreach (string file in fileNames) { using (FileStream inStream = File.OpenRead(file)) { byte[] buffer = new byte[inStream.Length]; inStream.Read(buffer, 0, buffer.Length);
Another handy feature of theZipOutputStreamenables you to add checksums via SharpZipLib’sCrc32class. Create a newCrc32object:
Crc32 crc = new Crc32();
and then make use of it for eachZipEntryitem in your list:
// clear previous CRC, compute new one, add to entry crc.Reset(); crc.Update(buffer); entry.Crc = crc.Value;
There’s also a handyFastZip class that offers quick creation and extraction of Zip files. It has a few restrictions, though:FastZipcan’t use absolute file paths, nor is it appropriate for collections of files—it’s better used for files within one directory tree.
Tar archives are fully supported in SharpZipLib as well. The usage pattern is very similar to the Zip method shown in Example 4-8. You’ll have to deal with creating output streams for the different compression types (BZip2, GZip) yourself, but this is straightforward and is well documented in the distribution’s examples.
Getting Support
SharpZipLib has an active community in forums at the library’s home page, where most issues you run into can be resolved quickly.
SharpZipLib in a Nutshell
SharpZipLib gives developers great flexibility and power in creating archives in standard, well-known formats.
Excel is a ubiquitous tool, used for everything from simple tabular-format databases to complex statistical analyses. In fact, it is so widely used that in some cases it makes perfect sense to be able to save your application’s data out to a workbook so users can easily work with it in Excel.
Carlos Aguilar Mares of Microsoft, formerly on the ASP.NET development team for .NET 2.0, has developed two tools that make it very simple to get your application’s data into Excel using its XML workbook format. ExcelXmlWriter is a freely available DLL with a rich API supporting many of Excel’s advanced features, such as pivot tables, sorting, and filtering. ExcelXmlWriter.Generator is a utility that will quickly generate the logic necessary to reverse-engineer an existing Excel file, with all its data and formatting intact. The Generator drastically cuts the time needed to build code for using complex workbooks.
Mares’s tools are completely standalone; you don’t need to have Excel or Office on the system hosting your application.
Generate Excel files from any data source. Online code-generation tool greatly simplifies creation of classes.
License type
Freeware, no specific license
Online resources
Email
Supported Frameworks
.NET 2.0
Getting Started
Download the tool from its home page. The distribution is a .zip file containing an assembly you’ll need to reference in your application. The ExcelXmlWriter.Generator application is a standalone app that you can drop in any convenient directory. Mares’s tools also include an API documentation help file, which you can download from his web site.
XML-based workbooks have been supported in Microsoft Excel since Office XP. ExcelXmlWriter capitalizes on this capability by exposing an API that handles writing out the XML structure based on Microsoft’s Office XML schema.
Office versions before XP won’t be able to use workbooks created by ExcelXmlWriter because they don’t support the XML file format.
The easiest way to get rolling with ExcelXmlWriter is to use the ExcelXmlWriter.Generator application to reverse-engineer an existing workbook. Start by making an Excel sheet with mocked-up data. Get all the formulas, sorting, and formatting set up as you’d like. Figure 4-23 shows an example spreadsheet based on data pulled from the omnipresent Northwind sample database via a three-table query.
Figure 4-23. Mocked-up Excel sheet to use as a starting point
Columns A through E are values straight from various tables in the Northwind sample database. Column F is a formula that computes net sales and takes into account any discounts that may have been applied. Columns H and I are summary fields, with column I using a SUMIF function to gather up each sales rep’s total sales.
Just mock up a few rows of data; you’ll only need enough to get a feel for how the Generator builds code using ExcelXmlWriter’s API. Too many rows will slow down code generation and needlessly bloat the generated class.
ExcelXmlWriter will also support additional details in Excel files, such as fields listed under File -> Properties and default printers if they are set.
Save your mocked-up file as an XML workbook using File -> Save As, making sure to select “XML Spreadsheet (*.xml)” as the type. Next, run the ExcelXmlWriter.Generator tool and use it to load the XML file you just created. Figure 4-24 shows the Generator after it’s finished loading the XML spreadsheet. The top pane shows the source XML and the bottom shows the generated code.
Figure 4-24.The Generator after loading an XML workbook
You can either save the content as a separate class file or copy it over to an existing file in your code editor. You can then use this file in your Windows Forms or Console application solution, or in a code-behind for an ASP.NET project.
The class created by the Generator is broken into three methods:
Generate()
A public method managing creation of the workbook
GenerateWorksheet1()
A private method building the content of the sheet
GenerateStyles()
A private method creating style elements for the sheet Output from the Generator can be quite lengthy. The simple worksheet shown in Figure 4-23 resulted in 611 lines of code—and that’s without wiring in any data source!
Despite the code size, the generated code shows the flexibility of ExcelXmlWriter. For example, it’s a simple matter to set information one usually sees in Excel’s File -> Properties menu. First, create a workbook:
Workbook book = new Workbook();
Then use thePropertiescollection to set information such as the author’s name and company:
book.Properties.Author = "Jim Holmes"; book.Properties.Company = "Iterative Rose Solutions";
The basic flow of building an XML file continues in a sensible pattern. First set up the styling you’ll need for the finished product. The Generator’s reverse-engineering creates the style objects for you, but they’re rather obscurely named—a byproduct of automated code generation. To increase readability and ease maintenance hassles down the road, think about renaming styles such ass22to something clearer in your code. For example, this snippet:
With styling set up, move on to creating the framework of a worksheet to put your data in. The following snippet creates aWorksheetand builds the sheet’s header row by addingWorksheetCell objects to aWorksheetRow. Each cell’s data type is specified as a string because the header row contains labels, not actual data:
WorksheetColumn date = sheet.Table.Columns.Add(); date.Width = 66; date.StyleID = "dateStyle";
WorksheetColumnobjects also let you span formatting to adjacent columns, specify whether a column is hidden, or even hardwire the column’s position via theIndexproperty. Continue adapting the Generator’s output to meet your own needs. You’ll also want to get rid of the data the Generator hardwired in. DeleteWorksheetRowandcell statements that set up an entire row’s worth of data, such as the following:
Once you’ve created and styled all of your columns, you can move on to wiring in a dynamic data source. Get some real data by loading up aDataTablefrom several Northwind tables:
public System.Data.DataTable GetAllEmployeesSalesHistory() { string sel = @"select employees.lastname, orders.orderdate, " + "[order details].unitprice, [order details].quantity, " + "[order details].discount from employees, orders, [order details]" + "where [order details].orderid = orders.orderid AND " + "employees.employeeid = orders.employeeid "+ "order by employees.lastname"; DataTable history = new DataTable(); string connString = "Persist Security Info=False;Integrated Security=SSPI;database= Northwind;server=(local);" SqlConnection conn = new SqlConnection(connString); try { SqlCommand cmd = new SqlCommand(); SqlDataAdapter adapter = new SqlDataAdapter(sel, conn); adapter.Fill(history); } catch (SqlException e) { System.Console.WriteLine("Error: " + e.Message); }
return history; }
Now you can add a method to load data from thatDataTableinto your new workbook:
/// <summary> /// Loads data from the history DataSet into the workbook. /// </summary> /// <param name="book">The Excel workbook.</param> /// <param name="data">The history data.</param> private void LoadData(Workbook book, DataTable data) { // reps is a private member ArrayList holding sales reps' last names reps = new ArrayList(); WorksheetRow sheetRow; WorksheetCell cell;
foreach (DataRow tableRow in data.Rows) { // create a new row sheetRow = book.Worksheets["History"].Table.Rows.Add();
// last name cell = sheetRow.Cells.Add(); cell.Data.Type = DataType.String; cell.Data.Text = tableRow[0].ToString();
// add rep names if they're not in already if (! reps.Contains(tableRow[0])) { reps.Add(tableRow[0]); }
// date of sale cell = sheetRow.Cells.Add(); // Note this is type *string*. Setting a *cell* to DateTime causes load // errors in Excel. The *column's* type is set as DateTime in the // spreadsheet itself, so everything works just fine. cell.Data.Type = DataType.String; DateTime date = Convert.ToDateTime(tableRow[1].ToString()); cell.Data.Text = date.ToShortDateString();
// Net sales -- calculated // Unit cost less any discount times quantity sold cell = sheetRow.Cells.Add(); //Note R1C1 cell reference type! cell.Formula = "=IF(RC[-1] > 0,RC[-3] * ((1 - RC[-1])) * RC[-2], RC[-3]*RC[-2])"; } }
Finally, you can deal with the Summary History section of the report, which sums net sales for each sales rep:
/// <summary> /// Creates the history report section. /// </summary> /// <param name="sheets">The sheets in the Workbook.</param> private void CreateHistoryReportSection(WorksheetCollection sheets) { // reps is a private member ArrayList holding sales reps' last names reps.Sort();
row = history.Table.Rows[0]; // blank cell to get alignment right row.Cells.Add(); row.Cells.Add("Sales Rep", DataType.String, "headerStyle"); row.Cells.Add("Total Sales", DataType.String, "headerStyle");
for (int i = 0; i < reps.Count; i++) { row = history.Table.Rows[i + 1]; // offset one row down repName = reps[i].ToString(); // add and style the rep's name row.Cells.Add(); cell = row.Cells.Add(repName); cell.StyleID = "reportStyle"; // add a cell and include the summing formula cell = row.Cells.Add(); cell.Formula = "=SUMIF(C[-8],"" + repName + "",C[-3])"; cell.StyleID = "reportStyle"; } }
You’re now ready to make use of this class in an application. Create the class and call theGenerate()method, passing it a filename for the workbook. Running the application will leave you with an XML file ready to be loaded into Excel.
Getting Support
Questions, comments, and feature requests are handled by the author at the tool’s site.
ExcelXmlWriter in a Nutshell
ExcelXmlWriter can be a bit difficult to understand at first, and currently there aren’t any online forums or sites actively discussing the tool. Despite this, ExcelXmlWriter can be picked up with minimal effort, especially if you use the Generator tool to reverse-engineer a spreadsheet that’s at least close to what you want.
ExcelXmlWriter is a great tool to get data offline to your application’s users in a format they’re familiar with.
Please check back tomorrow for the conclusion to this article.