SmartZip Archive File Library – Creating and Using Archive Files
Have you ever wanted to work with zip or other archive file formats in the .NET Framework? There is a free library that allows you to do just that. Keep reading to find out more.
Contributed by Michael Swanson, February 28, 2005
The .NET Framework has no built-in way to work with Zip or other archive file formats. However, this is often a useful option to have. You may want to send compressed files over the Internet, or add a feature to a program that automatically archives certain groups of files (such as logs) for better organization and space management.
Luckily, there is a free library which does this for us. Created by Mike Krueger, it is called the SmartZip library. It is open source and governed by the GPL. This library supports regular Zip format, as well as Gzip, Tar, and Bzip2. The code in this document will deal specifically with the regular Zip format; however, the class structures and methods for working with the other file formats are similar.
Creating a Zip File
In order to create a zip file, there are several steps you must perform. They are as follows:
Create the ZipOutputStream object in the code.
Set the compression level on the ZipOutputStream object.
Then, for each file you want to add to the ZipOutputStream, do the following:
Open the file to add in a file stream.
Create a byte buffer to store the file in memory.
Create a ZipEntry object to hold metadata for the file.
Use the Write method to put the file into the output stream.
After all the files you want to add have been added, call the Finish and Close methods to complete the file creation.
Overall, these steps are relatively simple, and don’t contain any major surprises. The only area that requires care is keeping the directory structure intact within a zip file, and much of that is handled by a method provided by the ZipEntry class, called “CleanName.” I will now describe this process in detail, and give some code snippets to illustrate each step.
First of all, we must create the ZipOutputStream object and set its compression level:
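A minimal sketch of these two statements follows. It assumes the library exposes its Zip classes through the ICSharpCode.SharpZipLib.Zip namespace; the ZipSetup class and the OpenArchive helper are illustrative names only, while the outStream variable matches the name used later in this article.

```csharp
using System.IO;
using ICSharpCode.SharpZipLib.Zip; // assumed namespace of the library

public static class ZipSetup
{
    // Hypothetical helper: opens a new archive file and returns its output stream.
    public static ZipOutputStream OpenArchive(string path)
    {
        // Attach the zip output stream to the archive file on disk.
        ZipOutputStream outStream = new ZipOutputStream(File.Create(path));
        // 0 stores files without compressing them, 9 compresses the most.
        outStream.SetLevel(9);
        return outStream;
    }
}
```

Calling ZipSetup.OpenArchive("temp.zip") would produce the stream that the following paragraph describes.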
These statements create an output stream object that is attached to the file “temp.zip” in the local working directory. They also set the compression level, in this case to the highest compression. The compression level ranges from 0, which performs no compression and simply combines several files into one, up to 9, which performs the most compression available in the Zip file standard.
After creating the ZipOutputStream object, you must begin adding files to the stream so they can be written. The following code does this for a single file:
    FileStream fs = File.OpenRead(filename);
    byte[] buffer = new byte[fs.Length];
    fs.Read(buffer, 0, buffer.Length);
    fs.Close();
This code opens the file and reads it into a byte buffer. This is necessary because the Write method for adding data to the output stream only accepts a byte buffer. Next, we must create the ZipEntry object that holds the metadata about this particular file entry in the ZipOutputStream, and then write the byte buffer to the stream.
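A minimal sketch of that step, under the same namespace assumption as before. PutNextEntry is the call that starts a new entry before Write is used; the ZipEntryWriter class, the AddBuffer helper, and its parameter names are illustrative only.

```csharp
using ICSharpCode.SharpZipLib.Zip; // assumed namespace of the library

public static class ZipEntryWriter
{
    // Hypothetical helper: writes one in-memory file into an open archive stream.
    public static void AddBuffer(ZipOutputStream outStream, string filename, byte[] buffer)
    {
        // Remove drive and share prefixes and normalize the path separators.
        string cleanName = ZipEntry.CleanName(filename);
        // The entry holds the metadata for the data that follows it.
        ZipEntry entry = new ZipEntry(cleanName);
        // Start the entry, then write the file contents into it.
        outStream.PutNextEntry(entry);
        outStream.Write(buffer, 0, buffer.Length);
    }
}
```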
In this code block, we first create a string to hold the “clean” name of the file being added. Cleaning the name is an important step because it strips away the parts of the .NET filename representation that don’t work with the Zip standard. For example, it removes Windows volume labels (such as “C:\”) and Windows file share names (such as “\\server\dir”). You can add files to the Zip file and preserve their directory structure by using the complete relative filename from the current working directory.
This process is then repeated for each file you wish to add to the Zip archive. It is also possible to build a function that walks through every directory in a directory structure and adds each file to a zip archive while maintaining that structure.
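Such a directory walk could be sketched as follows. This is only one possible approach, assuming the ICSharpCode.SharpZipLib.Zip namespace; the DirectoryZipper class and its method names are hypothetical, and entry names are built by hand relative to the source folder rather than through CleanName.

```csharp
using System.IO;
using ICSharpCode.SharpZipLib.Zip; // assumed namespace of the library

public static class DirectoryZipper
{
    // Hypothetical helper: zips every file under sourceDir into zipPath.
    public static void ZipDirectory(string sourceDir, string zipPath)
    {
        using (ZipOutputStream outStream = new ZipOutputStream(File.Create(zipPath)))
        {
            outStream.SetLevel(9);
            AddDirectory(outStream, sourceDir, "");
            outStream.Finish();
        }
    }

    // Adds the files in dir, then recurses into each subdirectory.
    private static void AddDirectory(ZipOutputStream outStream, string dir, string prefix)
    {
        foreach (string file in Directory.GetFiles(dir))
        {
            byte[] buffer = File.ReadAllBytes(file);
            // Entry names are relative to the source folder and use forward slashes.
            ZipEntry entry = new ZipEntry(prefix + Path.GetFileName(file));
            outStream.PutNextEntry(entry);
            outStream.Write(buffer, 0, buffer.Length);
        }
        foreach (string sub in Directory.GetDirectories(dir))
        {
            AddDirectory(outStream, sub, prefix + Path.GetFileName(sub) + "/");
        }
    }
}
```

The archive path should point outside the folder being zipped, otherwise the walk would try to add the half-written archive to itself.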
When every file has been added, finish and close the stream:
    outStream.Finish();
    outStream.Close();
These two calls are the last ones required once you have added everything you want to the zip file. The Finish method is important because it writes out the final information that closes the archive (the central directory). If this method doesn’t get called, other programs that try to read your zip file won’t be able to, because the file won’t be well formed. The Close method closes the stream’s connection to the file and releases it for use by other processes.
The ZipEntry class is very useful for maintaining and reading metadata on the files you add or remove from a zip file. The ZipEntry object also acts as the marker through which the ZipOutputStream and ZipInputStream know which file in the compressed archive you are reading. The ZipEntry class has several members which are particularly important or useful. They are as follows:
Name: This is the only required attribute (it must be set in the constructor) and contains the name of the file being compressed.
Comment: Allows you to set a comment on the entry, useful for holding arbitrary information about the file.
DateTime: Contains the date and time of the last modification of the zipped file.
IsCrypted: If an entry in a zip file you open is encrypted, this will be set, and you will need to decrypt the data.
Size: Gets the size of the entry, after it is uncompressed.
CompressedSize: Gets the compressed size of an entry.
Version: This holds the minimum Zip file version implementation required to extract this entry.
VersionMadeBy: This holds the version of Zip implementation this entry was created with.
Above, I presented the process required to write a zip archive to a file. Obviously, you might also want to open a zip file and manipulate its contents programmatically in some way. There are a couple of ways to do this. If you plan on actually extracting the file data, it is generally best to use the ZipInputStream class, as this allows you to step sequentially through a zip archive and extract each entry’s contents. If all you need to do is list the names and maybe some other metadata about each entry, the ZipFile class is best. If you want to extract data through a ZipFile object, you must get a stream from it for each entry you want to extract.
First, I will address using the ZipFile class. You can create a ZipFile object by giving its constructor either a FileStream, a regular Stream, or a string containing a file name. After opening the file into a ZipFile object, you can walk through the ZipEntry objects to get information on each individual file and get streams to uncompress each individual file. The code to do this looks like the following:
    ZipFile zip = new ZipFile("test.zip");
    foreach (ZipEntry file in zip)
    {
        // Stream over this entry's uncompressed data
        Stream entryStream = zip.GetInputStream(file);
    }
    zip.Close();
You can then take that stream and use it to write the data out to a file or process it in some other way. You can also skip the input stream entirely and read only the metadata for each entry.
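A metadata-only listing might look like the sketch below. It assumes the ICSharpCode.SharpZipLib.Zip namespace and uses the Name, Size, CompressedSize, and DateTime members described earlier; the ZipLister class and Describe method are illustrative names.

```csharp
using System.Collections.Generic;
using ICSharpCode.SharpZipLib.Zip; // assumed namespace of the library

public static class ZipLister
{
    // Hypothetical helper: returns one descriptive line per entry in the archive.
    public static List<string> Describe(string zipPath)
    {
        List<string> lines = new List<string>();
        ZipFile zip = new ZipFile(zipPath);
        try
        {
            foreach (ZipEntry entry in zip)
            {
                // Only metadata is read here; no entry data is decompressed.
                lines.Add(entry.Name + " | " + entry.Size + " bytes | "
                    + entry.CompressedSize + " compressed | " + entry.DateTime);
            }
        }
        finally
        {
            // Release the archive file even if enumeration fails.
            zip.Close();
        }
        return lines;
    }
}
```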
The other way to access a zip file is to open it into a ZipInputStream. This makes it simpler to walk through the archive and extract each file. With this approach you read each entry’s data into a byte array. You can then take that byte array and pipe it into a file stream to write the file out to the hard drive.
    ZipInputStream zip = new ZipInputStream(File.OpenRead("test.zip"));
    ZipEntry entry;
    while ((entry = zip.GetNextEntry()) != null)
    {
        byte[] data = new byte[2048];
        // Read returns how many bytes were actually placed in the buffer
        int count = zip.Read(data, 0, data.Length);
        FileStream fs = File.Create(entry.Name);
        fs.Write(data, 0, count);
        fs.Close();
    }
    zip.Close();
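The above code opens the zip file, reads each entry, and writes up to the first 2048 bytes of each one out to a file named after the entry. To extract an entry completely, keep reading from the stream in a loop until no more data is returned.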
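A fuller extraction loop could be sketched like this. It assumes the ICSharpCode.SharpZipLib.Zip namespace and that Read returns 0 once an entry’s data is exhausted, which is the usual .NET stream convention. The ZipExtractor class, the ExtractAll method, and the check that skips entries pointing outside the target folder are additions for illustration, not part of the original listing.

```csharp
using System;
using System.IO;
using ICSharpCode.SharpZipLib.Zip; // assumed namespace of the library

public static class ZipExtractor
{
    // Hypothetical helper: extracts every file entry of zipPath below targetDir.
    public static void ExtractAll(string zipPath, string targetDir)
    {
        string root = Path.GetFullPath(targetDir)
            .TrimEnd(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);
        using (ZipInputStream zip = new ZipInputStream(File.OpenRead(zipPath)))
        {
            byte[] data = new byte[2048];
            ZipEntry entry;
            while ((entry = zip.GetNextEntry()) != null)
            {
                if (entry.IsDirectory) continue;
                string outPath = Path.GetFullPath(Path.Combine(root, entry.Name));
                // Skip entries whose names would land outside the target folder.
                if (!outPath.StartsWith(root + Path.DirectorySeparatorChar, StringComparison.Ordinal))
                    continue;
                Directory.CreateDirectory(Path.GetDirectoryName(outPath));
                using (FileStream fs = File.Create(outPath))
                {
                    int count;
                    // Keep reading until the entry reports no more data.
                    while ((count = zip.Read(data, 0, data.Length)) > 0)
                    {
                        fs.Write(data, 0, count);
                    }
                }
            }
        }
    }
}
```

Reusing one 2048-byte buffer keeps memory use constant regardless of entry size, unlike reading a whole file into a single array.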
As I said above, the SmartZip library supports other formats as well. These are GZip, BZip2, and Tar. For GZip, there are only two classes, GZipInputStream and GZipOutputStream, for reading from and writing to streams. For BZip2, the BZip2 class itself can be used to compress one stream into another or vice versa; unlike the ZipFile class mentioned above, it does nothing else. Otherwise, BZip2, like GZip, has only its input and output stream classes.
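The GZip stream pair can be sketched with an in-memory round trip. This assumes the classes live in the ICSharpCode.SharpZipLib.GZip namespace; the GZipSketch class and its method names are illustrative.

```csharp
using System.IO;
using ICSharpCode.SharpZipLib.GZip; // assumed namespace of the library

public static class GZipSketch
{
    // Compresses a byte array by writing it through a GZipOutputStream.
    public static byte[] Compress(byte[] input)
    {
        MemoryStream target = new MemoryStream();
        using (GZipOutputStream gz = new GZipOutputStream(target))
        {
            gz.Write(input, 0, input.Length);
        }
        // ToArray still works after the memory stream has been closed.
        return target.ToArray();
    }

    // Decompresses a byte array by reading it through a GZipInputStream.
    public static byte[] Decompress(byte[] compressed)
    {
        using (GZipInputStream gz = new GZipInputStream(new MemoryStream(compressed)))
        {
            MemoryStream result = new MemoryStream();
            byte[] buffer = new byte[4096];
            int count;
            while ((count = gz.Read(buffer, 0, buffer.Length)) > 0)
            {
                result.Write(buffer, 0, count);
            }
            return result.ToArray();
        }
    }
}
```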
The Tar implementation works a little differently from the other namespaces. A TarArchive object is created for either reading or writing; you are not allowed to do both on the same object. To create one, use the public static factory methods called CreateInputTarArchive and CreateOutputTarArchive. Both of these methods take a stream as a parameter. In general, you can give CreateInputTarArchive any type of stream that contains Tar formatted data. This can come from a file, a network connection, or anywhere else you can get a stream.
If you wish to extract the contents of a Tar file, assuming you’ve created a TarArchive object from the CreateInputTarArchive method, you simply use the ExtractContents method to dump the entire TarArchive to a directory. This directory is represented by a string given to the ExtractContents method as a parameter. If you wish to write TarEntry objects to an archive, assuming you’ve created a TarArchive object using the CreateOutputTarArchive method, you can use the WriteEntry method to add new entries to the archive.
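Extraction could be sketched as below. This assumes the ICSharpCode.SharpZipLib.Tar namespace and member names as they appear in recent releases of the library, which may differ from the version this article was written against; the TarExtractSketch class and Extract method are illustrative names.

```csharp
using System.IO;
using ICSharpCode.SharpZipLib.Tar; // assumed namespace of the library

public static class TarExtractSketch
{
    // Hypothetical helper: unpacks the tar file at tarPath into targetDir.
    public static void Extract(string tarPath, string targetDir)
    {
        using (Stream input = File.OpenRead(tarPath))
        {
            // The archive object created here can only be used for reading.
            TarArchive archive = TarArchive.CreateInputTarArchive(input);
            // Dump every entry below the target directory.
            archive.ExtractContents(targetDir);
            archive.Close();
        }
    }
}
```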
In order to use the WriteEntry method, you must create TarEntry objects. These can be instantiated in one of two ways. You instantiate either by passing the constructor an array of bytes that constitutes the actual Tar header, or by passing the constructor a TarHeader object, which contains the same information, but in a more easily accessible manner.
When you instantiate a TarHeader object, it is created with the default settings for all of the many fields available. Many of these fields are read-only or constant, and as such don’t often concern the programmer. Some fields you will want to set are: name, size, and version. It is important to set the size field correctly, as the write method on a TarOutputStream (which TarArchive uses) will throw an exception if you attempt to write data beyond the header’s size parameter. However, to get around all of this confusing class hierarchy, it is possible to use the public static factory function CreateEntryFromFile to automatically create the TarEntry and appropriate header information given just a string representing the filename from which the object is to be created.
Both of the above processes can be accomplished using the TarInputStream and TarOutputStream objects as well, using much the same process as was discussed earlier with Zip files. Of course, after you’ve used the Tar classes to create a single file from many files, you can then run that file through the GZipOutputStream to compress it.
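Writing a compressed Tar archive could be sketched as follows. Rather than creating the Tar file first and compressing it afterwards, this variant chains the Tar output directly into a GZipOutputStream, which is a design choice of the sketch and not something the library requires. It assumes the ICSharpCode.SharpZipLib.Tar and ICSharpCode.SharpZipLib.GZip namespaces and member names from recent releases; the TarGzSketch class and Create method are illustrative.

```csharp
using System.IO;
using ICSharpCode.SharpZipLib.GZip; // assumed namespace of the library
using ICSharpCode.SharpZipLib.Tar;  // assumed namespace of the library

public static class TarGzSketch
{
    // Hypothetical helper: packs the given files into a gzip-compressed tar archive.
    public static void Create(string tgzPath, string[] files)
    {
        using (Stream fileOut = File.Create(tgzPath))
        using (Stream gzOut = new GZipOutputStream(fileOut))
        {
            // The archive object created here can only be used for writing.
            TarArchive archive = TarArchive.CreateOutputTarArchive(gzOut);
            foreach (string file in files)
            {
                // The factory method fills in the header (name, size) from the file.
                TarEntry entry = TarEntry.CreateEntryFromFile(file);
                // false: do not recurse, since each path is a single file.
                archive.WriteEntry(entry, false);
            }
            // Closing the archive writes the end of the tar data.
            archive.Close();
        }
    }
}
```

Chaining the streams avoids an intermediate uncompressed Tar file on disk, at the cost of not being able to inspect that file separately.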
Conclusion
In general, these classes make creating and working with archive files easy and intuitive. With just a few class instantiations and method calls, you can easily create and open Zip archives. This could potentially make sending large amounts of data over a network much simpler and faster, if you are writing a network or Web application. You can find the library at www.icsharpcode.com.