ZLib
com/bclteam/archive/2007/05/16/system-io-compression-
capabilities-kim-hamilton.aspx
At their core, the .NET compression libraries support only one compression
format: Deflate. The Deflate format is specified in RFC 1951, and our
DeflateStream class is a straightforward implementation of it.
Other compression formats, such as zlib, gzip, and zip, use deflate as a possible
compression method, but may also use other compression methods. In the case that
they use deflate, you can think of these formats as a wrapper around deflate: they
take bytes generated by deflate compression and tack on header info and checksums.
Our GZipStream class does exactly that – it uses DeflateStream and then adds header
info and checksums specific to the gzip format. The gzip format is specified in RFC
1952.
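To make the wrapper idea concrete, here is a minimal sketch in Python rather than .NET. Python's zlib module in raw mode (wbits=-15) stands in for DeflateStream, and the helper names unwrap_gzip and make_gzip are made up for this illustration. It assumes the simplest gzip layout from RFC 1952: a fixed 10-byte header with no optional fields, then the deflate bytes, then an 8-byte trailer.

```python
import struct
import zlib


def make_gzip(data: bytes) -> bytes:
    """Produce a gzip stream with a minimal header (wbits=31 selects gzip framing)."""
    compressor = zlib.compressobj(wbits=31)
    return compressor.compress(data) + compressor.flush()


def unwrap_gzip(gz: bytes) -> bytes:
    """Strip the gzip wrapper by hand and inflate the raw deflate payload."""
    # Fixed header: magic (1f 8b), method (8 = deflate), flags, mtime (4 bytes), xfl, os.
    if gz[:2] != b"\x1f\x8b" or gz[2] != 8:
        raise ValueError("not a deflate-based gzip stream")
    if gz[3] != 0:
        # Assumption of this sketch: no file name, comment, or extra field.
        raise ValueError("optional header fields present; not handled here")

    # Trailer: CRC-32 and uncompressed size modulo 2**32, both little-endian.
    crc, size = struct.unpack("<II", gz[-8:])

    # Negative wbits means raw deflate: no header, no checksum.
    payload = zlib.decompress(gz[10:-8], wbits=-15)
    if zlib.crc32(payload) != crc or (len(payload) & 0xFFFFFFFF) != size:
        raise ValueError("gzip trailer does not match the payload")
    return payload


original = b"deflate is the core format " * 40
assert unwrap_gzip(make_gzip(original)) == original
```

Everything outside gz[10:-8] is gzip-specific framing; the slice itself is plain deflate data.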
Until we provide support for the other formats, which we plan to do soon, there are
partial workarounds that may help you out in some situations, but they're
definitely not a complete solution.
The zlib format is specified in RFC 1950. A zlib stream is laid out as follows:

      0   1
    +---+---+
    |CMF|FLG|   (more-->)
    +---+---+

    (if FLG.FDICT set)

      0   1   2   3
    +---+---+---+---+
    |     DICTID    |   (more-->)
    +---+---+---+---+

    +=====================+---+---+---+---+
    |...compressed data...|    ADLER32    |
    +=====================+---+---+---+---+

This means that to read a zlib file using only the .NET libraries, you can often
just chop off the first two bytes and the last four bytes and use DeflateStream on
the rest of the stream as normal. (It would be better to check the dictionary bit
first and not attempt to read the stream if it is set, because the data then
depends on a preset dictionary.)
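The chopping approach can be sketched as follows. This is Python, not .NET: zlib.decompress with wbits=-15 plays the role of DeflateStream, and the function name read_zlib_via_raw_deflate is invented for the example. Unlike the bare workaround, the sketch also checks the dictionary bit and verifies the Adler-32 trailer instead of just discarding it.

```python
import struct
import zlib


def read_zlib_via_raw_deflate(stream: bytes) -> bytes:
    """Read a zlib stream by stripping its framing and inflating raw deflate."""
    cmf, flg = stream[0], stream[1]

    # CM (low 4 bits of CMF) must be 8 for deflate, and the two header bytes,
    # read as a big-endian 16-bit number, must be a multiple of 31.
    if (cmf & 0x0F) != 8 or (cmf * 256 + flg) % 31 != 0:
        raise ValueError("not a valid zlib header")

    # FDICT is bit 5 of FLG. If set, a 4-byte DICTID follows and the data
    # needs a preset dictionary, so this sketch refuses to continue.
    if flg & 0x20:
        raise ValueError("preset dictionary required; not handled here")

    body, trailer = stream[2:-4], stream[-4:]
    data = zlib.decompress(body, wbits=-15)  # raw deflate, no framing

    # The Adler-32 of the uncompressed data is stored big-endian.
    (expected,) = struct.unpack(">I", trailer)
    if zlib.adler32(data) != expected:
        raise ValueError("Adler-32 mismatch")
    return data


original = b"chop two bytes off the front and four off the end " * 20
assert read_zlib_via_raw_deflate(zlib.compress(original)) == original
```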
Going in the opposite direction isn't as trivial, so I'm not really suggesting that
you generate zlib files this way. However, a couple of people have asked in the
past, so I'll sketch an overview.
To start, you need to know which bytes to add at the beginning. With our deflate
implementation, those bytes are 0x58 and 0x85. If you're curious about how this is
derived from RFC 1950, see section 2.2 "Data format" and note that we use a window
size of 8K and the value of FLEVEL should be 2 (default algorithm).
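For the curious, the derivation can be written out as a few lines of arithmetic. The sketch below is in Python, and the function name zlib_header and its parameters are hypothetical. It follows the field layout in RFC 1950 section 2.2: CMF holds CM (8 for deflate) in its low four bits and CINFO (log2 of the window size, minus 8) in its high four bits. FLG holds FLEVEL in its top two bits, FDICT in bit 5, and an FCHECK value in the low five bits chosen so that CMF*256 + FLG is a multiple of 31.

```python
def zlib_header(window_bits: int = 13, flevel: int = 2, fdict: bool = False) -> bytes:
    """Compute the two-byte zlib header (CMF, FLG) described in RFC 1950."""
    cm = 8                        # compression method: deflate
    cinfo = window_bits - 8       # 8K window = 2**13, so CINFO = 5
    cmf = (cinfo << 4) | cm

    flg = (flevel << 6) | (int(fdict) << 5)
    # FCHECK: smallest value that makes the 16-bit header divisible by 31.
    flg |= (31 - (cmf * 256 + flg) % 31) % 31
    return bytes([cmf, flg])


# 8K window with FLEVEL 2 gives the bytes quoted in the text.
assert zlib_header(13, 2) == b"\x58\x85"
```

With an 8K window, CMF is 0x58. FLEVEL 2 contributes 0x80 to FLG, and 0x5880 leaves a remainder of 26 when divided by 31, so FCHECK is 5 and FLG is 0x85.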
After that, you need to add the Adler-32 checksum at the end. The checksum will
depend on the payload that you're compressing so you need to calculate it
programmatically. Because of this, the easiest way to generate the checksum is to
subclass DeflateStream and override the Write/BeginWrite methods to update the
checksum. Stephen Toub's NamedGZipStream article (mentioned at the end) shows an
example of creating such a subclass for generating named gzip files.
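As a rough illustration of that shape, here is a sketch in Python rather than a DeflateStream subclass. ZlibFramingWriter is a made-up class: it wraps a raw deflate compressor, updates a running Adler-32 on every write, and emits the 0x58 0x85 header up front and the checksum at the end. The compressor is created with an 8K window (wbits=-13) so that the header's window-size claim holds. That pairing is an assumption of this sketch, not a statement about how DeflateStream is configured internally.

```python
import zlib


class ZlibFramingWriter:
    """Hypothetical helper: raw deflate plus hand-written zlib framing."""

    HEADER = b"\x58\x85"  # deflate, 8K window, FLEVEL 2, no preset dictionary

    def __init__(self) -> None:
        # Negative wbits selects raw deflate; 13 bits is an 8K window.
        self._deflater = zlib.compressobj(wbits=-13)
        self._adler = 1  # Adler-32 is defined to start at 1
        self._chunks = [self.HEADER]

    def write(self, data: bytes) -> None:
        # Update the checksum over the *uncompressed* bytes, then compress.
        self._adler = zlib.adler32(data, self._adler)
        self._chunks.append(self._deflater.compress(data))

    def finish(self) -> bytes:
        self._chunks.append(self._deflater.flush())
        # The trailer is the Adler-32 value in big-endian byte order.
        self._chunks.append(self._adler.to_bytes(4, "big"))
        return b"".join(self._chunks)


writer = ZlibFramingWriter()
writer.write(b"first chunk, ")
writer.write(b"second chunk")
framed = writer.finish()

# A stock zlib reader should accept the hand-framed stream.
assert zlib.decompress(framed) == b"first chunk, second chunk"
```

The key point is that the checksum covers the data passed to write, not the compressed output, which is why it has to be tracked alongside the compressor.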
http://msdn.microsoft.com/msdnmag/issues/03/06/ZipCompression/default.aspx
But if you don't want to rely on the J# class libraries, we'll need to provide a
better solution.
Now that you're familiar with some compression specifications, let's focus on zip a
little more. A zip specification is here:
http://www.pkware.com/documents/casestudies/APPNOTE.TXT
Notice that zip also allows deflate. Again the same principle applies – there are
deflate bytes packaged in a header and footer. This may tempt you into writing a
zip reader/writer based on DeflateStream (as described above for zlib), but there
are two key differences that make zip more complicated.
First, the zip header contains a lot more information than the zlib header. To read
a zip file, you'd definitely have to parse the header to figure out how many bytes
to skip over because the header contains variable length items such as a file name.
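To show what parsing the header involves, here is a small Python sketch that reads the fixed 30-byte part of a zip local file header, as laid out in APPNOTE.TXT, and uses the two length fields to find where the entry's data begins. The function name locate_first_entry is hypothetical, and the sketch looks only at the first entry of an archive without consulting the central directory.

```python
import io
import struct
import zipfile

# Fixed part of a local file header (little-endian): signature, version needed,
# flags, compression method, mod time, mod date, CRC-32, compressed size,
# uncompressed size, file name length, extra field length.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")


def locate_first_entry(zip_bytes: bytes):
    """Return (name, method, data_offset, compressed_size) for the first entry."""
    (signature, _version, flags, method, _time, _date,
     _crc, comp_size, _size, name_len, extra_len) = LOCAL_HEADER.unpack_from(zip_bytes, 0)
    if signature != 0x04034B50:  # the bytes "PK\x03\x04"
        raise ValueError("no local file header at offset 0")

    fixed = LOCAL_HEADER.size  # 30 bytes
    # Flag bit 11 marks a UTF-8 file name; otherwise the legacy code page applies.
    encoding = "utf-8" if flags & 0x0800 else "cp437"
    name = zip_bytes[fixed:fixed + name_len].decode(encoding)

    # The variable-length name and extra field decide how many bytes to skip.
    data_offset = fixed + name_len + extra_len
    return name, method, data_offset, comp_size


# Usage: build a one-entry archive in memory and locate its data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("hello.txt", b"hello zip " * 100)
name, method, offset, size = locate_first_entry(buffer.getvalue())
print(name, method, offset, size)
```

Compare this with zlib, where the header is a fixed two bytes (plus four if the dictionary bit is set) and no parsing is needed to find the payload.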
Second, zip tools actively use different compression methods. For example, use the
Windows compression tool on a very small text file (with just a few words in it)
and then on a bigger file, say around 20 KB. Chances are it used no compression
(yes, that's an option) for the small file and deflate for the 20 KB file.
Because different compression methods are used, an extension of the zlib technique
described above may not help you much if you want to use the .NET libraries to read
zip files. You'd definitely have to read the compression method to determine how to
proceed. If it's deflate, then chop off the header and proceed as above. If it's no
compression, chop off the header and read the bytes as a normal stream of bytes. If
it's something else, then the .NET libraries have no built-in support for it.
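The decision procedure above can be sketched as follows, again in Python with zlib's raw mode standing in for DeflateStream. The function name read_first_entry is made up, and the sketch makes two simplifying assumptions: it handles only the first entry in the archive, and it expects the compressed size to be recorded in the local header rather than in a trailing data descriptor.

```python
import struct
import zlib

STORED, DEFLATED = 0, 8  # compression method codes from APPNOTE.TXT


def read_first_entry(zip_bytes: bytes) -> bytes:
    """Read the first entry of a zip archive, dispatching on its compression method."""
    if zip_bytes[:4] != b"PK\x03\x04":
        raise ValueError("no local file header at offset 0")

    flags, method = struct.unpack_from("<HH", zip_bytes, 6)
    (comp_size,) = struct.unpack_from("<I", zip_bytes, 18)
    name_len, extra_len = struct.unpack_from("<HH", zip_bytes, 26)

    if flags & 0x0008:
        # Sizes live in a data descriptor after the data; out of scope here.
        raise ValueError("data descriptor in use; not handled by this sketch")

    # Chop off the header: 30 fixed bytes plus the variable-length fields.
    start = 30 + name_len + extra_len
    raw = zip_bytes[start:start + comp_size]

    if method == STORED:
        return raw                               # no compression: plain bytes
    if method == DEFLATED:
        return zlib.decompress(raw, wbits=-15)   # raw deflate, as with zlib above
    raise NotImplementedError(f"compression method {method} is not supported")
```

A reader built this way handles both the "no compression" and deflate cases and fails loudly on anything else, which mirrors the limits of what the built-in libraries can decode.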