Regular readers of my blog know how much I love Zopfli, Google’s compression engine that often shrinks output by 5% or more compared to the popular zlib engine. The beauty of Zopfli is that its output is compatible with all of the billions of existing DEFLATE decoders deployed worldwide, making its use an easy choice for any static content.
But imagine for a moment what compression ratios we could achieve if we weren’t limited by compatibility with existing decoders. If we could add a new compression engine to the web, what might it look like?
The Brotli compression engine, co-written by Jyrki Alakuijala (inventor of Zopfli), provides one answer. Brotli combines the LZ77 and Huffman algorithms of DEFLATE with a larger sliding window (up to 16MB¹ vs. DEFLATE’s 32KB) and context modeling; the specification also calls for a 122KB static dictionary.
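To get a feel for why the sliding window size matters, here is a small Python sketch. It does not use Brotli at all: it uses the standard library's zlib module and simply shrinks DEFLATE's own window from 32KB to 512 bytes, on the assumption that the effect is analogous. The test data and the `deflate_size` helper are made up for this illustration.

```python
import random
import zlib


def deflate_size(data: bytes, wbits: int) -> int:
    """Return the raw-DEFLATE size of `data` using a 2**wbits-byte window."""
    # A negative wbits value asks zlib for a raw DEFLATE stream
    # (no zlib header or checksum trailer).
    compressor = zlib.compressobj(9, zlib.DEFLATED, -wbits)
    return len(compressor.compress(data) + compressor.flush())


# 4 KB of incompressible noise, repeated eight times. Each repeat sits
# 4096 bytes behind the previous copy.
rng = random.Random(0)
block = bytes(rng.randrange(256) for _ in range(4096))
data = block * 8

small_window = deflate_size(data, 9)   # 512-byte window: repeats are out of reach
large_window = deflate_size(data, 15)  # 32 KB window: repeats can be matched

print(f"input:        {len(data)} bytes")
print(f"512 B window: {small_window} bytes")
print(f"32 KB window: {large_window} bytes")
```

With the small window the compressor never sees the earlier copy of the block, so the output stays close to the input size. The same reasoning suggests why a multi-megabyte window can help on large files whose repeated content is spread far apart.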
Brotli In Browsers
Today, Brotli is the compression engine behind the newish WOFF2 font format, providing savings of approximately 25% over WOFF 1.0 fonts compressed with Zopfli. Not content to rest on their laurels, Google has announced their Intent to Implement Brotli as a general-purpose HTTP Content-Encoding, allowing web developers to use it to compress scripts, stylesheets, SVG, XML, and the like. Firefox beat Google to the finish and shipped Brotli support in the Firefox 44 Dev build.
Probably HTTPS only
Past attempts to add new compression algorithms (bzip2 and SDCH) have demonstrated that a non-trivial number of intermediaries (proxies, gateway scanners) fail when Content-Encodings other than GZIP and DEFLATE are specified, so Brotli will probably only be supported over HTTPS connections, where intermediaries are less likely to interfere.
Accept-Encoding: br, gzip, deflate, sdch
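A server that wants to follow this HTTPS-only approach has to look at both the request's Accept-Encoding header and the connection's scheme. Here is a minimal Python sketch of that decision. The function names and the server-side preference order are my assumptions for illustration, not code from any browser or server.

```python
def parse_accept_encoding(header: str) -> dict[str, float]:
    """Map each content-coding token to its q-value (default 1.0)."""
    codings = {}
    for item in header.split(","):
        token, _, params = item.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        q = 1.0
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        codings[token] = q
    return codings


def choose_encoding(header: str, is_https: bool) -> str:
    """Pick a response Content-Encoding, offering Brotli on HTTPS only."""
    accepted = parse_accept_encoding(header)
    # Hypothetical server preference order for this sketch.
    preferences = ["br", "gzip", "deflate"]
    for coding in preferences:
        if coding == "br" and not is_https:
            continue  # intermediaries may fail on unfamiliar encodings
        if accepted.get(coding, 0.0) > 0:
            return coding
    return "identity"


header = "br, gzip, deflate, sdch"
print(choose_encoding(header, is_https=True))   # br
print(choose_encoding(header, is_https=False))  # gzip
```

This sketch ignores wildcards (`*`) and does not rank codings by q-value; it only honors `q=0` as a refusal, which is enough to show the HTTPS gate.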
Running a few simple tests with Fiddler, I saw great results with Brotli on three files: a random giant XML documentation file, Microsoft Word Online’s WordEditor.js, and Microsoft Word Online’s WordEditor.Wac.TellMeModel.js.
Cloudflare’s blog post on Brotli includes some benchmarks too.
Brotli is optimized for decompression speed. When compressing, Brotli is slower than zlib’s deflate but considerably faster than Zopfli, LZMA, and bzip2. Given 1GB of extremely compressible content, Brotli compressed it to 3339 bytes in 301 seconds of CPU time; zopfli.exe, by contrast, crashed after 8040 seconds of CPU time when a memory allocation failed.
To make things simpler for Windows users, I’ve built the latest release (v0.3) from GitHub for Win32 using Visual Studio 2015. You can download the Authenticode-signed Windows Brotli.exe from my site.
To compress a file, specify the input and output filenames:
… and optionally specify any of the following arguments:
The quality parameter controls the compression-speed vs. compression-ratio tradeoff; the higher the quality, the slower but denser the compression. The supported range is 0 to 11, and 11 is the default.
The force parameter instructs Brotli to overwrite the output file if it already exists.
The verbose parameter instructs Brotli to display its compression speed in megabytes per second upon completion.
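The speed-vs-density tradeoff behind the quality parameter is easy to observe with any compressor that exposes a level setting. The Python sketch below uses the standard library's zlib (levels 1 to 9) as a stand-in, since I'm assuming no Brotli bindings are installed; the synthetic text and the numbers it prints say nothing about Brotli's own quality levels.

```python
import random
import time
import zlib

# Build roughly 1 MB of word-like text so the compressor has real work to do.
rng = random.Random(42)
letters = "abcdefghijklmnopqrstuvwxyz"
vocabulary = [
    "".join(rng.choice(letters) for _ in range(rng.randint(3, 9)))
    for _ in range(500)
]
data = " ".join(rng.choice(vocabulary) for _ in range(150_000)).encode("ascii")

results = {}
for level in (1, 6, 9):
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    results[level] = len(compressed)
    print(f"level {level}: {len(compressed):>7} bytes in {elapsed * 1000:.1f} ms")
```

For static content that is compressed once and served many times, the slowest, densest setting is usually the sensible choice, which is presumably why the tool defaults to quality 11.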
To decompress a file, use the --decompress parameter and specify the input and output filenames:
… and optionally specify the --verbose parameter to instruct Brotli to display its decompression speed in megabytes per second upon completion.
If you’d like to expose Brotli inside Fiddler 188.8.131.52+, place brotli.exe inside Fiddler’s \Tools\ subfolder and restart to see it appear on the Transformer tab:
If you’d like to follow along:
Alas, the Brotli Discussion forum is currently empty.
Assorted Further Investigations
1. Someone needs to register the brotli token in the IANA registry (although Google’s SDCH and Microsoft’s Xpress aren’t listed there either).
2. Implementers should consider protections against “brotli bombing” DoS attacks. Brotli’s high compression ratio makes attacks even cheaper for the bad guys. A trivial test of compressing a file containing all 0s shows that Brotli can achieve a compression ratio of at least 386516:1, meaning that 1389 bytes of compressed data can blow up to 512MB when uncompressed. In contrast, DEFLATE has a maximum compression ratio approaching 1032:1, so an attacker would need to send 375 times as much data over the network to achieve a similar result. That being said, even DEFLATE can result in a denial-of-service, as with this 5.8MB PNG file that can require allocation of up to 141GB of memory.
3. Brotli’s use in WOFF2 means that browsers have already taken on its attack surface. However, not all attack surface is created equal; WOFF2 fonts can be decoded inside a very restricted sandbox. When Chrome 1.0 was released, I was surprised to learn its HTTP decompressors were in a full-trust process; it turns out that is still the case today, which makes fuzzing against decompressors very interesting to an attacker.
4. Brotli’s static dictionary was generated from a broad corpus of content, but considering the most likely use cases (static files), it may not be optimal for this use. At this point, it’s probably too late to change it.
5. When used as a Content-Encoding, will brotli be used “bare” or in some framing format (e.g. with a trailing CRC and size marker)? Will it have magic bytes that will allow sniffing? (Per @mcmanusducksong, Firefox is going with a bare stream and no magics. boo)
6. While not terribly relevant to my scenarios, it turns out Google builds a lot of compression engines I’d never heard of, e.g. Snappy and Gipfeli. When compression speed is more important than ratio, they’re worth a look.
7. Brotli makes the most sense for pre-compression of static content; to that end, someone needs to xcopy the http_gzip_static module for nginx and make a few tweaks to create a new http_brotli_static module. While the nginx team may eventually release one, Google released a brotli module that supports both dynamic and static compression.
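To make item 2 above a bit more concrete, here is a minimal Python sketch of one possible protection: cap the number of bytes a decoder is allowed to produce. It uses the standard library's zlib rather than Brotli, and the `bounded_inflate` helper, the exception class, and the 1MB budget are all hypothetical names and values chosen for this example.

```python
import zlib


class DecompressionLimitExceeded(Exception):
    """Raised when decoded output would exceed the caller's budget."""


def bounded_inflate(compressed: bytes, max_output: int) -> bytes:
    """Inflate a zlib stream, refusing to produce more than max_output bytes."""
    decoder = zlib.decompressobj()
    # Ask for one byte more than the budget so an overflow is detectable
    # without ever allocating the full expanded payload.
    output = decoder.decompress(compressed, max_output + 1)
    if len(output) > max_output:
        raise DecompressionLimitExceeded(
            f"output exceeds {max_output} bytes; aborting"
        )
    return output


# A small-scale "bomb": 10 MB of zeros.
expanded_size = 10 * 1024 * 1024
bomb = zlib.compress(b"\0" * expanded_size, 9)
print(f"{len(bomb)} compressed bytes expand to {expanded_size} bytes "
      f"(ratio about {expanded_size // len(bomb)}:1)")

try:
    bounded_inflate(bomb, max_output=1024 * 1024)
except DecompressionLimitExceeded as error:
    print("rejected:", error)
```

The same idea should carry over to any streaming decoder that lets the caller bound its output. This sketch does not check for truncated streams, and a real implementation would also want to limit CPU time.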
¹ While Brotli can use a 16MB window, it appears that the plan for most scenarios is to constrain the window to 4MB for performance reasons.