Finding Image Bloat In Binary Files

I’ve previously talked about using PNGDistill to optimize batches of images, but in today’s quick post, I’d like to show how you can use the tool to check whether images in your software binaries are well optimized.

For instance, consider Chrome. Chrome uses a lot of PNGs, all mashed together into a single resources.pak file. Tip: Search files for the string IEND to find embedded PNG files.
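To make the IEND tip concrete, here is a minimal Python sketch of that kind of search. It is not a description of how PNGDistill's grovel mode works internally (the post doesn't say). It assumes only that each embedded PNG starts with the standard 8-byte PNG signature and ends with an IEND chunk followed by a 4-byte CRC. The function name find_embedded_pngs is made up for illustration.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the fixed 8 bytes that open every PNG file


def find_embedded_pngs(blob):
    """Return (start, end) byte offsets of PNG streams found inside blob.

    Assumption: the first "IEND" after a signature closes that image. A
    stray "IEND" inside compressed pixel data would fool this naive scan.
    """
    spans = []
    pos = 0
    while True:
        start = blob.find(PNG_SIGNATURE, pos)
        if start == -1:
            break
        iend = blob.find(b"IEND", start)
        if iend == -1:
            break
        end = iend + 8  # 4 bytes of chunk type ("IEND") plus 4 bytes of CRC
        spans.append((start, end))
        pos = end
    return spans
```

Each span can then be written out with something like open(name, "wb").write(blob[start:end]) and inspected on its own.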

With Fiddler installed, go to a command prompt and enter the following commands:

cd %USERPROFILE%\AppData\Local\Google\Chrome SxS\Application\60.0.3079.0
mkdir temp
copy resources.pak temp
cd temp
"C:\Program Files (x86)\Fiddler2\tools\PngDistill.exe" resources.pak grovel
for /f "delims=|" %f in ('dir /b *.png') do "c:\program files (x86)\fiddler2\tools\pngdistill" "%f" log

You now have a PNGDistill.LOG file showing the results. Open it in a CSV viewer like Excel or Google Sheets. You can see that Chrome is pretty well-optimized, with under 3% bloat.


Let’s take a look at Brave, which uses electron_resources.pak:


Brave does even better! Firefox has images in a few different files; I found a bunch in a file named omni.ja:


The picture gets less rosy elsewhere though. Microsoft’s MFC140u.dll’s images are 7% bloat:


Windows’ Shell32.dll uses poor compression:


Windows’ ImageRes.dll has over 5 megabytes (nearly 20% of image weight) bloat:


And Windows 10’s ApplicationFrame.dll is well-compressed, but the images have nearly 87% metadata bloat:


Does Image Bloat Matter?

Well, yes, it does. Even when software isn’t distributed by webpages, image bloat still takes up precious space on your disk (which might be limited in the case of an SSD) and it burns cycles and memory to process or discard unneeded metadata.

Optimize your images. Make it automatic via your build process and test your binaries to make sure it’s working as expected.


PS: Rafael Rivera wrote a graphical tool for finding metadata bloat in binaries; check it out.


2016 Brotli Update

Windows 10 Build 14986 adds support for Brotli compression to the Edge browser (but, somewhat surprisingly, not IE11). So at the end of 2016, we now have support for this improved compression algorithm in Chrome, Firefox, Edge, Opera, Brave, Vivaldi, and the long tail of browsers based on Chromium. Of modern browsers, only Apple is a holdout, with a “Radar” feature request logged against Safari but no public announcements.

Unfortunately, behavior across browsers varies at the edges:

  • Edge advertises support for and decodes Brotli compression on both HTTP and HTTPS requests.
  • Chrome advertises Brotli for HTTPS connections but will decode Brotli for both HTTPS and HTTP responses.
  • Firefox advertises Brotli for HTTPS connections and will not decode Brotli on HTTP responses.

There’s nothing horribly broken here: sites can safely serve Brotli content to clients that ask for it and those clients will probably decode it. The exception is when the request goes over HTTP. The reason Firefox and Chrome limit their request for Brotli to HTTPS is that, historically, middleboxes (like proxies and gateway filters) have been known to corrupt compression schemes other than gzip and deflate. This proved to be such a big problem in the rollout of SDCH (a now-defunct compression algorithm Chrome supported) that the Brotli implementers decided to try to avoid the issue by requiring a secure transport.
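From the server's side, the safe rule amounts to sending Brotli only when the request's Accept-Encoding header offers it. Here is a rough Python sketch of that negotiation. The function name pick_encoding and the server's preference order are assumptions for illustration. The only fixed fact is that br is the content-coding token browsers use for Brotli.

```python
def pick_encoding(accept_encoding, supported=("br", "gzip")):
    """Pick the first server-supported coding that the client offered.

    `supported` lists the server's codings in (assumed) preference order.
    A coding offered with q=0 is treated as explicitly refused.
    """
    offered = {}
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        quality = 1.0
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # unparseable weight: play it safe and skip
        offered[token] = quality
    for coding in supported:
        if offered.get(coding, 0.0) > 0:
            return coding
    return "identity"  # send the body uncompressed
```

Because the browsers that restrict Brotli to HTTPS simply leave br out of the header on HTTP requests, a server following this rule never has to check the scheme itself.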


PS: Major sites, including Facebook and Google, have started deploying Brotli in production. If your site pulls fonts from Google Fonts, you’re already using Brotli today! In unrelated news, the 2016 Performance Calendar includes a post on serving Brotli from CDNs that don’t explicitly support it yet. Another recent post shows how to pair maximal compression for static files with fast compression for dynamically generated responses.


Out-of-Memory is (Usually) a Lie

  • The most common exception logged by Fiddler telemetry is OutOfMemoryException.
  • Yesterday, a Facebook friend lamented: “How does firefox have out of memory errors so often while only taking up 1.2 of my 8 gigs of ram?”
  • This morning, a Python script running on my machine as a part of the Chromium build process failed with a MemoryError, despite 22gb of idle RAM.

Most platforms return an “Out of Memory error” if an attempt to allocate a block of memory fails, but the root cause of that problem very rarely has anything to do with truly being “out of memory.” That’s because, on almost every modern operating system, the memory manager will happily use your available hard disk space as a place to store pages of memory that don’t fit in RAM; your computer can usually allocate memory until the disk fills up (or a swap limit is hit; in Windows, see System Properties > Performance Options > Advanced > Virtual memory).

So, what’s happening?

In most cases, the system isn’t out of RAM—instead, the memory manager simply cannot find a contiguous block of address space large enough to satisfy the program’s allocation request.

In each of the failure cases above, the process was 32bit. It doesn’t matter how much RAM you have; running in a 32bit process nearly always means that there are fewer than 3 billion addresses¹ at which the allocation can begin. If you request an allocation of n bytes, the system must have n unused addresses in a row available to satisfy that request.

Making matters much worse, every active allocation in the program’s address space can cause “fragmentation” that can prevent future allocations by splitting available memory into chunks that are individually too small to satisfy a new allocation with one contiguous block.
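A toy model shows how fragmentation produces an “out of memory” failure while most of the address space is still free. The Python sketch below is a simplification, not how any real memory manager tracks its state. It treats the address space as a number line and each live allocation as a (start, length) pair. The name free_space_stats and the numbers are invented for illustration.

```python
def free_space_stats(space_size, allocations):
    """Return (total_free, largest_contiguous_free) for a toy address space.

    `allocations` is an iterable of (start, length) pairs that are
    assumed not to run past `space_size`.
    """
    total_free = 0
    largest = 0
    cursor = 0  # first address not yet accounted for
    for start, length in sorted(allocations):
        gap = start - cursor
        if gap > 0:
            total_free += gap
            largest = max(largest, gap)
        cursor = max(cursor, start + length)
    tail = space_size - cursor
    if tail > 0:
        total_free += tail
        largest = max(largest, tail)
    return total_free, largest


# A 2048 MB space with eight tiny 1 MB allocations, one at the end of
# every 256 MB stretch.
pins = [(255 + 256 * k, 1) for k in range(8)]
total_free, largest = free_space_stats(2048, pins)
# total_free is 2040 MB, but largest is only 255 MB, so a single
# 256 MB request fails even though over 99% of the space is unused.
```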


Running out of address space most often occurs when dealing with large data objects like arrays; in Fiddler, a huge server response like a movie or .iso download can be problematic. In my Python script failure this morning, a 1.3gb file (chrome_child.dll.pdb) needed to be loaded so its hash could be computed. In some cases, restarting a process may resolve the problem by either freeing up address space, or by temporarily reducing fragmentation enough that a large allocation can succeed.
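For the hashing case specifically, a common way to avoid the problem is to never ask for one huge block in the first place. The Python sketch below reads the file in fixed-size chunks and feeds each one to the hash. This is a general technique, not a description of how the Chromium script was written or fixed. The function name and the 1 MB chunk size are arbitrary choices.

```python
import hashlib


def hash_file_in_chunks(path, chunk_size=1 << 20, algorithm="sha256"):
    """Hash a file without ever holding more than chunk_size bytes of it."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result matches hashing the whole file in one call, but the largest contiguous allocation is the chunk size rather than the file size.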

Running 64-bit versions of programs will usually eliminate problems with address space exhaustion, although you can still hit “out-of-memory” errors before your hard disk is full. For instance, to limit their capabilities and prevent “runaway” allocations, Chrome’s untrusted rendering processes run within a Windows job object with a 4gb memory allocation limit:

Job limit 4gb shown in SysInternals Process Explorer

Elsewhere, the .NET runtime restricts individual array dimensions to 2^31 entries, even in 64bit processes².

-Eric Lawrence

¹ If a 32bit application has the LARGEADDRESSAWARE flag set, it has access to a full 4gb of address space when run on a 64bit version of Windows.

² So far, four readers have written to explain that the gcAllowVeryLargeObjects flag removes this .NET limitation. It does not. This flag allows objects which occupy more than 2gb of memory, but it does not permit a single-dimensional array to contain more than 2^31 entries.


Automatically Evaluating Compressibility

Fiddler’s Transformer tab has long been a simple way to examine the use of HTTP compression of web assets, especially as new compression engines (like Zopfli) and compression formats (like Brotli) arose. However, the one-Session-at-a-time design of the Transformer tab means it is cumbersome to use to evaluate the compressibility of an entire page or series of pages.

Introducing Compressibility

Compressibility is a new Fiddler 4 add-on¹ which allows you to easily find opportunities for compression savings across your entire site. Each resource dropped on the compressibility tab is recompressed using several compression algorithms and formats, and the resulting file sizes are recorded:

Compressibility tab

You can select multiple resources to see the aggregate savings:

Total savings text

WebP savings are only computed for PNG and JPEG images; Zopfli savings for PNG files are computed by using the PNGDistill tool rather than just using Zopfli directly. Zopfli is usable by all browsers (as it is only a high-efficiency encoder for Deflate) while WebP is supported only by Chrome and Opera. Brotli is available in Chrome and Firefox, but limited to use from HTTPS origins.
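The general idea of recompressing one resource several ways and comparing sizes can be sketched in a few lines of Python. This is not the add-on's code. Python's standard library has no Zopfli, Brotli, or WebP encoder, so the sketch substitutes zlib (Deflate), bz2, and lzma purely as stand-ins, and the function names are made up.

```python
import bz2
import lzma
import zlib


def compressed_sizes(data):
    """Return the size in bytes of `data` under several stand-in compressors."""
    return {
        "original": len(data),
        "deflate-fast": len(zlib.compress(data, 1)),
        "deflate-max": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data)),
        "lzma": len(lzma.compress(data)),
    }


def savings_percent(original_size, compressed_size):
    """Percentage of bytes saved relative to the original size."""
    return 100.0 * (original_size - compressed_size) / original_size
```

Summing the sizes over several resources before calling savings_percent gives an aggregate figure of the kind shown when multiple resources are selected.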

Download the Addon…

To show the Compressibility tab, simply install the add-on, restart Fiddler, and choose Compressibility from the View > Tabs menu².

View > Tabs > Compressibility menu screenshot

The extension also adds ToWebP Lossless and ToWebP Lossy commands to the ImageView Inspector’s context menu:


I hope you find this new addon useful; please send me your feedback so I can enhance it in future updates!


¹ Note: Compressibility requires Fiddler 4, because there’s really no good reason to use Fiddler 2 any longer, and Fiddler 4 resolves a number of problems and offers extension developers the ability to utilize newer framework classes.

² If you love Compressibility so much that you want it to be shown in the list of tabs by default, type prefs set extensions.Compressibility.AlwaysOn true in Fiddler’s QuickExec box and hit enter.


Getting Started with Profile Guided Optimization

For the convenience of the Windows developer community, I periodically compile the Zopfli and Brotli compressors from source, building for Win32 and code-signing the binaries (Interested? Get Zopfli.exe and Brotli.exe). After announcing the latest build on Twitter, I got an interesting question in reply:

Do you even PGO?

While I try to use the latest compiler (VS2015 U1), I’ve never used PGO with C++ myself. Profile guided optimization requires that you first compile a special instrumented binary that you run against a training set of data. The generated profiling data is fed into the compiler and it compiles an optimized binary based on the observed execution of the code, tuning the hottest paths for speed.

As with any technology-adoption question, I wondered: 1> Is using PGO hard? and 2> Will it noticeably improve performance?

Spoiler alert: The answers are “No” and “Yes.”

I started by skimming this old blog about PGO in Visual Studio; it looks pretty simple.

Optimizing a compressor with PGO is pretty straightforward. Unlike a GUI application with thousands of different operations, a compressor really only does one thing—compress.

I created a folder with files that I felt reasonably represent the types of data that I’ll be compressing with Zopfli (eight files captured via Fiddler). I could’ve experimented using a broader sample, but this seemed like a fine corpus of data with which to begin.

Click Build > Profile Guided Optimization > Instrument to generate an instrumented binary:

Build > Profile Guided Optimization > Instrument

Right-click the project in the Solution Explorer pane and choose Debugging under the Configuration Properties category. Edit the Command Arguments to specify the training scenario. Zopfli accepts a list of files to compress, so we simply list all eight:

Edit Command arguments

Close the dialog and click Build > Profile Guided Optimization > Run Instrumented/Optimized Application to run our application and generate profiling data:

Run Instrumented/Optimized Application

The scenario then runs; it takes a bit of extra time due to the cost of the profiling instructions in the instrumented binary. After it completes, a new file (Zopfli!1.pgc) is written to the \Release\ folder; if we’d run the application multiple times to train different scenarios, Zopfli!2.pgc, Zopfli!3.pgc, etc. would be present as well.

Finally, click Build > Profile Guided Optimization > Optimize to generate a new build using the profiling data to select paths for optimization. You can see the effect of the profiling database on the Build in the Output window:

Build output shows optimizations

Now your executable has been optimized.

Pretty simple, right?

Proper benchmarking is an entire field itself, but let’s do the simplest thing that could possibly work to check the effectiveness of the optimizations:

Script runs optimized and unoptimized

We run the script a few times and see that the original unoptimized binary takes ~64 seconds to compress the corpus and the optimized binary takes ~46 seconds, a savings of almost 30%.
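For anyone who wants to repeat this kind of rough comparison, here is a minimal Python timing sketch. It is not the script used for the numbers above. The function names are invented, and taking the best of several runs is just one reasonable way to damp noise.

```python
import subprocess
import time


def time_command(command, runs=3):
    """Return the best wall-clock time in seconds over `runs` executions.

    `command` is an argument list, e.g. the compressor's path followed by
    the corpus file names. A non-zero exit code raises CalledProcessError.
    """
    best = float("inf")
    for _ in range(runs):
        started = time.perf_counter()
        subprocess.run(
            command,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        best = min(best, time.perf_counter() - started)
    return best


def percent_saved(baseline_seconds, optimized_seconds):
    """Time saved by the optimized build, as a percentage of the baseline."""
    return 100.0 * (baseline_seconds - optimized_seconds) / baseline_seconds
```

Plugging in the figures above, percent_saved(64, 46) works out to 28.125, which is the "almost 30%" quoted.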

ZopFli PGO vs non PGO

You should run the same benchmark against a new set of data, just to ensure that your changes yield similar improvements (or at least no regression!) given different input data. A few runs of my PNGDistill tool (which uses Zopfli internally) show improvements of 10% to 25% when using the optimized compressor.

Pretty cool, right?

-Eric Lawrence


What’s New in Fiddler

TLDR? – Get the newest Fiddler here. We’re performing a staged rollout of this build; it won’t be on autoupdate until next week.

Under the Hood

As mentioned in our notes about the Fiddler 4.6 release, we’ve started taking a very close look at Fiddler’s performance. Fiddler’s use of the CPU, system memory, and the network has gone under the microscope and this new release includes several major changes to how Fiddler uses threads and memory. If you frequently run Fiddler with a large amount of traffic in parallel, or run Fiddler on a slower or heavily-loaded PC, this new version should provide significantly improved performance. We’ve also improved overall performance by using better algorithms in scenarios like the Find Sessions (CTRL+F) experience.

The !threads and !memory QuickExec commands have been enhanced to provide insights into fine-grained performance details about Fiddler’s operation.


The Performance Tab

The new Performance tab in the Fiddler Options dialog offers choices that can significantly change Fiddler’s runtime performance and memory usage.


The Show Memory panel in status bar controls whether Fiddler’s status bar shows a panel that indicates the current memory usage tracked by the garbage collector. For example, when Fiddler has 64mb of managed memory allocated, the panel looks like this:


Left-click the memory panel to instruct the .NET Framework to perform an immediate garbage collection. Right-click the panel to launch the Fiddler Options dialog box with the Performance tab activated.

The Parse WebSocket Messages checkbox controls whether Fiddler will parse WebSocket streams into individual messages, allowing display in the WebSocket tab and manipulation using the OnWebSocketMessage event handler. Disabling WebSocket Message parsing will reduce CPU usage and may save a significant amount of memory if high-traffic WebSockets are in use. Even if you disable WebSocketMessage parsing globally using this checkbox, it can be reenabled on a per-Session basis by setting the x-Parse-WebSocketMessages flag on the Session object.

The Stream and forget bodies over box controls the maximum size of a message body that Fiddler will retain. By default, the limit is just under 2 gigabytes for 64bit Fiddler and 16 megabytes for 32bit Fiddler; the much smaller default for 32bit helps avoid problems with “Out of Memory” errors when running Fiddler in a small address space. If, while reading a message body, Fiddler finds that it is larger than the threshold, it will configure the body to stream and will “drop” the bytes of the body to conserve memory. If you attempt to inspect a Session which has been dropped, you will see the following notification bar:


Clicking the bar will open the Fiddler Options dialog to allow you to reconfigure the limit for subsequent Sessions.
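The stream-and-forget idea can be modeled in a few lines. The Python sketch below is a toy, not Fiddler's implementation (Fiddler is a .NET application and its internals aren't described here). It forwards every chunk to the client, keeps a copy only while the running total stays within the limit, and discards the copy once the limit is passed. All names are hypothetical.

```python
def relay_body(chunks, forward, limit):
    """Forward every chunk; retain the body only if it stays within `limit`.

    Returns (retained_body_or_None, total_bytes_seen). A None body means
    the bytes were dropped because the size limit was exceeded.
    """
    retained = bytearray()
    dropped = False
    total = 0
    for chunk in chunks:
        forward(chunk)  # the client always receives the data
        total += len(chunk)
        if dropped:
            continue
        retained += chunk
        if total > limit:
            retained = bytearray()  # free what was buffered so far
            dropped = True
    return (None if dropped else bytes(retained)), total
```

The point of the design is that memory use is capped by the limit no matter how large the download turns out to be.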

The If client aborts while streaming dropdown controls Fiddler’s behavior if a response body is streaming to the client but the client closes the connection. Depending on your choice here, Fiddler can continue to read the body from the server (useful if you’re collecting traffic) or abort the Session (useful to save memory and CPU cycles).

The Run Fiddler at AboveNormal Priority option alters Fiddler’s default scheduling priority. If you enable this option, Windows will prioritize activation of Fiddler’s threads when they have work to do (e.g. reading a new request or response from the network). You can easily experiment with this option to see whether it improves the overall throughput of your client (browser) and Fiddler.


New QuickFilters

The Session list’s Filter Now context menu has been enhanced with two new filters:


The Hide /1stpath/ filter hides any traffic whose Url path component starts with the specified string.

The Hide Url… option uses the current Session’s Url as the default of a Url filter; you can edit the string to apply more broadly by removing text from the start or end of the string:



High DPI Improvements

Today, only a small number of Fiddler users (< 4%) run Fiddler on Windows systems with a non-default screen DPI, but we want Fiddler to work great for those users too. The latest build of Fiddler includes a number of DPI-related fixes. Fiddler is not yet marked DPI aware in its manifest; if you’d like to see Fiddler in its DPI-aware mode, run Fiddler with the -dpiAware command line argument:

    fiddler.exe -dpiAware

We will continue to make improvements as problems are discovered or reported and expect to eventually set the dpiAware flag by default.


New HTTPS Cipher Option

Fiddler 4 on Windows 7 and later supports modern TLS versions (TLS 1.1 and TLS 1.2) and the HTTPS tab on the Fiddler Options dialog enables you to easily enable these protocols. However, TLS 1.1 and 1.2 remain off-by-default for compatibility reasons.

The Enabled Protocols link on the HTTPS dialog now supports a new token <client>; if present, this token adds to the list of versions offered to the server the latest protocol that has been offered by the client. For instance, with these settings:


… a request from Internet Explorer offering TLS 1.2 will be presented to the server with TLS 1.2 and TLS 1.0 enabled. The advantage of using the <client> token is that if the request fails, many browser clients are configured to “fall back” and attempt negotiation with an earlier protocol version. In this example, if the TLS 1.2 connection fails, the browser will retry with TLS 1.0 and the connection may succeed.

You must include at least one specific TLS version in the HTTPS Protocols list to handle cases where the request was generated by Fiddler itself (e.g. the Composer tab).
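The effect of the <client> token can be expressed as a small function. This Python sketch only illustrates the behavior described above. It assumes the protocol list is a semicolon-separated string such as <client>;tls1.0, which is a guess at the setting's syntax, and the function name is invented.

```python
def protocols_to_offer(configured, client_latest=None):
    """Expand a protocol list, swapping <client> for the client's latest version.

    `client_latest` is None when the request was generated by Fiddler
    itself, in which case the <client> token contributes nothing.
    """
    offered = []
    for token in configured.split(";"):
        token = token.strip()
        if not token:
            continue
        if token == "<client>":
            if client_latest is None:
                continue
            token = client_latest
        if token not in offered:  # avoid listing the same version twice
            offered.append(token)
    return offered
```

With the list <client>;tls1.0, a client offering tls1.2 yields both versions, while a Fiddler-generated request yields only tls1.0. That second case is why at least one specific version has to be listed.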


Brotli Compression Support

Researchers at Google have developed a new compression algorithm named Brotli that offers significantly better compression than the DEFLATE algorithm used by Gzip. This new compression algorithm is already in use by many browsers today (inside the WOFF2 font format) and it is expected to appear as an HTTP Content-Encoding in early 2016.

Fiddler now supports Brotli as a Content-Encoding everywhere compression is supported; simply download the Authenticode-signed Windows Brotli.exe and place it in the Fiddler2\Tools\ subfolder in your Program Files folder. After you restart Fiddler, you will find a new Brotli option on the Transformer tab and Fiddler APIs like utilDecodeResponse() will be able to decompress Brotli-encoded content:


The new version of Fiddler also has better handling of unsupported compression schemes like SDCH—if a response with Content-Encoding: sdch,gzip is encountered, for instance, the various decoding APIs will decompress with GZIP and then stop without removing the SDCH token.
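The decode-until-unsupported behavior is easy to picture as a loop that peels codings off the end of the Content-Encoding list. The Python sketch below shows that approach in general terms. It is not Fiddler's code, the function name is made up, and it handles only gzip and deflate.

```python
import gzip
import zlib


def decode_body(body, content_encoding):
    """Undo codings from last-applied to first, stopping at an unknown one.

    Returns (body, remaining_content_encoding). Codings are listed in the
    order they were applied, so decoding works from the right-hand end.
    """
    tokens = [t.strip().lower() for t in content_encoding.split(",") if t.strip()]
    while tokens:
        coding = tokens[-1]
        if coding == "gzip":
            body = gzip.decompress(body)
        elif coding == "deflate":
            body = zlib.decompress(body)
        else:
            break  # e.g. sdch: leave the token in place and stop
        tokens.pop()
    return body, ", ".join(tokens)
```

For a response marked sdch,gzip this removes the gzip layer and returns the still-SDCH-encoded bytes together with the remaining sdch token.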


Hidden Tabs

FiddlerScript’s BindUITab attribute now supports a <hidden> token:


If this token is present, the tab is not shown until the user manually activates it via the View > Tabs menu:


This feature helps you “unclutter” Fiddler by keeping uncommonly-used script tabs hidden.


More Powerful ImageView Extensions

Fiddler’s ImageView Extensions feature allows you to add new commands to the Tools context menu on the ImageView Inspector:


Now you can use a new Options parameter to specify that Fiddler should show the <stdout> or <stderr> results of running the target tool, and a new {out:extension} token enables you to specify that the target tool writes a file that Fiddler should load as a new Session in the Web Sessions list.

For instance, here’s the logic to add a new ToWebP Lossless command to the list:


To use it, add the registry entries and place CWebP.exe in Fiddler’s Tools subfolder. When you invoke the command, Fiddler will run the tool, passing an input temporary file containing the JPEG image to the tool in the {in} parameter and specifying an autogenerated filename with a .webp file extension in the {out:webp} parameter. The cwebp.exe tool will be run, any text from Standard Error will be collected and displayed to the user, and the file named by the {out} token will be reloaded as a new Web Session:
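The token handling described above can be sketched as simple string expansion. This Python snippet is only an illustration of the idea. The argument template, the output file stem, and the function name are all hypothetical, and the real registry values are not reproduced here.

```python
import os
import re

OUT_TOKEN = re.compile(r"\{out:(\w+)\}")  # matches {out:webp}, {out:png}, ...


def expand_command(template, in_path, out_dir, stem="generated"):
    """Fill in {in} and {out:extension} tokens in an argument template.

    Returns (expanded_arguments, output_paths). Each {out:ext} token is
    replaced with a path in `out_dir` that carries the requested extension.
    """
    output_paths = []

    def replace_out(match):
        path = os.path.join(out_dir, stem + "." + match.group(1))
        output_paths.append(path)
        return path

    expanded = OUT_TOKEN.sub(replace_out, template.replace("{in}", in_path))
    return expanded, output_paths
```

After the tool exits, the caller would load whatever file now exists at each returned output path.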



Updated – Show Image Bloat

I’ve also updated the Show Image Bloat add-on (described here) with some additional tweaks and features; the add-on has improved bloat detection for JPEG and GIF files and has other minor improvements. Install the latest build of Show Image Bloat (v2.6) and activate it from the Fiddler Rules menu.



I hope you enjoy these new improvements to Fiddler – Keep sending in your feedback to ensure we’re evolving the tool to best meet your needs.


-Eric Lawrence


WebP–What Isn’t Google Telling Us?

Beyond their awesome work on Zopfli and Brotli, Google has brought their expertise in compression to bear on video and image formats. One of the most interesting of these efforts is WebP, an image format designed to replace the aging JPEG (lossy) and PNG (lossless) image formats.

WebP offers more efficient compression mechanisms than both PNG and JPEG, as you can see in this comparison of a few PNG files on Google’s top sites vs. WebP-Lossless versions that are pixel-for-pixel identical:

size table

You can see these savings everywhere, from Google’s homepage logo, which is 3918 bytes (29%) smaller, to Google applications’ image sprites (59% smaller!) to advertisements served by Google’s ad network (18% smaller). These compression savings are much greater than those provided by Zopfli, which is constrained by compatibility with the legacy PNG format.

As an additional benefit, WebP files don’t contain the sort of metadata bloat found in PNG, JPEG, and GIF.

So, the bandwidth and cache-size savings are obvious.

While the format is currently only supported in Chrome and Opera, web servers can easily serve WebP only to clients that request it via the Accept header:

Fiddler screenshot showing WebP in use

This approach to WebP adoption is in use today by major sites like the Washington Post.
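Server-side, the Accept-based approach boils down to a check like the one in this Python sketch. It is a simplified illustration, not any particular site's implementation. The function name is invented, q-values are ignored, and PNG is assumed as the fallback format.

```python
def choose_image(accept_header, webp_available=True):
    """Pick a content type for an image response based on the Accept header.

    Returns (content_type, response_headers). The Vary header tells caches
    that the response depends on the request's Accept header.
    """
    accepted = {
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    }
    if webp_available and "image/webp" in accepted:
        content_type = "image/webp"
    else:
        content_type = "image/png"  # assumed fallback for other clients
    return content_type, {"Content-Type": content_type, "Vary": "Accept"}
```

Without the Vary: Accept header, a shared cache could hand a WebP response to a browser that never asked for it.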

Google invented the format, so it’s not a case of “not-invented-here.”

The non-adoption of their own format leads to a troubling question—is there something about WebP that Google isn’t telling us? Surely there must be a good reason that Google’s own properties aren’t reaping the benefits of the format they’ve invented?

Update: Alex Russell retorts “uh, we use webp in TONS of places.”

-Eric Lawrence

PS: WebP Status Tracking links for Firefox and IE/Edge
