Tuning MemoryStream

By day, I build the Fiddler Web Debugger. I’ve recently started integrating telemetry collection into the application for automated exception reporting and to collect information about the user’s environment to ensure that Fiddler testing environments match real-world usage.

The data is fascinating (and I’ll probably blog more about it later), but one data point in particular jumped out at me:

23.09% of users on 32bit

Early data suggests that nearly a quarter of Fiddler users are using the tool under 32bit Windows. This number is significantly higher than I expected and it’s worrisome. Why?

Because well over half of the exceptions reported via telemetry are System.OutOfMemoryException, and 100% of those are from users running 32bit. Now, given the exception’s name, most users assume this means that the system has “run out of RAM”, but this isn’t really the case. In virtually all instances, this exception would more correctly be named System.InsufficientContiguousAddressSpace – the user has plenty of memory available (both RAM and virtual memory backed by the swap file), but that memory is fragmented such that a large allocation (of, say, 100megabytes) cannot be fit within a single address range of the 2 billion addresses available:

Fragmented Memory

If that 100mb allocation were split into two smaller chunks of 50 megabytes each, it could easily fit into the available address space, and you’d never hit the OutOfMemory exception.

When Fiddler is running in a 64bit process, this error is almost never seen because the address space is so overwhelmingly huge (8 terabytes, 128tb on Win8.1+) that fragmentation isn’t a problem like it is in the 2GB address space.

So, what can we do?

The “simple” approach is to just throw the /3GB switch so that the system assigns up to 3GB of address space to the application, but this requires that the user manually adjust their overall operating system configuration, and it doesn’t buy us very much additional space.

The better approach is to stop using contiguous memory and start using data structures that keep portions of the data in reasonably-sized chunks of only a few megabytes each. This is arguably the “right” fix, but it can be considerably more complex, and it would be a major breaking change for Fiddler because Fiddler exposes HTTP request and response bodies as byte[] requestBodyBytes and byte[] responseBodyBytes. Even if we were to change that, popular APIs like GetResponseBodyAsString would end up making contiguous allocations anyway.

Users can use rules to stream large requests and responses without Fiddler keeping a copy, but this limits a lot of Fiddler’s power—streamed and forgotten traffic can’t be modified and isn’t available for inspection. As a consequence, the “stream and forget” approach is best reserved for bodies over 2 gigabytes (since .NET, even in 64bit, cannot use more than 2gb in a single allocation).

So are we out of luck?

No, not quite.

Before creating the byte arrays, Fiddler has to collect all of the traffic. It does so using a MemoryStream because Fiddler can’t know ahead of time how big the request or response will be.

MemoryStreams are tremendously useful objects, but they’re really just a bit of syntactic sugar around a backing array that resizes itself when you attempt to write more data to it than it can hold. The Write method checks the size of the write operation and if space is needed, it calls a private EnsureCapacity method to grow the backing array. Let’s have a look at this method, trimmed for clarity:

The first thing to notice is the code in the orange box; the MemoryStream has an optimization that says: “If the required capacity is less than double the current capacity, then double the current capacity.” This optimization helps ensure that if you’re writing small amounts of data to the MemoryStream over and over, it doesn’t cause the MemoryStream to reallocate the byte array over and over. Reallocations are especially expensive because the data from the old backing array must be copied to the new backing array upon each allocation.

However, this optimization quickly becomes problematic for 32bit Fiddler. Say we’re reading a large file download of 75 megabytes, reading at 15kb per second. The MemoryStream starts out at 16kb, then quickly grows to 32kb, 64kb, 128kb, 256kb, 512kb, 1mb, 2mb, 4mb, 8mb, 16mb, 32mb, 64mb, and finally ends at 128mb. When the download completes, before Fiddler can generate the 75mb responseBodyBytes array, the MemoryStream has tried to allocate 128 contiguous megabytes, an allocation that is 53 megabytes larger than we truly need. Even if we have a 75 megabyte slot available in memory, we need a 128 megabyte slot for the download to succeed. You’ll also notice that the MemoryStream makes no attempt to catch an OutOfMemoryException error and allocate only what is needed if the double-size allocation fails.

Not good.

Also, have a look at the code in the blue box, which is new for .NET 4.6. Prior to the addition of this code, once more than 1GB of data was written to the MemoryStream, any additional writes to the stream would only grow the MemoryStream to the exact size needed. That was awful for Fiddler performance because every single read from the network would result in reallocating the entire array, from 1.000000gb to 1.000064gb to 1.000128gb to 1.000192gb to 1.000256gb, etc. That performance problem has been fixed, but it introduces a massive overallocation—when a MemoryStream with 1GB of data is reallocated, it automatically jumps to just under 2 gigabytes of data capacity—potentially wasting up to nearly a gigabyte of memory. That’s not awesome, even on 64bit.

So, what to do?

While we can’t fix all of these problems, we can improve the situation over the stock MemoryStream because we know a little bit more about how the caller will be using it than the .NET Framework designers could assume.

We replace MemoryStream with a new class which descends from MemoryStream called PipeReadBuffer. The PipeReadBuffer has an overridden Write method that evaluates the bitness of the process, the size of the current response, and, if available, a “hint” about the final body size. The growth algorithm is enhanced such that:

Capacity avoids growing past 85kb as long as possible, to keep the allocation off the large object heap.
After that, Capacity grows quickly at small sizes, but growth caps at 16mb (on 32bit) to 64mb (on 64bit) at large sizes.
The caller can send the PipeReadBuffer a “Hint” about the expected final capacity based on the Content-Length header, if any.

With these optimizations, the PipeReadBuffer can help ensure that waste is minimized in allocations and the risk of OutOfMemoryExceptions is reduced as much as possible.

Elsewhere in the code, we found great opportunities to tune MemoryStream’s growth. For instance, we have a method called GZipExpand that decompresses data that was compressed using gzip. The API we use unzips into a MemoryStream and previously was subject to the same overallocation problem seen in the network reads. Fortunately, the gzip format includes a hint about the size of the uncompressed data in the footer, so we can use that to set the capacity of our MemoryStream ahead of time:

Trimmed GzipExpand source

You can also see that here we take advantage of another optimization—if the stream is sized appropriately, we can avoid the allocation inside the ToArray() call by using the GetBuffer() method instead.

These improvements will appear in the next release of Fiddler.

Further explorations: The Bing team has released a promising new RecyclableMemoryStream (announcement) which might be an even better fit for the PipeReadBuffer, as it can avoid allocations on the Large Object Heap; those allocations are particularly a problem for older versions of .NET that don’t offer any way to compact the LOH.

-Eric Lawrence

Tuning MemoryStream

Published by ericlaw

Leave a comment Cancel reply

Share this:

Published by ericlaw

Leave a comment Cancel reply