browsers, dev

Cookies and Concurrency, Redux

In yesterday’s episode, I shared the root cause of a bug that can cause document.cookie to incorrectly return an empty string if the cookie is over 1kb and the cookie grows in the middle of a DOM document.cookie getter operation.

Unfortunately, that simple bug wasn’t the root cause of the compatibility problem that I was investigating when my code-review uncovered it. The observed compatibility bug was slightly different– in the repro case, only one of the document’s cookies goes missing, and it goes missing even when only one page is setting the cookie.

After the brain-melting exercise of annotating the site’s minified framework libraries (console.log(‘…’) ftw!) via Fiddler’s AutoResponder, I found that the site uses the document.cookie API to save the same cookie (named “ld“) three times in a row, adding some information to the cookie each time. However, the ld cookie mysteriously disappears between 0.4 and 6 milliseconds after it gets set the third time. I painstakingly verified that the cookie wasn’t getting manipulated from any other context when it disappeared.

Hmm…

As I wrote up the investigation notes, I idly noted that due to a trivial typo in the website’s source code, the ld cookie was set first as a Persistent cookie, then (accidentally) as a Session cookie, then as a Persistent cookie.

In re-reading the notes an hour later, again my memory got tickled. Hadn’t I seen something like this before?

Indeed, I had. Just about five years ago, a user reported a similar bug where a HTTP response contained two Set-Cookie calls for the same cookie name and Internet Explorer didn’t store either cookie. I built a reduced test case and reported it to the engineering team.

Pushing Cookies

The root cause of the cookie disappearance relates to the Internet Explorer and Edge “loosely-coupled architecture.”

In IE and Edge, each browser tab process runs its own networking stack, in-process1. For persistent cookies, this poses no problem, because every browser process hits the same WinINET cookie storage area and gets back the latest value of the persistent cookie. In contrast, for session cookies, there’s a challenge. Session cookies are stored in local (per-process) variables in the networking code, but a browser session may include multiple tab processes. A Session cookie set in a tab process needs to be available in all other tab processes in that browser session.

As a consequence, when a tab writes a Session cookie, Edge must send an interprocess communication (IPC) message to every other process in the browser session, telling each to update its internal variables with the new value of the Session cookie. This Cookie Pushing IPC is asynchronous, and if the named cookie were later modified in a process before the IPC announcing the earlier update to the cookie is received, that later update is obliterated.

The Duplicate Set-Cookie header version of this bug got fixed in the Fall 2017 Update (RS3) to Windows 10 and thus my old Set-Cookie test case case no longer reproduces the problem.

Unfortunately, it turns out that the RS3 fix only corrected the behavior of the network stack when it encounters this pattern– if the cookie-setting calls are made via document.cookie, the problem reappears, as in this document.cookie test case.

BadBehavior

Playing with the repro page, you’ll notice that manually pushing “Set HOT as a Session cookie” or “Set as a Persistent cookie” works fine, because your puny human reflexes aren’t faster than the cookie-pushing IPC. But when you push the “Set twice” button that sets the cookie twice in fast succession, the HOT cookie disappears in Edge (and in IE11, if you have more than one tab open).

Until this bug is fixed, avoid using document.cookie to change a persistent cookie to a session cookie.

-Eric

In contrast, in Chrome, all networking occurs in the browser process (or a networking-only process), and if a tab process wants to get the current document.cookie, it must perform an IPC to ask the browser process for the cookie value. We call this “cookie pulling.”

Standard
browsers, dev, Uncategorized

ERROR_INSUFFICIENT_BUFFER and Concurrency

Many classic Windows APIs accept a pointer to a byte buffer and a pointer to an integer indicating the size of the buffer. If the buffer is large enough to hold the data returned from the API, the buffer is filled and the API returns S_OK. If the buffer supplied is not large enough to hold all of the data, the API instead returns ERROR_INSUFFICIENT_BUFFER, updating the supplied integer with the length of the buffer required. The client is expected to reallocate a new buffer of the specified size and call the API again with the new buffer and length.

For example, the InternetGetCookieEx function, used to query the WinINET networking stack for cookies for a given URL, is one such API. The GetExtendedTcpTable function, used to map sockets to processes, is another.

The advantage of APIs with this form is that you can call the API with a reasonably-sized stack buffer and avoid the cost of a heap allocation unless the stack buffer happens to be too small.

In the case of Internet Explorer and Edge, the document.cookie DOM API getter’s implementation first calls the InternetGetCookieEx API with a 1024 WCHAR buffer. If the buffer is big enough, the cookie string is then immediately returned to the page.

However, if ERROR_INSUFFICIENT_BUFFER is returned instead (and if the size needed is 10240 characters (MAX_COOKIE_LEN) or fewer), the API will allocate a new buffer on the heap and call the API again. If the API succeeds, the cookie string is returned to the page, otherwise if any error is returned, an empty string is returned to the page.

Wait. Do you see the problem here?

It’s tempting to conclude that the document.cookie API doesn’t need to be thread-safe–JavaScript that touches the DOM runs in one thread, the UI thread. But cookies are a form of data storage that is available across multiple threads and processes. For instance, subdownload network requests for the page’s resources can be manipulating the cookie store in parallel, and if I happen to have multiple tabs or windows open to the same site, they’ll be interacting with the same cookie jar.

So, consider following scenario: The document.cookie implementation calls InternetGetCookieEx but gets back ERROR_INSUFFICIENT_BUFFER with a required size of 1200 bytes. The implementation dutifully allocates a 1200 byte buffer, but before it gets the chance to call InternetGetCookieEx again, an image on the page sets a new 4 byte cookie which WinINET puts in the cookie jar. Now, when InternetGetCookieEx is called again, it again returns ERROR_INSUFFICIENT_BUFFER because the required buffer is now 1204 characters. Because document.cookie isn’t using any sort of loop-until-success, it returns an empty cookie string.

Now, this is all fast native code (C/C++), so surely this sort of thing is just theoretical… it can’t really happen on a fast computer, right?

Around ten years ago, I showed how you can use Meddler to easily generate a lot of web traffic for testing browsers. Meddler is a simple web server that has a simple GUI code editor slapped on the front (most developers would use node.js or Go for such tasks). I quickly threw together a tiny little MeddlerScript which exercises cookies by loading cookie-setting images in a loop and monitoring the document.cookie API to see if it ever returns an empty string.

Boy, does it ever. On my i7 machines, it usually only takes a few seconds to run into the buggy case where document.cookie returns an empty string.

Failure

I haven’t gone back to check the history, but I suspect this IE/Edge bug is at least fifteen years old.

After confirming this bug, it felt strangely familiar, as if I’d hit this landmine before. Then, as I was writing this post, I realized when… Back in 2011, I shared the C# code Fiddler uses for mapping a socket to a process. That code relies on the GetExtendedTcpTable API, which has the same reallocate-then-reinvoke design. Fortunately, I’d fixed the bug a few weeks later in Fiddler, but it looks like I never updated my blog post (sorry about that).

-Eric

PS: Unrelated, but one more pitfall to be aware of: InternetGetCookieExW has a truly bizarre shape, in that the lpdwSize argument is a pointer to a count of wide characters, but if ERROR_INSUFFICIENT_BUFFER is returned, the size argument is set to the count of bytes required.

Standard
browsers, dev, security

Building your .APP website with NameCheap and GitHub Pages–A Visual Guide

I recently bought a few new domain names under the brand new .app top-level-domain (TLD). The .app TLD is awesome because it’s on the HSTSPreload list, meaning that browsers will automatically use only HTTPS for every request on every domain under .app, keeping connections secure and improving performance.

I’m not doing anything terribly exciting with these domains for now, but I’d like to at least put up a simple welcome page on each one. Now, in the old days of HTTP, this was trivial, but because .app requires HTTPS, that means I must get a certificate for each of my sites for them to load at all.

Fortunately, GitHub recently started supporting HTTPS on GitHub Pages with custom domains, meaning that I can easily get a HTTPS site up in running in just a few minutes.

1. Log into GitHub, go to your Repositories page and click New:

2. Name your new repository something reasonable:

3. Click to create a simple README file:

4. Edit the file

5. Click Commit new file

6. Click Settings on the repository

7. Scroll to the GitHub Pages section and choose master branch and click Save:

8. Enter your domain name in the Custom domain box and click Save

9. Login to NameCheap (or whatever DNS registrar you used) and click Manage for the target domain name:

10. Click the Advanced DNS tab:

11. Click Add New Record:

 

12. Enter four new A Records for host of @ with the list of IP addresses GitHub pages use:

13. Click Save All Changes.

14. Click Add New Record and add a new CNAME Record. Enter the host www and a target value of username.github.io. Click Save All Changes: 

15. Click the trash can icons to delete the two default DNS entries that NameCheap had for your domain previously:

16. Try loading your new site.

  • If you get a connection error, wait a few minutes for DNS to propagate and re-verify the DNS records you just added.
  • If you get a certificate error, look at the certificate. It’s probably the default GitHub certificate. If so, look in the GitHub Pages settings page and you may see a note that your certificate is awaiting issuance by LetsEncrypt.org. If so, just wait a little while.

  • After the certificate is issued, your site without errors:

 

Go forth and build great (secure) things!

-Eric Lawrence

Standard
dev, security

Strict-Transport-Security for *.dev, *.app and more

Some web developers host their pre-production development sites by configuring their DNS such that hostnames ending in .dev point to local servers. Such configurations were not meaningfully impacted when .dev became an official Generic Top Level Domain a few years back, because even as smart people warned that developers should stop squatting on it, Google (the owner of the .dev TLD) was hosting few (if any) sites on the gTLD.

With Chrome 63, shipping to the stable channel in the coming days, things have changed. Chrome has added .dev to the HSTS Preload list (along with the .foo, .page, .app, and .chrome TLDs). This means that any attempt to visit http://anything.dev will automatically be converted to https://anything.dev.

hstspreload

Other major browsers use the same underlying HSTS Preload list, and we expect that they will also pick up the .dev TLD entry in the coming weeks and months.

Of course, if you were using HTTPS with valid and trusted certificates on your pre-production sites already (good for you!) the Chrome 63 change may not impact you very much right away. But you’ll probably want to move your preproduction sites off of .dev and instead use e.g. .test, a TLD reserved for this purpose.

Secure all the things!

-Eric

PS: Perhaps surprisingly, the dotless (“plain”) hostnames http://dev, http://page, http://app, http://chrome, http://foo are all impacted by new HSTS rules as well.

Standard
dev, fiddler

Fiddler And LINQ

Since moving to Google at the beginning of 2016, I’ve gained some perspective about my work on Fiddler over the prior 12+ years. Mostly, I’m happy about what I accomplished, although I’m a bit awed about how much work I put into it, and how big my “little side project” turned out to be.

It’s been interesting to see where the team at Telerik has taken the tool since then. Some things I’m not so psyched about (running the code through an obfuscator has been a source of bugs and annoyance), but the one feature I think is super-cool is support for writing FiddlerScript in C#. That’s a feature I informally supported via an extension, but foolishly (in hindsight) never invested in baking into the tool itself. That’s despite the fact that JScript.NET is a bit of an abomination which is uncomfortable for both proper JavaScript developers and .NET developers. But I digress… C# FiddlerScript is really neat, and even though it may take a bit of effort to port the many existing example FiddlerScript snippets, I think many .NET developers will find it worthwhile.

I’ve long been hesitant about adopting the more fancy features of the modern .NET framework, LINQ key among them. For a while, I justified this as needing Fiddler to work on the bare .NET 2.0 framework, but that excuse is long gone. And I’ll confess, after using LINQ in FiddlerScript, it feels awkward and cumbersome not to.

To use LINQ in FiddlerScript, you must be using the C# scripting engine and you must add System.core.dll inside Tools > Fiddler Options > Scripting. Then, add using System.Linq; to the top of your C# script file.

After you make these changes, you can do things like:

    var arrSess = FiddlerApplication.UI.GetAllSessions();
    bool b = arrSess.Any(s=>s.HostnameIs("Example.com"));
    FiddlerApplication.UI.SetStatusText((b) ? "Found it!":"Didn't find it.");

-Eric Lawrence

Standard
dev, perf

Finding Image Bloat In Binary Files

I’ve previously talked about using PNGDistill to optimize batches of images, but in today’s quick post, I’d like to show how you can use the tool to check whether images in your software binaries are well optimized.

For instance, consider Chrome. Chrome uses a lot of PNGs, all mashed together a single resources.pak file. Tip: Search for files for the string IEND to find embedded PNG files.

With Fiddler installed, go to a command prompt and enter the following commands:

cd %USERPROFILE%\AppData\Local\Google\Chrome SxS\Application\60.0.3079.0
mkdir temp
copy resources.pak temp
cd temp
"C:\Program Files (x86)\Fiddler2\tools\PngDistill.exe" resources.pak grovel
for /f "delims=|" %f in ('dir /b *.png') do "c:\program files (x86)\fiddler2\tools\pngdistill" "%f" log

You now have a PNGDistill.LOG file showing the results. Open it in a CSV viewer like Excel or Google Sheets. You can see that Chrome is pretty well-optimized, with under 3% bloat.

image

Let’s take a look at Brave, which uses electron_resources.pak:

image

Brave does even better! Firefox has images in a few different files; I found a bunch in a file named omni.ja:

image

The picture gets less rosy elsewhere though. Microsoft’s MFC140u.dll’s images are 7% bloat:

image

Windows’ Shell32.dll uses poor compression:

image

Windows’ ImageRes.dll has over 5 megabytes (nearly 20% of image weight) bloat:

image

And the Windows 10’s ApplicationFrame.dll is well-compressed, but the images have nearly 87% metadata bloat:

image

Does ImageBloat Matter?

Well, yes, it does. Even when software isn’t distributed by webpages, image bloat still takes up precious space on your disk (which might be limited in the case of a SSD) and it burns cycles and memory to process or discard unneeded metadata.

Optimize your images. Make it automatic via your build process and test your binaries to make sure it’s working as expected.

-Eric

PS: Rafael Rivera wrote a graphical tool for finding metadata bloat in binaries; check it out.

PPS: I ran PNGDistill against all of the PNGs embedded in EXE/DLLs in the Windows\System32 folder. 33mb * 270M devices = 8.9 petabytes of wasted storage for imagebloat in system32 alone.  Raw Data:

Standard