browsers, dev

Cookies and Concurrency, Redux

In yesterday’s episode, I shared the root cause of a bug that can cause document.cookie to incorrectly return an empty string if the cookie is over 1KB and the cookie grows in the middle of a DOM document.cookie getter operation.

Unfortunately, that simple bug wasn’t the root cause of the compatibility problem that I was investigating when my code review uncovered it. The observed compatibility bug was slightly different: in the repro case, only one of the document’s cookies goes missing, and it goes missing even when only one page is setting the cookie.

After the brain-melting exercise of annotating the site’s minified framework libraries (console.log(‘…’) ftw!) via Fiddler’s AutoResponder, I found that the site uses the document.cookie API to save the same cookie (named “ld”) three times in a row, adding some information to the cookie each time. However, the ld cookie mysteriously disappears between 0.4 and 6 milliseconds after it gets set the third time. I painstakingly verified that the cookie wasn’t getting manipulated from any other context when it disappeared.

Hmm…

As I wrote up the investigation notes, I idly noted that due to a trivial typo in the website’s source code, the ld cookie was set first as a Persistent cookie, then (accidentally) as a Session cookie, then as a Persistent cookie.

In re-reading the notes an hour later, again my memory got tickled. Hadn’t I seen something like this before?

Indeed, I had. Just about five years ago, a user reported a similar bug where an HTTP response contained two Set-Cookie headers for the same cookie name and Internet Explorer didn’t store either cookie. I built a reduced test case and reported it to the engineering team.

Pushing Cookies

The root cause of the cookie disappearance relates to the Internet Explorer and Edge “loosely-coupled architecture.”

In IE and Edge, each browser tab process runs its own networking stack, in-process¹. For persistent cookies, this poses no problem, because every browser process hits the same WinINET cookie storage area and gets back the latest value of the persistent cookie. In contrast, for session cookies, there’s a challenge. Session cookies are stored in local (per-process) variables in the networking code, but a browser session may include multiple tab processes. A Session cookie set in a tab process needs to be available in all other tab processes in that browser session.

As a consequence, when a tab writes a Session cookie, Edge must send an interprocess communication (IPC) message to every other process in the browser session, telling each to update its internal variables with the new value of the Session cookie. This Cookie Pushing IPC is asynchronous: if a process modifies the named cookie before it receives the IPC announcing the earlier update, the late-arriving IPC obliterates that later update.
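To make the hazard concrete, here is a toy model in Python. It is only a sketch under loud assumptions: ToyTabProcess, its inbox queue, and the "apply pushes unconditionally in arrival order" rule are all invented for illustration and are not the real IE/Edge implementation. It shows only the general shape of the problem, where a stale asynchronous push lands on top of a newer local write.

```python
from collections import deque


class ToyTabProcess:
    """Hypothetical stand-in for one tab process; not the real IE/Edge design."""

    def __init__(self, name):
        self.name = name
        self.session_cookies = {}   # per-process copy of the Session cookies
        self.inbox = deque()        # pushes received but not yet applied
        self.peers = []

    def set_session_cookie(self, cookie_name, value):
        # The local write takes effect immediately...
        self.session_cookies[cookie_name] = value
        # ...while every other process only hears about it later.
        for peer in self.peers:
            peer.inbox.append((cookie_name, value))

    def drain_inbox(self):
        # Pushes are applied unconditionally, in arrival order.
        while self.inbox:
            cookie_name, value = self.inbox.popleft()
            self.session_cookies[cookie_name] = value


tab_a, tab_b = ToyTabProcess("A"), ToyTabProcess("B")
tab_a.peers, tab_b.peers = [tab_b], [tab_a]

tab_a.set_session_cookie("ld", "first")    # the push to B is now in flight
tab_b.set_session_cookie("ld", "second")   # B writes before that push arrives
tab_b.drain_inbox()                        # the stale push lands on top of B's write

print(tab_b.session_cookies["ld"])         # prints "first": B's later write is gone
```

In this toy model, any scheme that let a receiver recognize a push as older than its own latest write (a version counter, for example) would avoid the loss; whether that resembles the real fix is not something this sketch can say.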

The Duplicate Set-Cookie header version of this bug got fixed in the Fall 2017 Update (RS3) to Windows 10 and thus my old Set-Cookie test case no longer reproduces the problem.

Unfortunately, it turns out that the RS3 fix only corrected the behavior of the network stack when it encounters this pattern. If the cookie-setting calls are made via document.cookie, the problem reappears, as in this document.cookie test case.

[Image: BadBehavior]

Playing with the repro page, you’ll notice that manually pushing “Set HOT as a Session cookie” or “Set as a Persistent cookie” works fine, because your puny human reflexes aren’t faster than the cookie-pushing IPC. But when you push the “Set twice” button that sets the cookie twice in fast succession, the HOT cookie disappears in Edge (and in IE11, if you have more than one tab open).

Until this bug is fixed, avoid using document.cookie to change a persistent cookie to a session cookie.

-Eric

1 In contrast, in Chrome, all networking occurs in the browser process (or a networking-only process), and if a tab process wants to get the current document.cookie, it must perform an IPC to ask the browser process for the cookie value. We call this “cookie pulling.”

browsers, dev, Uncategorized

ERROR_INSUFFICIENT_BUFFER and Concurrency

Many classic Windows APIs accept a pointer to a byte buffer and a pointer to an integer indicating the size of the buffer. If the buffer is large enough to hold the data returned from the API, the buffer is filled and the API reports success (ERROR_SUCCESS or a TRUE return value, depending on the API). If the buffer supplied is not large enough to hold all of the data, the API instead fails with ERROR_INSUFFICIENT_BUFFER, updating the supplied integer with the length of the buffer required. The client is expected to allocate a new buffer of the specified size and call the API again with the new buffer and length.

For example, the InternetGetCookieEx function, used to query the WinINET networking stack for cookies for a given URL, is one such API. The GetExtendedTcpTable function, used to map sockets to processes, is another.

The advantage of APIs with this form is that you can call the API with a reasonably-sized stack buffer and avoid the cost of a heap allocation unless the stack buffer happens to be too small.
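The call-then-retry contract described above can be sketched in a few lines of Python. This is an illustration only: toy_get_data and read_with_one_retry are hypothetical names, and a one-element list stands in for the in/out size pointer. The only value taken from Windows is the numeric error code 122 for ERROR_INSUFFICIENT_BUFFER.

```python
ERROR_SUCCESS = 0
ERROR_INSUFFICIENT_BUFFER = 122  # numeric value of the Win32 error code


def toy_get_data(stored: bytes, buffer: bytearray, size: list) -> int:
    """Hypothetical API: `size` is a one-element list acting as an in/out parameter."""
    needed = len(stored)
    if size[0] < needed:
        size[0] = needed              # tell the caller how much room is required
        return ERROR_INSUFFICIENT_BUFFER
    buffer[:needed] = stored
    size[0] = needed                  # report how much was actually written
    return ERROR_SUCCESS


def read_with_one_retry(stored: bytes, initial_capacity: int = 16) -> bytes:
    size = [initial_capacity]
    buffer = bytearray(initial_capacity)   # stands in for the small stack buffer
    status = toy_get_data(stored, buffer, size)
    if status == ERROR_INSUFFICIENT_BUFFER:
        buffer = bytearray(size[0])        # stands in for the heap allocation
        status = toy_get_data(stored, buffer, size)
    if status != ERROR_SUCCESS:
        raise OSError(status, "toy_get_data failed")
    return bytes(buffer[:size[0]])
```

Here the data cannot change between the two calls, so a single retry is always enough.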

In the case of Internet Explorer and Edge, the document.cookie DOM API getter’s implementation first calls the InternetGetCookieEx API with a 1024 WCHAR buffer. If the buffer is big enough, the cookie string is then immediately returned to the page.

However, if ERROR_INSUFFICIENT_BUFFER is returned instead (and if the size needed is 10240 characters (MAX_COOKIE_LEN) or fewer), the getter allocates a new buffer on the heap and calls InternetGetCookieEx again. If that second call succeeds, the cookie string is returned to the page; if it returns any error, an empty string is returned to the page.

Wait. Do you see the problem here?

It’s tempting to conclude that the document.cookie API doesn’t need to be thread-safe, since JavaScript that touches the DOM runs in one thread, the UI thread. But cookies are a form of data storage that is available across multiple threads and processes. For instance, subdownload network requests for the page’s resources can be manipulating the cookie store in parallel, and if I happen to have multiple tabs or windows open to the same site, they’ll be interacting with the same cookie jar.

So, consider the following scenario: The document.cookie implementation calls InternetGetCookieEx but gets back ERROR_INSUFFICIENT_BUFFER with a required size of 1200 bytes. The implementation dutifully allocates a 1200 byte buffer, but before it gets the chance to call InternetGetCookieEx again, an image on the page sets a new 4 byte cookie which WinINET puts in the cookie jar. Now, when InternetGetCookieEx is called again, it again returns ERROR_INSUFFICIENT_BUFFER because the required buffer is now 1204 bytes. Because document.cookie isn’t using any sort of loop-until-success, it returns an empty cookie string.
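The scenario is easy to reproduce in a small simulation. The sketch below is an assumption-laden model, not the browser's code: ToyCookieJar, its grow_after_calls hook (which plays the part of the image setting a cookie at exactly the wrong moment), and both getter functions are hypothetical. The 1024 and 10240 limits are the ones quoted above.

```python
ERROR_INSUFFICIENT_BUFFER = 122  # numeric value of the Win32 error code


class ToyCookieJar:
    """Hypothetical shared cookie store that another writer can grow at any time."""

    def __init__(self, cookies, grow_after_calls=()):
        self.cookies = cookies
        self.calls = 0
        self.grow_after_calls = set(grow_after_calls)

    def query(self, capacity):
        """Return (0, cookie_string) on success or (122, required_size) on failure."""
        self.calls += 1
        if len(self.cookies) > capacity:
            result = (ERROR_INSUFFICIENT_BUFFER, len(self.cookies))
        else:
            result = (0, self.cookies)
        if self.calls in self.grow_after_calls:
            self.cookies += "; a=1"      # a concurrent writer adds a cookie
        return result


def get_cookie_single_retry(jar, initial=1024):
    status, result = jar.query(initial)
    if status == ERROR_INSUFFICIENT_BUFFER:
        status, result = jar.query(result)   # one retry, then give up
    return result if status == 0 else ""


def get_cookie_loop(jar, initial=1024, max_len=10240):
    capacity = initial
    while True:
        status, result = jar.query(capacity)
        if status == 0:
            return result
        if status != ERROR_INSUFFICIENT_BUFFER or result > max_len:
            return ""
        capacity = result                    # retry with the newly required size


big_cookie = "k=" + "v" * 1198               # 1200 characters
racy = ToyCookieJar(big_cookie, grow_after_calls={1})
print(repr(get_cookie_single_retry(racy)))   # prints '' because the jar grew mid-read

racy = ToyCookieJar(big_cookie, grow_after_calls={1})
print(len(get_cookie_loop(racy)))            # prints 1205, the full grown string
```

The loop stays bounded because it gives up once the required size exceeds max_len.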

Now, this is all fast native code (C/C++), so surely this sort of thing is just theoretical… it can’t really happen on a fast computer, right?

Around ten years ago, I showed how you can use Meddler to easily generate a lot of web traffic for testing browsers. Meddler is a simple web server that has a simple GUI code editor slapped on the front (most developers would use node.js or Go for such tasks). I quickly threw together a tiny little MeddlerScript which exercises cookies by loading cookie-setting images in a loop and monitoring the document.cookie API to see if it ever returns an empty string.

Boy, does it ever. On my i7 machines, it usually only takes a few seconds to run into the buggy case where document.cookie returns an empty string.

[Image: Failure]

I haven’t gone back to check the history, but I suspect this IE/Edge bug is at least fifteen years old.

After confirming this bug, it felt strangely familiar, as if I’d hit this landmine before. Then, as I was writing this post, I realized when… Back in 2011, I shared the C# code Fiddler uses for mapping a socket to a process. That code relies on the GetExtendedTcpTable API, which has the same reallocate-then-reinvoke design. Fortunately, I’d fixed the bug a few weeks later in Fiddler, but it looks like I never updated my blog post (sorry about that).

-Eric

PS: Unrelated, but one more pitfall to be aware of: InternetGetCookieExW has a truly bizarre shape, in that the lpdwSize argument is a pointer to a count of wide characters, but if ERROR_INSUFFICIENT_BUFFER is returned, the size argument is set to the count of bytes required.

bluebadge, browsers, security

Duct Tape and Baling Wire–Cookie Prefixes

Update: Cookie Prefixes are supported by Chrome 49, Opera 36, and Firefox 50. Test page; no status from the Edge team

A new cookie feature called SameSite Cookies has been shipped by Chrome, Firefox and Edge; it addresses slightly different threats.


When I worked on Internet Explorer, we were severely constrained on development resources. While the team made a few major investments for each release (Protected Mode, Loosely-coupled IE, new layout engines, etc), there was a pretty high bar to get any additional feature work in. As a consequence, I very quickly learned to scope down any work I needed done to the bare minimum required to accomplish the job. In many cases, I wouldn’t even propose work if I wasn’t confident that I (a PM) could code it myself.

In many cases, that worked out pretty well; for instance, IE led the way in developing the X-Frame-Options clickjacking protection, not only because we found other approaches to be unworkable (bypassable, compat-breaking, or computationally infeasible) but also because a simple header (internally nicknamed “Don’t Frame Me, Bro”) was the only thing we could afford to build¹.

In other cases, aiming for the bare minimum didn’t work out as well. The XDomainRequest object was a tiny bit too simple: for security reasons, we didn’t allow the caller to set the request’s Content-Type header. This proved to be a fatal limitation because it meant that many existing server frameworks (ASP, ASP.NET, etc) would need to change in order to be able to properly parse a URLEncoded request body string.

One of the “little features” that lingered on my whiteboard for several years was a proposal called “Magic-Named Cookies.” The feature aimed to resolve one significant security shortcoming of cookies: a server has no way to know where a given cookie came from. This limitation stems from the fact that the attributes of a cookie (who set it, for what path, with what expiration time, etc) are sent to the client in the Set-Cookie header, but these attributes are omitted when the Cookie header is sent back to the server. Coupled with cookies’ loose scoping rules (where a cookie can be sent to both “parent” and “sub” domains, and cookies sent from an HTTP origin are sent to the HTTPS origin of the same hostname), this leads to a significant security bug: an attacker can perform a “Cookie Fixation” attack by setting a cookie that will later be sent to (and potentially trusted by) a secure origin. These attacks still exist today, although various approaches (e.g. HSTS with includeSubDomains set) have been proposed to mitigate them.

RFC2965 had attempted to resolve this but it never got any real adoption because it required a major change in the syntax of the Cookie header sent back to the server, and changing all of the clients and servers proved too high a bar.

My Magic-Named Cookies proposal aimed to address this using the “simplest thing that could possibly work” approach. We’d reserve a cookie name prefix (I proposed $SEC-) that, if present, would indicate that a cookie had been set (or updated) over an HTTPS connection. The code change to the browser would be extremely simple: When setting or updating a cookie, if the name started with $SEC-, the operation would be aborted if the context wasn’t HTTPS. As a consequence, a server or page could have confidence that any cookie so named had been set by a page sent on an HTTPS connection.
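The rule just described is small enough to sketch directly. The function and constant names below are hypothetical; only the $SEC- prefix and the "abort unless HTTPS" rule come from the proposal itself.

```python
MAGIC_PREFIX = "$SEC-"


def may_write_cookie(cookie_name: str, context_is_https: bool) -> bool:
    """Return False when a set or update should be aborted under the proposed rule."""
    if cookie_name.startswith(MAGIC_PREFIX) and not context_is_https:
        return False
    return True


print(may_write_cookie("$SEC-session", context_is_https=False))  # prints False
print(may_write_cookie("$SEC-session", context_is_https=True))   # prints True
print(may_write_cookie("theme", context_is_https=False))         # prints True
```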

While magic naming is “ugly” (no one likes magic strings), the proposal’s beauty is in its simplicity: it’d be a two-line code change for the browser, and wouldn’t add even a single bitfield to the cookie database format. More importantly, web server platforms (ASP, ASP.NET, etc) wouldn’t have to change a single line of code. Web developers and frameworks could opt in simply by naming their cookies with the prefix; no other code would need to be written. Crucially, the approach degrades gracefully (albeit insecurely): legacy clients without support for the restriction would simply not enforce it, leaving them no more (or less) safe than they were before.

Unfortunately, this idea never made it off my whiteboard while I was at Microsoft. Over the last few years, I’ve tweeted it at the Chrome team’s Mike West a few times when he mentions some of the other work he’s been doing on cookies, and on Wednesday I was delighted to see that he had whipped up an Internet Draft proposal named Cookie Prefixes. The draft elaborates on the original idea somewhat:

  • changing $SEC- to __SECURE-
  • requiring a __SECURE- cookie to have the secure attribute set
  • adding a __HOST- prefix to allow cookies to inform the server that they are host-locked
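A browser-side check for these prefixes might look roughly like the sketch below. Several things here are assumptions rather than content from this post: the CookieAttrs type and function name are hypothetical; the specific "host-locked" conditions (Secure set, no Domain attribute, Path of "/") follow the cookie prefix rules as browsers later shipped them; and matching the prefix case-insensitively is a choice made so that both the uppercase spelling above and the __Secure- / __Host- spelling are accepted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CookieAttrs:
    """Hypothetical parsed form of one Set-Cookie header."""
    name: str
    secure: bool = False
    domain: Optional[str] = None   # None means no Domain attribute was sent
    path: Optional[str] = None     # None means no Path attribute was sent


def prefix_allows(cookie: CookieAttrs, from_secure_origin: bool) -> bool:
    """Return False when a prefixed cookie should be rejected."""
    lowered = cookie.name.lower()
    if lowered.startswith("__secure-"):
        return cookie.secure and from_secure_origin
    if lowered.startswith("__host-"):
        return (cookie.secure and from_secure_origin
                and cookie.domain is None and cookie.path == "/")
    return True   # unprefixed cookies are not restricted by this check


print(prefix_allows(CookieAttrs("__Secure-id", secure=True), True))        # prints True
print(prefix_allows(CookieAttrs("__Secure-id", secure=False), True))       # prints False
print(prefix_allows(CookieAttrs("__Host-id", secure=True, path="/"), True))  # prints True
print(prefix_allows(
    CookieAttrs("__Host-id", secure=True, domain="example.com", path="/"), True))  # prints False
```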

In Twitter discussion, some obvious questions arose (“how do I name a cookie to indicate both HTTPS-set and Origin locked?” and “is there a prefix I can use for first-party-only cookies?”) which led to questions about whether the design is too simple. We could easily accommodate the additional functionality by making the proposal uglier, for instance by adding a flags field after a prefix:

Set-Cookie: $RESTRICT_ofh_MyName= I+am+origin-locked+first+party+only+and+httponly; secure; httponly

Set-Cookie: $RESTRICT_s_MyName2= I+am+only+settable+by+HTTPS+without+other+restrictions

 

… but some reasonably wonder whether this is too ugly to even consider.
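For what it's worth, decoding such a flags field would not take much code. The sketch below is purely illustrative: the format was only floated in discussion, the function name is hypothetical, and the meaning assigned to each flag letter (o, f, h, s) is inferred from the two example headers above rather than defined anywhere.

```python
RESTRICT_PREFIX = "$RESTRICT_"

# Letter meanings inferred from the example cookie values; not a defined format.
FLAG_MEANINGS = {
    "o": "origin-locked",
    "f": "first-party-only",
    "h": "httponly",
    "s": "https-set-only",
}


def parse_restricted_name(cookie_name):
    """Return (set_of_restrictions, base_name), or None if the name doesn't fit."""
    if not cookie_name.startswith(RESTRICT_PREFIX):
        return None
    rest = cookie_name[len(RESTRICT_PREFIX):]
    flags, separator, base_name = rest.partition("_")
    if not separator or not flags or not base_name:
        return None
    if any(flag not in FLAG_MEANINGS for flag in flags):
        return None
    return {FLAG_MEANINGS[flag] for flag in flags}, base_name


print(parse_restricted_name("$RESTRICT_s_MyName2"))  # prints ({'https-set-only'}, 'MyName2')
print(parse_restricted_name("MyName"))               # prints None
```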

Cookies are an interesting beast—one of the messiest hacks of the early web, they’re too important to nuke from orbit, but too dangerous to leave alone. As such, they’re a wonderful engineering challenge, and I’m very excited to see the Chrome team probing to find improvements without breaking the world.

-Eric Lawrence

1 See Dan Kaminsky’s proposal to understand, given infinite resources, the sort of ClickJacking protection we might have tried building.
