browsers, design, privacy

Chrome Sync

Disclaimer: Hi. I’m an engineer on the Edge browser now, but worked on Chrome Security for a bit over two years. I speak for no one but myself, and I share no internal or confidential information in this post.

Update: The Chrome team announced upcoming changes based on user-feedback.

This weekend, there were a bunch of breathless articles and blogs about a very small change recently made to Chrome. Some of the claims are correct and carefully thought out, but in the swirl of clickbait and confusion, there are a bunch of inaccurate claims as well.

What Changed?

No, Chrome doesn’t “upload your browser history when you check GMail”… unless you tell it to do so.

Yes, Chrome has streamlined the opt-in to the browser’s “Sync” features, such that you no longer need to individually type your username and password when enabling Sync. Whether you consider this “Convenient!” or “Terrible!” is a matter of perception and threat model.

Screenshots

Because many people haven’t seen this change yet, here are some screenshots from Chrome 71.3558.

When you sign into a Google site or service in the browser’s HTML content area, your avatar/profile picture will now appear in the browser UI. If you click on your avatar, a flyout appears offering to enable sync:

Meeple

Similarly, if you use certain browser features that are more valuable with Sync (e.g. creating a bookmark or storing a password), the UI offers to turn on Sync:

OptIn

(Concern: This “Sync as Eric” button appears in the same X,Y location as the “Save Password” button on the preceding flyout/bubble, so you could conceivably click it by accident.)

Bookmark

If you click on one of these options to turn on Sync, a giant flyout appears to tell you what this means:

Get Google smarts

Interestingly/Concerningly, if you click “Settings”, it’s interpreted as “Yes, and show Settings“:

SettingsMeansYes

If you look through the Sync settings (also available by visiting chrome://settings/?search=sync), you’ll get a rich list of controls for what you can choose to sync:

SyncMain

Below that list, you can also find a list of all of the other ways (independent of Sync/Signin) that Chrome can talk back to Google:

OtherSettings

Change Your Mind? Disable Sync

If you change your mind after enabling Sync, or enabled it by accident, it’s easy to change. Click your Avatar and click on the “Syncing to” badge. Then click Turn off. Decide whether or not you want to keep local copies of the data that was sync’d to the server:

TurnOffSync

Note: If you accidentally signed into Chrome and enabled Sync on a device you borrowed from someone else, it’s very important that you check the box to clear local data when you’re done with it, or your information will be left behind on the device.

Fun Stuff for Nerds

If you’re a geek and are interested to better understand how Sync Works, visit the URL chrome://sync-internals/ in Chrome. You can see events in the Sync process and tons of data about what exactly is syncing.

CoolData

Opt-Out

If you don’t want Chrome to Sync, just don’t click buttons that offer to enable it.

Arguably, enabling Sync is now so streamlined that you could conceivably do it by accident (or someone borrowing your PC could do it for you).

If you’re concerned about sign-in to Chrome and want to ensure that you don’t ever activate Sync by accident, you can set a Policy inside HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Google\Chrome:

Regedit

You can set the Windows policy by simply importing this registry script.

After Sync is disabled, buttons asking to turn on Sync disappear, and (interestingly) if you try to sign into Chrome, a warning notice appears:

SyncDisabled

Motivations

Many of the articles that came out this weekend were rife with speculation that this is some new attempt from Google to erode your privacy by vacuuming up even more of your browsing data.

I don’t work for Google and I have no insider knowledge, but I think such attributions are incorrect. I think the correct explanations are much more mundane:

  1. The old UX was just really weird for humans. “Wait, you computer nerds are telling me that I just signed into Google using Google Chrome in the browser content area and now you’re telling me that I need to sign into the Google Chrome Browser using the browser chrome. WTF?!?”
  2. The old UX was dangerously misleading for people who share computers, a worryingly common practice.The new UX makes it at least a bit more clear that if you’re browsing on a borrowed computer, you really should be using a discardable Guest profile.(I think Guest Profiles are one of the coolest little-used features in Chrome).
  3. The new UX streamlines the enabling of Chrome Sync. Chrome Sync provides advantages to both the user and Google.

The Advantage to the User is that their satisfaction with the browser is higher. On average, satisfaction increases when Sync is on because the browser can do more for the user, better protecting them from phishing sites (via the password manager) and sharing their bookmarks, permission settings, etc on every device they use. Users with many devices (me) are especially excited about this benefit– I have enough to remember and configure, and I want my browser to help as much as it can.

The Obvious Advantage to Google is that users who are more satisfied with Chrome are more likely to use Chrome and not switch to some other browser. Chrome is simply stickier because switching to a browser that doesn’t have access to all of the user’s stored passwords, bookmarks, settings, etc, is less appealing.

In a world where browsers are shells around commoditized HTML parsers and script engines (E.g. compare Brave vs. Chrome), sticky features like sync are critical to marketshare. Consider, for instance, Chrome on iOS. It runs atop Apple’s WkWebView and is by-(Apple’s)-design intrinsically inferior in almost every way to the native Safari. Except for one thing… Chrome has my settings and data and Safari doesn’t. So I’ll go out of my way to get Chrome on iOS because, to me, Sync is critical time-saver that Safari can never match, because Safari doesn’t run on Windows and Chrome does. iOS represents a potentially addressable browser market of around a billion devices.

The Possible Unobvious Advantages to Google are what worry the skeptics and fearmongers– what if Google uses my data for something evil? Evil might range from a little evil (showing me ads for Beanie Babies because it sees I’m browsing a lot of Beanie Baby sites) to a lot evil (giving my data to “the Government” or my insurance company or my boss or some other bad person). Similarly, they could start using it recklessly (having my Google Home ask in front of my dinner guests “Hey, Eric, want to continue your search for the best Beanie Baby sales?“).

I personally don’t have significant concerns on this front (I got to see how seriously privacy is taken inside of Google and Chrome), but some people do.

My threat model is not your threat model.

If Evil Google is in your threat model, you can set an option to encrypt your sync data such that only your local Chrome instances can see it:

EncryptSync

If Extra Evil Google is in your threat model, you shouldn’t be using Chrome at all, because obviously Extra Evil Google could just backdoor Chrome before encryption or after decryption.

-Eric

PS: A helpful thread from a Chrome area owner.

Update: The Chrome team announced upcoming changes based on user-feedback.

Standard
browsers, dev

Cookies and Concurrency, Redux

In yesterday’s episode, I shared the root cause of a bug that can cause document.cookie to incorrectly return an empty string if the cookie is over 1kb and the cookie grows in the middle of a DOM document.cookie getter operation.

Unfortunately, that simple bug wasn’t the root cause of the compatibility problem that I was investigating when my code-review uncovered it. The observed compatibility bug was slightly different– in the repro case, only one of the document’s cookies goes missing, and it goes missing even when only one page is setting the cookie.

After the brain-melting exercise of annotating the site’s minified framework libraries (console.log(‘…’) ftw!) via Fiddler’s AutoResponder, I found that the site uses the document.cookie API to save the same cookie (named “ld“) three times in a row, adding some information to the cookie each time. However, the ld cookie mysteriously disappears between 0.4 and 6 milliseconds after it gets set the third time. I painstakingly verified that the cookie wasn’t getting manipulated from any other context when it disappeared.

Hmm…

As I wrote up the investigation notes, I idly noted that due to a trivial typo in the website’s source code, the ld cookie was set first as a Persistent cookie, then (accidentally) as a Session cookie, then as a Persistent cookie.

In re-reading the notes an hour later, again my memory got tickled. Hadn’t I seen something like this before?

Indeed, I had. Just about five years ago, a user reported a similar bug where a HTTP response contained two Set-Cookie calls for the same cookie name and Internet Explorer didn’t store either cookie. I built a reduced test case and reported it to the engineering team.

Pushing Cookies

The root cause of the cookie disappearance relates to the Internet Explorer and Edge “loosely-coupled architecture.”

In IE and Edge, each browser tab process runs its own networking stack, in-process1. For persistent cookies, this poses no problem, because every browser process hits the same WinINET cookie storage area and gets back the latest value of the persistent cookie. In contrast, for session cookies, there’s a challenge. Session cookies are stored in local (per-process) variables in the networking code, but a browser session may include multiple tab processes. A Session cookie set in a tab process needs to be available in all other tab processes in that browser session.

As a consequence, when a tab writes a Session cookie, Edge must send an interprocess communication (IPC) message to every other process in the browser session, telling each to update its internal variables with the new value of the Session cookie. This Cookie Pushing IPC is asynchronous, and if the named cookie were later modified in a process before the IPC announcing the earlier update to the cookie is received, that later update is obliterated.

The Duplicate Set-Cookie header version of this bug got fixed in the Fall 2017 Update (RS3) to Windows 10 and thus my old Set-Cookie test case case no longer reproduces the problem.

Unfortunately, it turns out that the RS3 fix only corrected the behavior of the network stack when it encounters this pattern– if the cookie-setting calls are made via document.cookie, the problem reappears, as in this document.cookie test case.

BadBehavior

Playing with the repro page, you’ll notice that manually pushing “Set HOT as a Session cookie” or “Set as a Persistent cookie” works fine, because your puny human reflexes aren’t faster than the cookie-pushing IPC. But when you push the “Set twice” button that sets the cookie twice in fast succession, the HOT cookie disappears in Edge (and in IE11, if you have more than one tab open).

Until this bug is fixed, avoid using document.cookie to change a persistent cookie to a session cookie.

-Eric

In contrast, in Chrome, all networking occurs in the browser process (or a networking-only process), and if a tab process wants to get the current document.cookie, it must perform an IPC to ask the browser process for the cookie value. We call this “cookie pulling.”

Standard
browsers, dev

ERROR_INSUFFICIENT_BUFFER and Concurrency

Many classic Windows APIs accept a pointer to a byte buffer and a pointer to an integer indicating the size of the buffer. If the buffer is large enough to hold the data returned from the API, the buffer is filled and the API returns S_OK. If the buffer supplied is not large enough to hold all of the data, the API instead returns ERROR_INSUFFICIENT_BUFFER, updating the supplied integer with the length of the buffer required. The client is expected to reallocate a new buffer of the specified size and call the API again with the new buffer and length.

For example, the InternetGetCookieEx function, used to query the WinINET networking stack for cookies for a given URL, is one such API. The GetExtendedTcpTable function, used to map sockets to processes, is another.

The advantage of APIs with this form is that you can call the API with a reasonably-sized stack buffer and avoid the cost of a heap allocation unless the stack buffer happens to be too small.

In the case of Internet Explorer and Edge, the document.cookie DOM API getter’s implementation first calls the InternetGetCookieEx API with a 1024 WCHAR buffer. If the buffer is big enough, the cookie string is then immediately returned to the page.

However, if ERROR_INSUFFICIENT_BUFFER is returned instead (and if the size needed is 10240 characters (MAX_COOKIE_LEN) or fewer), the API will allocate a new buffer on the heap and call the API again. If the API succeeds, the cookie string is returned to the page, otherwise if any error is returned, an empty string is returned to the page.

Wait. Do you see the problem here?

It’s tempting to conclude that the document.cookie API doesn’t need to be thread-safe–JavaScript that touches the DOM runs in one thread, the UI thread. But cookies are a form of data storage that is available across multiple threads and processes. For instance, subdownload network requests for the page’s resources can be manipulating the cookie store in parallel, and if I happen to have multiple tabs or windows open to the same site, they’ll be interacting with the same cookie jar.

So, consider following scenario: The document.cookie implementation calls InternetGetCookieEx but gets back ERROR_INSUFFICIENT_BUFFER with a required size of 1200 bytes. The implementation dutifully allocates a 1200 byte buffer, but before it gets the chance to call InternetGetCookieEx again, an image on the page sets a new 4 byte cookie which WinINET puts in the cookie jar. Now, when InternetGetCookieEx is called again, it again returns ERROR_INSUFFICIENT_BUFFER because the required buffer is now 1204 characters. Because document.cookie isn’t using any sort of loop-until-success, it returns an empty cookie string.

Now, this is all fast native code (C/C++), so surely this sort of thing is just theoretical… it can’t really happen on a fast computer, right?

Around ten years ago, I showed how you can use Meddler to easily generate a lot of web traffic for testing browsers. Meddler is a simple web server that has a simple GUI code editor slapped on the front (most developers would use node.js or Go for such tasks). I quickly threw together a tiny little MeddlerScript which exercises cookies by loading cookie-setting images in a loop and monitoring the document.cookie API to see if it ever returns an empty string.

Boy, does it ever. On my i7 machines, it usually only takes a few seconds to run into the buggy case where document.cookie returns an empty string.

Failure

I haven’t gone back to check the history, but I suspect this IE/Edge bug is at least fifteen years old.

After confirming this bug, it felt strangely familiar, as if I’d hit this landmine before. Then, as I was writing this post, I realized when… Back in 2011, I shared the C# code Fiddler uses for mapping a socket to a process. That code relies on the GetExtendedTcpTable API, which has the same reallocate-then-reinvoke design. Fortunately, I’d fixed the bug a few weeks later in Fiddler, but it looks like I never updated my blog post (sorry about that).

-Eric

PS: Unrelated, but one more pitfall to be aware of: InternetGetCookieExW has a truly bizarre shape, in that the lpdwSize argument is a pointer to a count of wide characters, but if ERROR_INSUFFICIENT_BUFFER is returned, the size argument is set to the count of bytes required.

PPS: As of Windows 10 RS3, Edge (and IE) support 180 cookies per domain to match Chrome, but the network stack will skip setting or sending individual cookies with a value over 5120 bytes.

Standard
browsers

Edge Interop Issues

As we finish up the next release of Windows 10 (Fall 2018), my team is hard at work triaging incoming bugs.

Many such bugs take the form “Edge does the wrong thing for this page. ${Other_Browser} works okay.

This post is designed to be an (ever-growing) index of some of the behavioral deltas that are the root cause of such issues:


Edge doesn’t allow navigation to DATA urls, even when they’d otherwise be converted to file downloads.

Using pushState or replaceState with |undefined| as the URL argument shows “undefined” in the Address box in Edge/IE but not Chrome or Firefox.

IE/Edge strip the Content-Encoding header from a compressed response; Firefox and Chrome leave the header in. For XmlHttpRequest’s getAllResponseHeaders, IE and Firefox maintain the case of HTTP Response header names while Chrome/Edge/Safari do not.

Chrome recognizes that a file with a .JSON extension has the type application/json (and vice versa) while IE/Edge only recognize that when the registry is configured with that mapping.

Chrome includes a hack that works around certificates that do not exactly match the domain on which they are served. Firefox, Edge, and IE do not include this hack, leading to a Certificate Name Mismatch Error when loading:

WWWAddition

Edge does not fully support the URL standard, meaning that URLs of the form http:/example.com (note the missing slash) do not work as expected.

Edge and IE do not allow navigation to HTTP URLs containing a UserInfo component. Other browsers currently (reluctantly) allow this syntax.

Edge RS5 introduces support for Web Authentication specification (in order to support FIDO2 tokens). That specification extends the Credential Management API with new methods, so the navigator.credentials object now exists. However, Edge does not implement the navigator.credentials.preventSilentAccess() method and attempting to call it will cause an exception due to the missing method. (Edge always prevents silent access, so a future implementation of this method will simply fulfill the promise immediately).

When a server returns a HTTP/[301|302|303|307|308] response, Edge/IE are unable to read the response body if the server didn’t include a Content-Length header or Transfer-Encoding: chunked (HTTP/1.1). This turns out to break login to YouTube TV, where Google returns a response body over HTTP/2 (which does not require explicit content lengths thanks to its inherent message framing).

Edge supports most of CSP2 but currently does not support nonces on sourced script elements (only inline script and styles) [Test page]. This limitation significantly complicates deployment of CSP for sites that cannot easily enumerate their source locations in the Content-Security-Policy header (Edge does not support CSP3’s strict-dynamic directive yet either). A broad rule (e.g. script-src https: ) can be used as a workaround but this does increase attack surface.

IE and Edge begin immediately downloading the content of a SCRIPT SRC, not waiting until the SCRIPT element is added to this DOM. This means, for instance, that adding a |crossorigin| attribute to that element after setting its source does not result in an |Origin| header being sent on the request.


(…to be continued…)

-Eric

Standard
browsers

Script-Generated Download Files

As we finish up the next release of Windows 10, my team is hard at work triaging incoming bugs. Here’s a pattern that has come up a few times this month:

Bug: I click download in Edge:

DownloadButtonbut I end up on an error page:

WompWompDataURI

Womp womp.

If you watch the network traffic, you’ll see that no request even hits the network in the failing case. But, if you carefully scroll that ugly error URL to see the middle, the source of the problem appears:

ms-appx-web://microsoft.microsoftedge/assets/errorpages/dnserror.html?ErrorStatus=0x80704006&NetworkStatusSupported=1#data:text/csv;charset=UTF-8, ID,Datetime,Type,Status,Note,From,To,Amount%20(total),Amount%20(fee),Funding%20Source,Destination%0D%0A

The error shows that Edge failed to navigate to a URL with the Data URI scheme.

Ever since we introduced support for DATA URLs a decade ago in Internet Explorer 8, they’ve been throttled with one major limitation: You cannot navigate to these URIs at the top level of the browser. Edge loosened things up so that Data URLs under 4096 characters can be used as the source of IFRAMEs, but the browser will not navigate to a data URL at the top level.

(Yes, this error page could use some love.)

Now, you might remember that last winter, Chrome took a change to forbid top-level navigation to data URIs (due to spoofing concerns), but that restriction contains one important exception: navigations that get turned into downloads (due to their MIME type being one other than something expected to render in the browser) are exempted. So this scenario sorta works in Chrome. (I say “sorta” because the authors of this site failed to specify a meaningful filename on the link, so the file downloads without the all-important .csv extension).

ChromeWorksSorta2

So, does IE/Edge’s restriction on Data URIs mean that webdevs cannot generate downloadable files dynamically in JavaScript in a way that works in all browsers?

No, of course not.

There are many alternative approaches, but one simple approach is to just use a blob URL, like so:

  var text2 = new Blob(["a,b,c,d"], { type: 'text/csv'});
  var down2 = document.createElement("a");
  down2.download = "simple.csv";
  down2.href = window.URL.createObjectURL(text2);
  document.body.appendChild(down2);
  down2.innerText="I have a download attribute. Click me";

When the link is clicked, the CSV file is downloaded with a proper filename.

 

-Eric

Standard
browsers, security, web

CORS and Vary

Yesterday, I started looking a site compatibility bug where a page’s layout is intermittently busted. Popping open the F12 Tools on the failing page, we see that a stylesheet is getting blocked because it lacks a CORS Access-Control-Allow-Origin response header:

NoStylesheetCORS

We see that the client demands the header because the LINK element that references it includes a crossorigin=anonymous directive:

crossorigin="anonymous" href="//s.axs.com/axs/css/90a6f65.css?4.0.1194" type="text/css" />

Aside: It’s not clear why the site is using this directive. CORS is required to use  SubResource Integrity, but this resource does not include an integrity attribute. Perhaps the goal was to save bandwidth by not sending cookies to the “s” (static content) domain?

In any case, the result is that the stylesheet sometimes fails to load as you navigate back and forward.

Looking at the network traffic, we find that the static content domain is trying to follow the best practice Include Vary: Origin when using CORS for access control.

Unfortunately, it’s doing so in a subtly incorrect way, which you can see when diffing two request/response pairs for the stylesheet:

VaryDiff

As you can see in the diff, the Origin token is added only to the response’s Vary directive when the request specifies an Origin header. If the request doesn’t specify an Origin, the server returns a response that lacks the Access-Control-* headers and also omits the Vary: Origin header.

That’s a problem. If the browser has the variant without the Access-Control directives in its cache, it will reuse that variant in response to a subsequent request… regardless of whether or not the subsequent request has an Origin header.

The rule here is simple: If your server makes a decision about what to return based on a what’s in a HTTP header, you need to include that header name in your Vary, even if the request didn’t include that header.

-Eric

PS: This seems to be a pretty common misconfiguration, which is mentioned in the fetch spec:


CORS protocol and HTTP caches

If CORS protocol requirements are more complicated than setting `Access-Control-Allow-Origin` to * or a static origin, `Vary` is to be used.

Vary: Origin

In particular, consider what happens if `Vary` is not used and a server is configured to send `Access-Control-Allow-Origin` for a certain resource only in response to a CORS request. When a user agent receives a response to a non-CORS request for that resource (for example, as the result of a navigation request), the response will lack `Access-Control-Allow-Origin` and the user agent will cache that response. Then, if the user agent subsequently encounters a CORS request for the resource, it will use that cached response from the previous non-CORS request, without `Access-Control-Allow-Origin`.

But if `Vary: Origin` is used in the same scenario described above, it will cause the user agent to fetch a response that includes `Access-Control-Allow-Origin`, rather than using the cached response from the previous non-CORS request that lacks `Access-Control-Allow-Origin`.

However, if `Access-Control-Allow-Origin` is set to * or a static origin for a particular resource, then configure the server to always send `Access-Control-Allow-Origin` in responses for the resource — for non-CORS requests as well as CORS requests — and do not use `Vary`.


 

Standard
browsers

Be skeptical of client-reported MIME Content-Types

Over the 14 years that I’ve been working on browsers and the web platform, I’ve seen a lot of bugs where the client’s configuration causes a problem with a website.

By default, Windows maintains File Extension to Content Type and Content Type to File Extension mappings mappings in the registry. You can find the former mappings in subkeys named for each file extension, e.g. HKEY_CLASSES_ROOT\.ext, and the latter as subkeys under the HKEY_CLASSES_ROOT\MIME\Database\Content Type key:

PDFMapping

These mappings are how Internet Explorer, Edge, and other browsers know that a file delivered as Content-Type: application/pdf should be saved with a .pdf extension, and that a local file named example.html ought to be treated as Content-Type: text/html.

Unfortunately, these mappings are subject to manipulation by locally-installed software, which means you might find that installing Microsoft Excel causes your .CSV file upload to have a Content-Type of application/vnd.ms-excel instead of the text/csv your website was expecting.

Similarly, you might be surprised to discover that some popular file extensions do not have a MIME type registered by default on Windows. Perhaps the most popular of these is files in JavaScript Object Notation format; these generally should have the file extension .json and a MIME type of application/json but Windows treats these as an unknown type by default.

Today, I looked at a site which allows the user to upload a JSON file containing data exported from some other service. The upload process fails in Edge with an error saying that the file must be JSON. Looking at the script in the site, it contains the following:

validateFile = function(file) {
  if (file.type !== "application/json") // BUG BUG BUG
    { alert('That is not a valid json file.'); return; }

This function fails in Edge– the file.type attribute is the empty string because Windows has no mapping between .json and application/json.

This site usually works in Chrome because Chrome has a MIME-type determination system which first checks a fixed list of mappings, then, if no fixed mapping was found, consults the system registry, and finally, if the registry does not specify a MIME type for a given extension, Chrome consults a “fallback” list of mappings (kSecondaryMappings), and .JSON is in that final fallback list. However, even Chrome users would be broken if the file had the wrong extension (e.g. data.jso) or if the user’s registry contained a different mapping (e.g. .json=>”text/json”).

As a consequence, client JavaScript and server-side upload processing logic should be very skeptical of the MIME type contained in the file.type attribute or Content-Type header, as the MIME value reported could easily be incorrect by accident (or malice!).

-Eric Lawrence
PS: End users can workaround the problem with sites that expect particular MIME types for JSON by importing the following Registry Script (save the text as FixJSON.reg and double-click the file):

Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\.json]
"Content Type"="application/json"

[HKEY_CLASSES_ROOT\MIME\Database\Content Type\application/json]
"Extension"=".json"

 

Standard