dev

The Chrome team is embarking on a clever and bold plan to change the recipe for cookies. It’s one of the most consequential changes to the web platform in almost a decade, but with any luck, users won’t notice anything has changed.

But if you’re a web developer, you should start testing your sites and services now to help ensure a smooth transition.

What’s this all about?

As originally designed, cookies were very simple. When a browser made a request to a website, that website could return a tiny piece of text, called a cookie, to the browser. When the web browser subsequently requested any resource from that website, the cookie string would be echoed back to the server that first sent it.

Simple, right?

A bit too simple, as it turns out.

Browser designers have spent the last two decades trying to clear up the mess that this one simple feature causes, and alternatives might never gain adoption.

There are two major classes of problem with the design of cookies: Privacy, and Security.

Privacy

The top privacy problem is that cookies are sent every time a request is made for a resource, even if that request is made from a completely different context. So, if you visit A.example.com, that page might request a tracking pixel from ad.doubleclick.net. This tracking pixel might set a cookie. The tracking pixel’s cookie is called a third party cookie because it was set by a domain unrelated to the page itself.

If you later visit B.textslashplain.com, which also contains a tracking pixel from ad.doubleclick.net, the tracking pixel’s cookie set on your visit to A.example.com is sent to ad.doubleclick.net, and now that tracker knows that you’ve visited both sites. As you browse more and more sites that contain a tracking pixel from the same provider, that provider can build up a very complete profile of the sites you like to visit, and use that information to target ads to you, sell the data to a data aggregation company, etc.

Today, Brave blocks 3rd party cookies by default, while Safari’s ITP feature does something more intricate. Firefox and the new Edge have “Tracking Prevention” features that block 3rd-party cookies from known trackers.

Most browsers offer a setting to turn off ALL third party cookies, and older versions of Internet Explorer used to P3P block cookies that did not promise to abide by reasonable privacy protections. However, almost all users leave 3rd party cookies enabled, and enough sites sent fraudulent P3P declarations that the P3P support was ripped out of the only browser that supported it.

Security

The security problems with cookies are a bit more subtle.

In most cases, after you log in, a site will store your identity in a cookie, such that you don’t have to reenter your password on every page, or retap your security key every time you do anything. An authentication token is stored in a cookie, and each request you make to a site carries that cookie and token.

The problem is that this creates the possibility of a cross site request forgery attack, in which an attacker carefully crafts a request to a website to which you are logged in. When you visit the attacker’s site (say, to read a news article or view an image link posted to your social media feed), the attacker’s page instructs your browser to send its malicious request (“Transfer $1000 from me to @badguy”) to the victim site where you are logged in (e.g. https://bank.example).

Normally, such a request would be ignored or responded to by a demand for credentials, but because your browser is already logged in to bank.example and because your browser made the request, the server receives the cookie containing your authentication token and deems the request legitimate. You’ve been robbed! This class of attack is called the confused deputy attack.

Now, there are myriad ways to protect against this problem, but they all require careful work on the part of web application developers, and a long history of exploits shows that failing to protect against CSRF is a common mistake.

Other Privacy and Security Problems

I’ve described the biggest and most prominent of the security and privacy problems with the design of cookies, but those are just the tip of the iceberg. There are many other problems, including:

  • Because cookies are sent on plaintext HTTP requests, a passive network observer can watch your cookies as they’re sent over the network and correlate requests back to a single client. The NSA famously abused this vulnerability.
  • Cookies are sent to servers which respond with data that might allow Cross Site Leaks or violations of Same Origin Policy. For instance, an attacker might include in their page several guesses as to your identity. If they guess right (e.g. your cookie matches the URL identity), the images loads and now the attacker site knows exactly who you are:
    WordPressCorruptsMarkup
  • Cookies can be trivially stolen in a XSS Attack if the HTTPOnly attribute was not set.
  • Cookies can be trivially leaked if the server forget to set the Secure attribute and the site isn’t on the HSTS preload list.
  • Same Origin Policy blocks reading of cross-origin resources, but this depends on the integrity of the browser sandbox. Attacks like Spectre weaken this guarantee. Ambient authentication (like cross-origin cookies) weakens Cross-Origin-Read-Blocking‘s ability to prevent a compromised renderer from stealing data.

Cookie Design Improvements

Over the years, browsers have introduced more and more features and toggles to help lock down cookies, from the Secure and HTTPOnly attributes to Cookie Prefixes (nee Magic Named Cookies).

Perhaps the most promising improvement is a feature called SameSite cookies, by which cookies can opt-in to being sent only in a first-party context:

Set-Cookie: __HostAuth=F123ABCA; SameSite=Strict; secure; httponly;

This helps protect your site against CSRF attacks and helps mitigate leakage of the user’s identity in a cross-site context.

There’s a nice SameSite cookie explainer (with pictures!).

While broadly supported by browsers, the SameSite directive isn’t getting used everywhere it should be.

Big Changes are Coming!

So the Chrome folks plan to change that.

In Chrome 80 and later, cookies will default to SameSite=Lax. This means that cookies will automatically be sent only in a first party context unless they opt-out by explicitly setting a directive of None:

Set-Cookie: ACookieAvailableCrossSite; SameSite=None; secure; httponly


This change is small in size, and huge in scope. It has huge implications for any site that expects its cookies to be used in a cross-origin context.

 

What’s the Immediate Good?

In one fell swoop, many websites will get more secure. Sites that were previously vulnerable to CSRF and cross-site leak attacks will be protected from attack in the most popular browser.

Privacy improves, because setting and sending of 3rd party cookies are blocked-by-default:

SubframeCookiesBlocked

Who’s on board?

We plan to match this change in default for the new Edge browser (with experiments starting in v80). However, there’s no plan to match this change for Internet Explorer (please stop using it!) or the old Edge (v18 and earlier).

Per Chromium’s Intent-to-Implement announcement, Firefox is looking into matching the change, although they’ve joking-not-joking suggested that they’re going to let Chrome lead the charge (and bear the brunt of the compatibility impact) before turning the feature on by default.

Per the I2I, Safari has not yet weighed in on this change. Safari’s ITP feature already imposes many interesting restrictions on cross-site cookies.

What Can Go Wrong?

If users visit a site that expects its cookies to be available but the cookies are missing, users might get a confusing error message suggesting that they toggle a setting that won’t help:

teamserror

Or, the site might just redirect between its identity provider and itself forever.

These sorts of problems happen on sites that use Federated Identity providers that depend on accessing cookies from 3rd-party subframes:

federatedid-1

To fix this, the identity provider site will either need to set SameSite=None on its cookies, or will need to use a browser storage feature (e.g. localStorage) that is not impacted by this change. Please note that other browser features do impact the availability of DOM Storage, so it’s not a silver bullet.

Rollout Plan

The Chrome team has set an ambitious timeline which calls for turning this feature on-by-default for Chrome 80, slated for stable release on February 4th, 2020. Chrome’s rollout plan includes enabling the new default on an experimental basis in Chrome 79 pre-release [Beta/Dev/Stable] channels for machines which are not externally managed — attempting to defer walking into the compatibility minefield of enterprise intranets. Presently, this exclusion does not apply to machines that are domain-joined via AAD due to a limitation in Chromium.

The Chrome team also announced two enterprise policies for Chrome 80 that will allow admins to opt-out of the new default entirely, or to opt-out only specific sites. Edge expects to offer these same policies.

Developer Tooling

Developers who wish to enable the SameSite-by-Default feature locally for testing purposes can do so by visiting chrome://flags and searching for SameSite:

flags

Set the SameSite by default cookies feature to Enabled and restart the browser.

You can view the cookies used by the current page using the Application tab of the Developer Tools; the column at the far right shows the declared SameSite attribute:

F12Tab

The Chrome team have enabled logging in the Developer Tools to notify web developers that cookie behavior is changing. Visit chrome://flags/cookie-deprecation-messages to ensure that the warnings are enabled:

CookieMessages

If you then explore my test page, you can see the notices from the tools:

DebugInfo

The Cookies subtab of the Network tab picked up a new checkbox “show filtered out request cookies” which allows you to see (in yellow) which cookies were not sent for the selected request due to SameSite rules:

CookiesTab

Cookies restricted by SameSite rules are also logged in NetLog captures (issue 1005217).

Problems and Accommodations

While the vast majority of cookie scenarios will continue to work as expected, compatibility breaks are inevitable. Unfortunately, some of these breaks might not be trivially fixed by adding the SameSite=None attribute.

For instance, older versions of Safari treated SameSite=None as SameSite=Strict, which means that servers must avoid sending the None token to Safari 12.

The .NET Framework’s cookie writer used to simply omit the SameSite attribute when the SameSiteMode was “None.” Changing this will require the affected sites to update their framework to a version with the patch.

Early in our investigations, we found another problem related to how SameSite impacts cookies sent while navigating.

Specifically, over a million sites first set an anti-CSRF cookie on themselves, then redirect to a federated Login provider, then the Login provider POSTs the login information back to the site. That initial anti-CSRF cookie is only meant to be used in a first party context. Crucially, however, SameSite cookies are not sent on navigations if the navigations use the HTTP POST verb. Making the anti-CSRF cookies SameSite=Lax by default breaks this scenario and thus breaks tons of websites.

AntiCSRF

The Two Minute Mitigation

Demanding these security cookies be set to SameSite=None would be both onerous (many more sites would need to change) and misleading (because these cookies are really only meant to go to a 1st party context).

To address this breakage, the new default was adjusted to allow a SameSite-Lax-by-Default cookie to be sent on a subsequent POST requests for two minutes, significantly reducing the breakage without giving up all of the security benefit of the change.

Note that the 2 minute mitigation might not be enough for login scenarios that take longer than 2 minutes. For instance, consider the case where the login flow is happening in a background tab, or you have to fetch your Security Key, or a child must get a parent’s permission, etc. Sites that wish to handle scenarios like this will need to store a copy of their anti-CSRF token elsewhere (e.g. sessionStorage seems appropriate).

The Chrome team plans to eventually remove the 2 minute mitigation entirely.

Compat Landmine: document.cookie

In Firefox and Safari, the document.cookie DOM property matches the Cookie header, including omission of cookies that were restricted by SameSite navigation rules.

In contrast, in Chrome and Edge, SameSite cookies that are omitted from the Cookie header are still included in the document.cookie collection following a cross-origin navigation. I’ve been convinced that this actually makes more sense, although the reasoning is subtle [issue].

To the extent possible, you should set your cookies with the httponly attribute, so that they’re not available to JavaScript and this compatibility difference is irrelevant.

What’s Next?

Assuming the rollout happens on schedule, users will get more security and more privacy. But that’s just the start.

The next step is to combat the “non-secure-cookies-are-trackable” attack mentioned previously. To prevent non-secure cross-site cookies being used by network observers to follow users around the web, SameSite=None cookies will be blocked if set without the Secure attribute. Chrome’s timeline for enabling this change by default seems squishier, but ChromeStatus claims it is also slated for Chrome 80.

After that, the next step is to combat the inevitable abuse by trackers.

Because trackers can simply opt their own cookies out of restrictions by setting SameSite=None, trackers will do so. But this isn’t is bad as you think– by forcing sites to explicitly declare each cookie for which cross-site use is intended, browsers can then focus extra love and attention around such cookies.

If we peek at Chrome’s flags page today, we see an interesting hint about the Chrome team’s plans:

SSNoneSwitch

Enabling that option enables EnableRemovingAllThirdPartyCookies which adds a new Remove Third-Party Cookies button to the All cookies and site data page. When clicked, it pops the following dialog:Delete3rdPartyButton

The HandleRemoveThirdParty() function invoked by the Clear button clears not only the cookies from those domains, but also all of the site data for the sites on which those cookies existed. This provides a strong disincentive for sites to opt-in to SameSite=None cookies unless they really need to.

(Disclaimer: Chrome might never launch the “Clear third-party cookies” button, but it’s in the code today).

You can expect that browser designers will soon dream up new and interesting remediations for cookies marked for access across multiple sites. For instance, browsers could limit the lifetime of those cookies to days or hours, or even a single browser session (tricky).

We live in interesting times!

Update: Chrome pushed back experimenting with this feature from Chrome 78 to Chrome 79.

Update: The Chromium team have published a post about the new SameSite cookie default on their blog, including a list of incompatible legacy browsers that refuse to accept cookies that specify SameSite=None, and guidance on their rollout timeline.

-Eric

PS: If you’re a Fiddler user, you can use this script to easily visualize which cookies are being set with which SameSite values.

Cookie-dough hero photo by Pam Menegakis on Unsplash.

Background

Typically, if you want your website to send a document to a client application, you simply send the file as a download. Your server indicates that a file should be treated as a download in one of a few simple ways:

  • Specifying a nonwebby type in the Content-Type response header.
  • Sending a Content-Disposition: attachment; filename=whatever.ext response header.
  • Setting a download attribute on the hyperlink pointing to the file.

These approaches are well-supported across browsers (via headers for decades, via the download attribute anywhere but IE since 2016).

The Trouble with Plain Downloads

However, there’s a downside to traditional downloads — unless the file itself contains the URL from which the download originated, the client application will not typically know where the file originated, which can be a problem for:

  • Security – “Do I trust the source of this file?
  • Functionality – “If the user makes a change to this file, to where should I save changes back?“, and
  • Performance – “If the user already had a copy of this 60mb slide deck, maybe skip downloading it again over our expensive trans-Pacific link?

Maybe AppProtocols?

Rather than sending a file download, a solution developer might instead just invoke a target application using an App Protocol. For instance, the Microsoft Office clients might support a syntax like:

ms-word:ofe|u|https://example.com/docx.docx

…which directs Microsoft Word to download the document from example.com.

However, the AppProtocol approach has a shortcoming– if the user doesn’t happen to have Microsoft Word installed, the protocol handler will fail to launch and either nothing will happen or the user may get a potentially confusing error message. That brokenness will occur even if they happen to have another client (e.g. WordPad) that could handle the document.

DirectInvoke

To address these shortcomings, we need a way to instruct the browser: “Download this file, unless the client’s handler application would prefer to just get its URL.”

While a poorly-documented precursor technology existed as early as the 1990s, Windows 8 reintroduced this feature as DirectInvoke. When a client application registers itself indicating that it supports receiving URLs rather than local filenames, and when the server indicates that it would like to DirectInvoke the application using the X-MS-InvokeApp response header:

DirectInvoke

…then the download stream is aborted and the user is instead presented with a confirmation prompt:

UIPrompt

If the user accepts the prompt, the handler application is launched, passing the URL to the web content.

Now, for certain types, the server doesn’t even need to ask for DirectInvoke behavior via the X-MS-InvokeApp header. The FTA_AlwaysUseDirectInvoke bit can be set in the type’s EditFlags registry value. The bit is documented on MSDN as: 

FTA_AlwaysUseDirectInvoke 0x00400000
Introduced in Windows 8. Ensures that the verbs for the file type are invoked with a URL instead of a downloaded version of the file. Use this flag only if you’ve registered the file type’s verb to support DirectInvoke through the SupportedProtocols or UseUrl registration.

Microsoft’s ClickOnce deployment technology makes use of the FTA_AlwaysUseDirectInvoke flag.

A sample registry script for a type that should always be DirectInvoke’d might look like this:

To test it, first set up the registry, install a handler to C:\Windows, and then click this example link

TraditionalVsDI.png

Caveats

In order for this architecture to work reliably, you need to ensure a few things.

App Should Handle Traditional Files

First, your application needs to have some reasonable experience if the content is handled as a traditional download, as it would be using Chrome or Firefox, or on a non-Windows operating system.

By way of example, it’s usually possible to construct a ClickOnce manifest that works correctly after download. Similarly, Office applications work fine with regular files, although the user must take care to reupload the files after making any edits.

App Should Avoid Depending On Browser State

If your download flow requires a cookie, the client application will not have access to that cookie and the download will fail. The client application probably will not be able to prompt the user to login to otherwise retrieve the file.

If your download flow requires HTTP Authentication or HTTPS Client Certificate Authentication, the client application might work (if it supports NTLM/Negotiate) or it might not (e.g. if the server requires Digest Auth and the client cannot show a credential prompt.

App Should Ensure URL Support

Many client applications have limits in the sorts of URLs that they can support. For instance, the latest version of Microsoft Excel cannot handle a URL longer than 260 characters. If a .xlsx download from SharePoint site attempts to DirectInvoke, Excel will launch and complain that it cannot retrieve the file.

App Should Ensure Network Protocol Support

Similarly, if the client app registers for DirectInvoke of HTTPS URLs, you should ensure that it supports the same protocols as the browser. If a server requires a protocol version (e.g. TLS/1.2) that the client hasn’t yet enabled (say it only enables TLS/1.0), then the download will fail.

Server Must Not Send |Content-Disposition: attachment|

As noted in the documentation, a Content-Disposition: attachment response header takes precedence over DirectInvoke behavior. If a server specifies attachment, DirectInvoke will not be used.

Note: If you wish to use a Content-Disposition header to name the file, you can do so using Content-Disposition: inline; filename=”fuzzle.fuzzle”

Conclusion

As you can see, there’s quite a long list of caveats around using the DirectInvoke WebToApp communication scheme, but it’s still a useful option for some scenarios.

In future posts, I’ll continue to explore some other alternatives for Web-to-App communication.

-Eric

Note: The current version of Edge 79 does not yet support FTA_AlwaysUseDirectInvoke. We expect to fix this in the future.

Note: I have a few test cases.

 

As we’ve been working to replatform the new Microsoft Edge browser atop Chromium, one interesting outcome has been early exposure to a lot more bugs in Chromium. Rapidly root-causing these regressions (bugs in scenarios that used to work correctly) has been a high-priority activity to help ensure Edge users have a good experience with our browser.

Stabilization via Channels

Edge’s code stabilizes as it flows through release channels, from the developer’s-only ToT/HEAD (Tip-of-tree, the latest commit in the source repository) to the Canary Channel build (updated daily) to the Dev Channel updated weekly, to the Beta Channel (updated a few times over its six week lifetime) to our eventual, as-yet-unreleased Stable Channel.

Until recently, Microsoft Edge was only available in Canary and Dev channels, which meant that any bugs landed in Chromium would almost immediately impact almost all users of Edge. Even as we added a Beta channel, we still found users reporting issues that “reproduce in all Edge builds, but not in Chrome.

As it turns out, most of the “not in Chrome” comparisons turn out to mean that the problems are “not repro in Chrome Stable.” And that’s because the either the regressions simply haven’t made it to Stable yet, or because the regressions are hidden behind Feature Flags that are not enabled for Chrome’s stable channel.

A common example of this is LayoutNG, a major update to the layout engine used in Chromium. This redesigned engine is more flexible than its predecessor and allows the layout engineers to more easily add the latest layout features to the browser as new standards are finalized. Unfortunately, changing any major component of the browser is almost certain to lead to regressions, especially in corner cases. Google had enabled LayoutNG by default in the code for Chrome 76 (and Edge picked up the change), but then subsequently used the Feature Flag to disable LayoutNG for the stable channel three days before Chrome 76 shipped. As a result, the new LayoutNG engine is on-by-default for Chrome Beta, Dev, and Canary and Edge Beta, Dev, and Canary.

The key difference is that Edge doesn’t yet have a public stable channel to which any bug-impacted users can retreat. Therefore, reproducing, isolating, and fixing regressions as quickly as possible is important for Edge engineers.

Isolating Regressions

When we receive a report of a bug that reproduces in Microsoft Edge, we follow a set of steps for figuring out what’s going on.

Check Latest Canary

The first step is checking whether it reproduces in our Edge Canary builds.

If not, then it’s likely the bug was already found and fixed. We can either tell our users to sit tight and wait for the next update, or we can search on CRBug.com to figure out when exactly the fix went in and inform the user specifically when a fix will reach them.

Check Upstream

If the problem still reproduces in the latest Edge Canary build, we next try to reproduce the problem in the equivalent build of Chrome or plain-vanilla Chromium.

If Not Repro in Chromium?

If the problem doesn’t not reproduce in Chrome, this implies that the problem was caused by Microsoft’s modifications to the code in our internal branches. Alternatively, it might also be the case that the problem is hidden in upstream Chrome behind an experimental flag, so sometimes we must go spelunking into the browser’s Feature Flag configuration by visiting:

chrome://version/?show-variations-cmd 

The Command-Line Variations section of that page reveals the names of the experiments that are enabled/disabled. Launching the browser with a modified version of the command line enables launching Chrome in a different configuration2.

If the issue really is unique to Edge, we can use git blame and similar techniques on our code to see where we might have caused the problem.

If Repro in Chromium?

If the problem does reproduce in Chrome or Chromium, that’s strong evidence that we’ve inherited the problem from upstream.

Sanity Check: Does it Repro in Firefox?

If the problem isn’t a crashing bug or some other obviously incorrect behavior, I will first check the site’s behavior in the latest Firefox nightly build, on the off chance that the browsers are behaving correctly and the site’s markup or JavaScript is actually incorrect.

Try tweaking common Flags

Depending on the area where the problem occurs, a quick next step is to try toggling Feature Flags that seem relevant on the chrome://flags page. For instance, if the problem is in layout, try setting chrome://flags/#enable-layout-ng to Disabled. If the problem seems to be related to the network, try toggling chrome://flags/#network-service-in-process, and so on.

Understanding whether the problem can be impacted by flags enables us to more quickly find its root cause, and provides us an option to quickly mitigate the problem for our users (by adjusting the flag remotely from our experimental configuration server).

Bisecting Regressions

The gold standard for root-causing regressions is finding the specific commit (aka “change list” aka “pull request” aka “patch”) that introduced the problem. When we determine which commit caused a problem, we not only know exactly what builds the problem affects, we also know what engineer committed the breaking change, and we know what exactly code was changed– often we can quickly spot the bug within the changed files.

Fortunately, Chromium engineers trying to find regressing commits have a super power, known as bisect-builds.py. Using this script is simple: You specify the known-bad Chromium version and a guesstimate of the last known good version. The script then automates the binary search of the builds between the two positions, requiring the user to grade each trial as “Good” or “Bad”, until the introduction of the regression is isolated.

The simplicity of this tool belies its magic— or, at least, the power of the log2 math that underlies it. OmahaProxy informs me that the current Chrome tree contains 692,278 commits. log2(692278) is 19.4, which means that we should be able to isolate any regressing change in history in just 20 trials, taking a few minutes at most1. And it’s rare that you’d want to even try to bisect all of history– between stable and ToT we see ~27250 commits, so we should be able to find any regression within such a range in just 15 trials.

On CrBug.com, regressions awaiting a bisect are tagged with the label Needs-Bisect, and a small set of engineers try to burn down the backlog every day. But running bisects is so easy that it’s usually best to just do it yourself, and Edge engineers are doing so constantly.

One advantage available to Googlers but not the broader Chromium community is the ability to do what’s called a “Per-Revision Bisect.” Inside Google, Chrome builds a new build for every single change (so they can unambiguously flag any change that impacts performance), but not all of these builds are public. Publicly, Google provides the Chromium Continuous Build Archive, an archive of builds that are built approximately every one to fifteen commits. That means that when we do bisects, we often don’t get back a single commit, but instead a regression range. We then must look at all of the commits within that range to figure out which was the culprit. In my experience, this is rarely a problem– most commits in the range obviously have nothing to do with the problem (“Change the icon for some far off feature”), so there are rarely more than one or two obvious suspects.

The Edge team does not currently expose our Edge build archives for bisects, but fortunately almost all of our web platform work is contributed upstream, so bisecting against Chromium is almost always effective.

Recent Example

Last Thursday, we started to get reports of a “missing certificate” problem in Microsoft Edge, whereby the browser wasn’t showing the expected Lock icon for HTTPS pages that didn’t contain any mixed content:

The certificate is missing

While the lock was missing for some users, it was present for others. After also reproducing the issue in Chrome itself, I filed a bug upstream and began investigating.

Back when I worked on the Chrome Security team, we saw a bunch of bugs that manifested like this one that were caused by refactoring in Chrome’s navigation code. Users could hit these bugs in cases where they navigated back/forward rapidly or performed an undo tab close operation, or sites used the HTML5 History API. In all of these cases, the high-level issue is that the page’s certificate is missing in the security_state: either the ssl_status on the NavigationHandle is missing or it contains the wrong information.

This issue, however, didn’t seem to involve navigations, but instead was hit as the site was loaded and thus it called to mind a more recent regression from back in March, where sites that used AppCache were missing their lock icon. That issue involved a major refactoring to use the new Network Service.

One fact that immediately jumped out at me about the sites first reported to hit this new problem is that they all use ServiceWorker (e.g. Hotmail, Gmail, MS Teams, Twitter). Like AppCache, ServiceWorker allows the browser to avoid hitting the network in response to a fetch. As with AppCache, that characteristic means that the the browser must somehow have the “original certificate” for that response from the network so it can set that certificate in the security_state when it’s needed.

But where does that certificate live?

Chromium stores the certificate for a given HTTPS response after the end of the cache entry, so it should be available whenever the cached resource is used3. A quick disk search revealed that ServiceWorker stores its scripts inside the folder:

%localappdata%\Microsoft\Edge SxS\User Data\Default\Service Worker\ScriptCache

Comparing the contents of the cache file on a “good” and a “bad” PC, we see that the certificate information is missing in the cache file for a machine that reproduces the problem:

The serialized certificate chain is present in the “Good” case

So, why is that certificate missing? I didn’t know.

I performed a bisect three times, and each time I ended up with the same range of a dozen commits, only one of which had anything to do with anything to do with caching, and that commit was for AppCache, not ServiceWorker.

More damning for my bisect suspect was the fact that this suspect commit landed (in 78.0.3890) the day after the build (3889) upon which the reproducing Edge build was based. I spent a bunch of time figuring out whether this could be the off-by-one issue in Edge build numbering before convincing myself that no, it couldn’t be: build number misalignment just means that Edge 3889 might not contain everything that’s in Chrome 3889.

Unless an Edge Engineer had cherry-picked the regressing commit into our 3889 (unlikely), the suspect couldn’t be the culprit.

Edge 3889 doesn’t include all of the commits in Chromium 3889.

I posted my research into the bug at 10:39 PM on Friday and forty minutes later a Chrome engineer casually revealed something I didn’t realize: Chrome uses two different codepaths for fetching a ServiceWorker script– a “new_script_loader” and an “updated_script_loader.”

And instantly, everything fell into place. The reason the repro wasn’t perfectly reliable (for both users and my bisect attempts) was that it only happened after a ServiceWorker script updates.

  • If the ServiceWorker in the user’s cache is missing, it is downloaded by the new_script_loader and the certificate is stored.
  • If the ServiceWorker script is present and is unchanged on the server, the lock shows just fine.
  • But if the ServiceWorker in the user’s cache is present but outdated, the updated_script_loader downloads the new script… and omits the certificate chain. The lock icon disappears until the user clears their cache or performs a hard (CTRL+F5) refresh, at which point the lock remains until the next script update.

With this new information in hand, building a reliable reduced repro case was easy– I just ripped out the guts of one of my existing PWAs and configured it so that it updated itself every five seconds. That way, on nearly every load, the cached ServiceWorker would be deemed outdated and the script redownloaded.

With this repro, we can kick off our bisect thusly:

python tools/bisect-builds.py -a win -g 681094 -b 690908 --verify-range --use-local-cache -- --no-first-run --user-data-dir=/temp https://webdbg.com/apps/alwaysoutdated/

… and grade each build based on whether the lock disappears on refresh:

Grading each build based on whether the lock disappears

Within a few minutes, we’ve identified the regression range:

A culprit is found

In this case, the regression range contains just one commit— one that turns on the new ServiceWorker update check code. This confirms the Chromium engineer’s theory and that this problem is almost identical to the prior AppCache bug. In both cases, the problem is that the download request passed kURLLoadOptionNone and that prevented the certificate from being stored in the HttpResponseInfo serialized to the cache file. Changing the flag to kURLLoadOptionSendSSLInfoWithResponse results in the retrieval and storage of the ssl_info, including the certificate.

The fix was quick and straightforward; it will be available in Chrome 78.0.3902 and the following build of Edge based on that Chromium version. Notably, because the bug is caused by failure to put data in a cache file, the lock will remain missing even in later builds until either the ServiceWorker script updates again, or you hard refresh the page once.

-Eric

1 By way of comparison, when I last bisected an issue in Internet Explorer, circa 2012, it was an extraordinarily painful two-day affair.

2 You can use the command line arguments in the variations page (starting at --force-fieldtrials=) to force the bisect builds to use the same variations.

Chromium also has a bisect-variations script which you can use to help narrow down which of the dozens of active experiments is causing a problem.

If all else fails, you can also reset Chrome’s field trial configuration using chrome.exe --reset-variation-state to see if the repro disappears.

3 Aside: Back in the days of Internet Explorer, WinINET had no way to preserve the certificate, so it always bypassed the cache for the first request to a HTTPS server so that the browser process would have a certificate available for its subsequent display needs.


Note: This post is part of a series about Web-to-App Communication techniques.

Just over eight years ago, I wrote my last blog post about App Protocols, a class of URL schemes that typically1 open another program on your computer instead of returning data to the web browser. 

App Protocols2 are both simple and powerful, allowing client app developers to easily enable the invocation of their apps from a website. For instance, ms-screenclip is a simple app protocol built into Windows 10 that kicks off the process of taking a screenshot:

    ms-screenclip:?delayInSeconds=2

When the user invokes this url, the handler waits two seconds, then launches its UI to collect a screenshot. Notably, App Protocols are fire-and-forgetmeaning that the handler has no direct way to return data back to the browser that invoked the protocol.

The power and simplicity of App Protocols comes at a cost. They are the easiest route out of browser sandboxes and are thus terrifying, especially because this exploit vector is stable and available in every browser from legacy IE to the very latest versions of Chrome/Firefox/Edge/Safari.

What’s the Security Risk?

A number of issues make App Protocols especially risky from a security point-of-view.

Careless App Implementation

The primary security problem is that most App Protocols were designed to address a particular scenario (e.g. a “Meet Now” page on a videoconferencing vendor’s website should launch the videoconferencing client) and they were not designed with the expectation that the app could be exposed to potentially dangerous data from the web at large.

We’ve seen apps where the app will silently reconfigure itself (e.g. sending your outbound mail to a different server) based on parameters in the URL it receives. We’ve seen apps where the app will immediately create or delete files without first confirming the irreversible operation with the user. We’ve seen apps that assumed they’d never get more than 255 characters in their URLs and had buffer-overflows leading to Remote Code Execution when that limit was exceeded. The list goes on and on.

Poor API Contract

In most cases3, App Protocols are implemented as a simple mapping between the protocol scheme (e.g. “alert”) and a shell command, e.g. 

AlertProtocol

When the protocol is invoked by the browser, it simply bundles up the URL and passes it on the command line to the target application. The app doesn’t get any information about the caller (e.g. “What browser or app invoked this?“, “What origin invoked this?“, etc) and thus it must make any decisions solely on the basis of the received URL.

Until recently, there was an even bigger problem here, related to the encoding of the URL. Browsers, including Chrome, Edge, and IE, were willing to include bare spaces and quotation marks in the URL argument, meaning that an app could launch with a command line like:

alert.exe "alert:This is an Evil URL" --DoSomethingDangerous --Ignore=This"

The app’s code saw the –DoSomethingDangerous “argument”, failed to recognize it as a part of the URL, and invoked dangerous functionality. This attack led to remote code execution bugs in App Protocol handlers many times over the years. 

Chrome began %-escaping spaces and quotation marks8 back in Chrome 64, and Edge 18 followed suit in Windows 10 RS5.

You can see how your browser behaves using the links on this test page.

Future Opportunity: A richer API contract that allows an App Protocol handler to determine how specifically it was invoked would allow it to better protect itself from unexpected callers. Moving the App Protocol URL data from the command line to somewhere else (e.g. stdin) might help reduce the possibility of parsing errors.

Sandbox

The application that handles the protocol typically runs outside of the browser’s sandbox. This means that a security vulnerability in the app can be exploited to steal or corrupt any data the user can access, install malware, etc. If the browser is running Elevated (at Administrator), any App Protocol handlers it invokes are launched Elevated; this is part of UAC’s design.

Because most apps are written in native code, the result is that most protocol handlers end up in the DOOM! portion of this diagram:

RuleOfTwo

Prompting

In most cases, the only4 thing standing between the user and disaster is a permission prompt.

In Internet Explorer, the prompt looked like this:

IEPermission
As you can see, the dialog provides a bunch of context about what’s happening, including the raw URL, the name of the handler, and a remark that allowing the launch may harm the computer.

Such information is lacking in more modern browsers, including Firefox:

FirefoxPermission

…and Edge/Chrome:

ChromePermission

Browser UI designers (reasonably) argue that the vast majority of users are poorly equipped to make trust decisions, even with the information in the IE prompt, and so modern UI has been greatly simplified5

Unfortunately, lost to evolution is the crucial indication to the user that they’re even making a trust decision at all.

Eliminating Prompts

Making matters more dangerous, everyone hates security prompts that interrupt non-malicious scenarios. A common user request can be summarized as “Prompt me only when something bad is going to happen. In fact, in those cases, don’t even prompt me, just protect me.

Unfortunately, the browser cannot know whether a given App Protocol invocation is good or evil, so it delegates control of prompting in two ways:

In Internet Explorer and Edge (version<=18), the browser respects a per-protocol WarnOnOpen boolean in the registry, such that the App itself may tell the browser: “No worries, let anyone launch me, no prompt needed.

In Firefox, Chrome, and Edge (version >= 76), the browser instead allows the user to suppress further prompts with a “Always open these types of links” checkbox. (UX nit: This text is wrong and should instead say “Always open this type of link” or “Always open links of this type”)

If the user selects this option, the protocol will silently launch in the future without the browser first asking permission.

However, Edge/Chrome version 77.0.3864 removed the “Always open these types of links in the associated app” checkbox. The stated reason is found in Chrome issue #982341:

No obvious way to undo “Always open these types of links” decision for External Protocols.

We realized in a conversation around issue 951540 that we don’t have settings UI
that allows users to reconsider decisions they’ve made around external protocol
support. Until we work that out, and make longer-term decisions about the
permission model around the feature generally, we should stop making the problem
worse by removing that checkbox from the UI.

A user who had ticked the “Always open” box has no way to later review that decision6, and no obvious way to reverse it. Almost no one figured out that using the “Clear Browsing Data > Cookies and other site data” dialog box option directs Chrome to delete all previous “Always open” decisions for the user’s profile. 

Particularly confusing is that the “Always open” decision wasn’t made on a per-site basis– it applies to every site visited by the user in that browser profile.

UpdateAn Enterprise policy for v79+ allows administrators to restore the checkbox.

Future Opportunity: Much of the risk inherent in open-without-prompting behavior comes from the site that any random site (http://evil.example.com) can abuse ambient permission to launch the protocol handler. If browsers changed the option to “Always allow this site to open this protocol”, the risk would be significantly reduced, and a user could reasonably safely allow, e.g. https://teams.microsoft.com to open the msteams protocol without a prompt.

Alternatively, perhaps the Registry-based provisioning of a protocol handler should explicitly list the sites allowed to launch the protocol, akin to the SiteLock protection for legacy ActiveX controls.

For some schemes7 , Chrome will not even show a prompt because the protocol is included on a built-in allow or deny list.

Some security folks have argued that browsers should not provide any mechanism for skipping the permission prompt. Unfortunately, there’s evidence to suggest that such a firm stance might result in vendors avoiding the prompt by choosing even riskier architectures for Web-to-App communication. More on this in a future post.

Zero-Day Defense

Even when a zero day vulnerability in an App Protocol handler is getting exploited in the wild (e.g. this one), browsers have few defenses available to protect users. 

Unlike, say, file downloads, where the browser has multiple mechanisms to protect users against newly-discovered threats (e.g. file type policies and SmartScreen/SafeBrowsing), browsers do not presently have rapid update mechanisms that can block access to known vulnerable App Protocol handlers.

Future Opportunity: Use SafeBrowsing/SmartScreen or a file-type-policies style Component Update to supply the client with a list of known-vulnerable protocol handlers. If a page attempts to invoke such a protocol, either block it entirely or strongly caution the user.

To improve the experience even further, the blocklist could contain version information such that blocking/additional warnings would only be shown if the version of the handler app is earlier than the version number of the app containing the fix. 

Antivirus programs typically do monitor all calls to ShellExecute and could conceivably protect against malicious invocation of app protocol handlers, but I am not aware of any having ever done so.

Privacy Concerns Prevent Protocol Detection

One of the most common challenges for developers who want to use App Protocols for Web-to-App communication is that the web platform does not expose the list of available protocol handlers to JavaScript. This is primarily a privacy consideration: exposing the list of protocol handlers to the web would expose a significant amount of fingerprintable entropy and might even reveal things about the user’s interests and beliefs (e.g. a ConservativeNews App or a LGBTQ App might expose a protocol handler for app-to-app communication).

Internet Explorer and Edge <= 18 supply a non-standard JavaScript function msLaunchUri that allows a web page to detect that a user didn’t have a protocol handler installed, but this function is not available in other browsers.

UX When a Protocol Isn’t Installed

Browser behavior varies if the user attempts to invoke a link with a scheme for which no protocol handler is registered.

Firefox shows an error page: FirefoxNotInstalled

On Windows 8 and later, IE and Edge<=18 show a prompt that offers to take the user to the Microsoft store to search for a protocol handler for the target scheme:

Win10NotInstalled

Unfortunately, this search is rarely fruitful because most apps are not available in the Microsoft Store.

Interestingly, Chrome and Edge76+ show nothing at all when attempting to invoke a link for which no protocol handler is installed. Surprisingly, there’s no notice even in the Developer Tools console; a particularly thorough debugger will only see a “(canceled)” request in the DevTool’s Network tab.

Upcoming change – Require HTTPS to Invoke

Chrome is looking at requiring that a page be served over HTTPS in order for it to invoke an application protocol.

 

In future posts, I’ll explore some other alternatives for Web-to-App communication.

-Eric


Notes

1 In some browsers, it’s possible to register web-based handlers for “AppProtocols” (e.g. maps: and mailto: might go to Google Maps and GMail respectively), but this is relatively uncommon.

2 Within Chromium, App Protocols are called “External Protocols.”

3 There are other ways to handle protocols, including COM and the Windows 10 App Model’s URI Activation mechanism but these are uncommon.

4 As an anti-abuse mechanism, the browser may require a user-gesture (e.g. a mouse click) before attempting to launch an App Protocol, and may throttle invocations to avoid spamming the user with an infinite stream of prompts.

5 Chrome’s prompt used to look much like IE’s.

6 Short of opening the Preferences for the profile in Notepad or another text editor. E.g. after choosing “Always open” for Microsoft Teams and Skype for Business, the JSON file %localappdata%\Microsoft\Edge SxS\user Data\default\preferences contains

“protocol_handler”:{“excluded_schemes”:{“msteams”:false, “ms-sfb”: false}}

To see the list in IE/Edge<=18, you can run a registry query to find protocols with WarnOnOpen set to 0:

reg query "HKCU\SOFTWARE\Microsoft\Internet Explorer\ProtocolExecute" /s
reg query "HKLM\SOFTWARE\Microsoft\Internet Explorer\ProtocolExecute" /s


7 Hardcoded schemes:

kDeniedSchemes[] = {“afp”,”data”,”disk”,”disks”,”file”,”hcp”,”ie.http”,”javascript”,”ms-help”,”nntp”,”res”,”shell”,”vbscript”,”view-source”,”vnd.ms.radio”}
kAllowedSchemes[] = {“mailto”, “news”, “snews”};

The EscapeExternalHandlerValue function:

// Escapes characters in text suitable for use as an external protocol handler command. // We %XX everything except alphanumerics and -_.!~*'() and the restricted // characters (;/?:@&=+$,#[]) and a valid percent escape sequence (%XX). EscapeExternalHandlerValue()

 

This is an introduction/summary post which will link to individual articles about browser mechanisms for communicating directly between web content and native apps on the local computer.

This series aims to provide, for each mechanism, information about:

  • On which platforms is it available?
  • Can the site detect that the app/mechanism is available?
  • Can the site send more than one message to the application without invoking the mechanism again, or is it fire-and-forget?
  • Can the application bidirectionally communicate back to the web content via the same mechanism?
  • What are the security implications?
  • What is the UX?

Application Protocols

Blog Post

tl;dr: Apps can register protocol schemes. Browsers will spawn the apps when navigating to the scheme.

Characteristics: Fire-and-Forget. Non-detectable. Supported across all browsers for decades. Prompts by default, but can be disabled.

Native Messaging via Extensions

Blog Post – Coming someday. For now, see nativeMessaging API.

tl;dr: Browser extensions can communicate with a local native app using stdin/stdout passing JSON between the app and the extension. The extension may pass information to/from web content if desired.

Characteristics: Bi-directional communications. Detectable. Supported across most modern browsers; not legacy IE. Dunno about Safari. Prompts on install, but not required to use.

File downloads (Traditional)

Blog Post – Coming someday.

tl;dr: Apps can register to handle certain file types. User may spawn the app to open the file.

Characteristics: Fire-and-Forget. Non-detectable. Supported across all browsers. Prompts for most file types, but some browsers allow disabling.

File downloads (DirectInvoke)

Blog Post

Internet Explorer/Edge support DirectInvoke, a scheme whereby a file handler application is launched with a URL instead of a local file.

Characteristics: Fire-and-Forget. Non-detectable. Windows only. Supported in Internet Explorer, Edge 18 and below, and Edge 78 and above. Degrades gracefully into a traditional file download.

Local Web Server

Blog Post – Coming someday.

tl;dr: Apps can run a HTTP(S) server on localhost and webpages can communicate with that server using fetch/XHR/WebSocket/WebRTC etc.

Characteristics: Bi-directional communications. Detectable. Supported across all browsers. Not available on mobile. Complexities around Secure Contexts / HTTPS, and loopback network protections in Edge18/IE AppContainers.

HTTPS

In many cases, HTTPS pages may not send requests to HTTP URLs (depending on whether the browser supports the new “SecureContexts” feature that allows HTTP://LOCALHOST), in some cases applications wish to get a HTTPS certificate for their local servers. This is complex and error-prone. There’s a writeup of how Plex got HTTPS certificates for their local servers.

Notes: https://wicg.github.io/cors-rfc1918/#mixed-content

A nice writeup of Amazon Music’s web exposure can be found here: https://medium.com/0xcc/what-the-heck-is-tcp-port-18800-a16899f0f48f

Andrew (@drewml) tweeted at 4:23 PM on Tue, Jul 09, 2019:
The @zoom_us vuln sucks, but it’s definitely not new. This was/is a common approach used to sidestep the NPAPI deprecation in Chrome. Seems like a @taviso favorite:
Anti virus, logitech, utorrent. (https://twitter.com/drewml/status/1148704362811801602?s=03)

Bypass of localhost CORS protections by utilizing GET request for an Image
https://bugs.chromium.org/p/chromium/issues/detail?id=951540#c28

View at Medium.com

WebRTC tricks to bypass HTTPS requirements https://twitter.com/sleevi_/status/1177248901990105090?s=20

Variant: Common Remote Server as a Broker

An alternative approach would be to communicate indirectly. For instance, a web application and a client application using HTTPS/WebSockets could each individually communicate to a common server which brokers messages between them.

 

AppLinks in Edge/Windows

Allow navigation to certain namespaces (domains) to be handed off to a native application on the local device.

https://blogs.windows.com/windowsdeveloper/2016/10/14/web-to-app-linking-with-appurihandlers/

https://docs.microsoft.com/en-us/windows/uwp/launch-resume/web-to-app-linking

Legacy Plugins/ActiveX architecture

Please no!

Characteristics: Bi-directional communications. Detectable. Support has been mostly removed from most browsers. Generally not available on mobile. One of the biggest sources of security risk in web platform history.

Android Intents

Dunno much about these.

Android Instant Apps

Dunno much about these. Basically, the idea is that navigating to a website can install/run an Android Application.

 

In my last post, I showed you how to use OmahaProxy’s Find Releases tool to discover which versions of Chrome contain a given bugfix.

I noted that if you’re using Microsoft’s new Chromium-based Edge, you can look at the edge://version page or this extension to see the upstream Chrome version upon which Edge is based:

ShowChromeVersion

Until today, I never realized that this version number can be off-by-one. I noticed this discrepancy after complaints about a nasty bug in Edge that caused a bunch of webpage renderers to crash. The fix, we learned from OmahaProxy, landed in version Chrome Canary 78.0.3872.0.

When Edge next updated, I was alarmed to see that the crash was still present, but the user-agent header contains Chrome/78.0.3872.0, the version that’s supposed to have the fix.

StillCrashing.png

What’s going on?

Chromium’s src/chrome/VERSION file gets updated shortly after the prior Chrome Canary ships, as you can see in this timeline:

  1. At 03:13 on Aug 1, /src/chrome/VERSION was updated to 3872.
  2. Around 04:00 on Aug 1, Edge’s code pump ingested all of the commits from upstream Chromium into our internal integration branch, preparing for our 78.0.242 Canary build.
  3. At 11:20 on Aug 1, the fix for the crash landed.
  4. At 23:16 on Aug 1, the last commit from Chromium Master that went into Chrome Canary 78.0.3872 landed.
    Within a few hours, the Canary branch was declared an Official release.
  5. At 03:13 on Aug 2, /src/chrome/VERSION was updated to 3873.

Because Edge’s code pump pulled between when the version number was bumped to 3872 [1] and when the crasher’s fix landed [3], Edge ended up with a build that has the 3872 version number, but without the fix that went into Chrome Canary 78.0.3872.

So, for the moment at least, the safe way to be sure that a given Edge build has a given Chromium fix is to ensure that the Chrome token in the user-agent string is later than the Chrome Canary build number in which the patch first shipped.

Chrome 3889 includes some fixes and bugs not present in Edge 3889

-Eric

Yesterday, we covered the mechanisms that modern browsers can use to rapidly update their release channels. Today, let’s look at how to figure out when an eagerly awaited fix will become available in the Canary channels.

By way of example, consider crbug.com/977805, a nasty beast that caused some extensions to randomly be disabled and marked corrupt:

corruption

By bisecting the builds (topic of a future post) to find where the regression was introduced, we discovered that the problem was the result of a commit with hash fa8cdc81f5 that landed back on May 20th. This (probably security) change exposed an earlier bug in Chromium’s extension verification system such that an aborted request for a resource in an extension (say, because a page getting torn down just as a content script was getting injected) resulted in the verification logic thinking that the extension’s resource file was corrupted on disk.

On July 12th, the area owner landed a fix with the commit hash of cad2f6468. But how do I know whether my browser has this fix already? In what version(s) did the fix get released?

To answer these questions, we turn back to our trusted OmahaProxy. In the Find Releases box at the bottom, paste the full or partial hash value into the box and hit the Find Releases button:

CommitHashFix

The system will churn for a bit and then return the following page:

CommitHashLanded

So, now we know two things: 1) The fix will be in Chromium-based browsers with version numbers later than 77.0.3852.0, and 2) So far, the fix only landed there and hasn’t been merged elsewhere.

Does it need to be merged? Let’s figure out where the original regression was landed using the same tool with the regressing change list’s hash:

regressregress

We see that the regression originally landed in Master before the Chrome 76 branch point, so the bug is in Chrome 76.0.3801 and later. That means that after the fix is verified, we’ll need to request that it be merged from Master where it landed, over to the 76 branch where it’s also needed.

We can see what that’ll look like by looking at the fix for crbug.com/980803. This regression in the layout engine was fixed by a1dd95e43b5 in 77, but needed to be put into Chromium 76 as well. So, it was, and the result is shown as:Merged

Note: It’s possible for a merge to be performed but not show up here. The tool looks for a particular string in the merge’s commit message, and some developers accidentally remove or alter it.

Finally, if you’re really champing at the bit for a fix, you might run Find Releases on a commit hash and see

notyetin

Assuming you didn’t mistype the hash, what this means is that the fix isn’t yet in the Canary channel. If you were to clone the Chromium master @HEAD and build it yourself, you’d see the fix, but it’s not yet in a public Canary. In almost all cases, you’ll need to wait until the next morning (Pacific time) to get an official channel build with the fix.

Now, so far we’ve mostly focused on Chrome, but what about other Chromium-based browsers?

Things are mostly the same, with the caveat that most other Chromium-based browsers are usually days to weeks to (gulp) months behind Chrome Canary. Is the extensions bug yet fixed in my Edge Canary?

The simplest (and generally reliable) way to check is to just look at the Chrome token in the browser’s user agent string by visiting edge://version or using my handy Show Chrome Version browser extension. As you can see in both places, Edge 77.0.220.0 Canary is based on Chromium 77.0.3843, a bit behind the 77.0.3852 version containing the extensions verification fix:

ShowChromeVersion

So, I’ll probably have to wait a few days to get this fix into my browser.

Warning: The “Chrome” token shown in Edge might be off-by-one. See my followup post for details.

Also, note that it’s possible for Microsoft and other Chromium embedders to “cherry-pick” critical fixes into our builds before our merge pump naturally pulls them down from upstream, but this is a relatively rare occurrence for Edge Canary. 

 

tl;dr: OmahaProxy is awesome!

-Eric