
By this point, most browser enthusiasts know that Chrome has a rapid release cycle, releasing a new stable version of the browser approximately every six weeks. The Edge team intends to adopt that rapid release cadence for our new browser, and we’re already releasing new Edge Dev Channel builds every week.

What might be less obvious is that this six-week cadence represents an upper bound on how long it might take for an important change to reach users.

Background: Staged Rollouts

Chrome uses a staged rollout plan, which means only a small percentage (1%-5%) of users get the new version immediately. If any high-priority problems are flagged by those initial users, the rollout can be paused while the team considers how to best fix the problem. That fix might involve shipping a new build, turning off a feature using the experimentation server, or dynamically updating a component.

Let’s look at each.

Respins

If a serious security or functionality problem is found in the Stable Channel, the development team generates a respin of the release, which is a new build of the browser with the specific issue patched. The major and minor version numbers of the browser stay the same. For instance, on July 15th, Chrome Stable version 75.0.3770.100 was updated to 75.0.3770.142. Users who had already installed the buggy version in the channel are updated automatically, and users who haven’t yet updated to the buggy version will just get the fixed version when the rollout reaches them.

If you’re curious, you can see exactly which versions of Chrome are being delivered from Google’s update servers for each Channel using OmahaProxy.

Field Trial Flags

In some cases, a problem is discovered in a new feature that the team is experimenting with. When that happens, it’s usually easy for the team to remotely disable or reconfigure the experiment using its field trial flags. The browser periodically polls the development team’s servers to get the latest experimental configuration settings. Chrome’s experimentation system is codenamed “Finch,” while Microsoft calls ours “CFR” (Controlled Feature Rollout).

You can see your browser’s current field trial configuration by navigating to

chrome://version/?show-variations-cmd

The hexadecimal Variations list is generally inscrutable, but the Command-line variations section later in the page is often more useful and allows you to better understand what trials are underway. You can even use this list to identify the exact trial causing a particular problem.
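If you suspect that a specific experiment is responsible for a problem, one way to test (a sketch; the feature name below is just a placeholder for whatever appears in your own list) is to relaunch the browser with that feature disabled from the command line, or to copy the full set of switches shown in the Command-line variations section and selectively remove entries:

chrome.exe --disable-features=SomeSuspectFeature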

Regular readers might remember that I’ve previously written about Chrome’s Field Trials system.

Components

In other cases, a problem is found in a part of the browser implemented as a “Component.” Components are much like hidden, built-in extensions that can be silently and automatically updated by the Component Updater.

The primary benefit of components is that they can be updated without an update to Chrome itself, which allows them to have faster (or desynchronized) release cadences, lower bandwidth consumption, and avoids bloat in the (already sizable) Chrome installer. The primary drawback is that they require Chrome to tolerate their absence in a sane way.

To me, the coolest part of components is that not only can they update without downloading a new version of the browser, but in some cases users don’t even need to restart their browser to begin using the updated version of a component. As soon as a new version is downloaded, it can “take over” from the prior version.

To see the list of components in the browser, visit

chrome://components

In my Chrome Canary instance, I see thirteen components. Many of these have rather obtuse names, but here’s a quick explanation of the ones I know offhand:

  • MEI Preload – Policies for autoplay (see chrome://media-engagement/ )
  • Intervention Policy – Controls interventions used on misbehaving web pages
  • Third Party Module – Used to exempt accessibility and other components from the Code Integrity protections on the browser’s process that otherwise forbid injection of DLLs.
  • Subresource Filter Rules – The EasyList adblock database used by Chrome’s built-in adblocker to remove ads from a webpage when the Safe Browsing service indicates that a site violates the guidelines in the Better Ads Standard.
  • Certificate Error Assistant – Helps users understand and recover from certificate errors (e.g. when behind a known WiFi captive portal).
  • Software Reporter Tool – Collects data about system configuration / malware.
  • CRLSet – List of known-bad certificates (used to replace OCSP/CRL).
  • pnacl – Portable Native Client (overdue for removal)
  • Chrome Improved Recovery – Unsure, but comments suggest this is related to helping fix broken Google Updater services, etc.
  • File Type Policies – Maps a list of file types to a set of policies concerning how they should be downloaded, what warnings should be presented, etc. See below.
  • Origin Trials – Used to allow websites to opt-in to experimenting with future web features on their sites. Explainer.
  • Adobe Flash Player – The world’s most popular plugin, gradually being phased out; slated for complete removal in late 2020.
  • Widevine Content Decryption – A DRM system that permits playback of protected video content.

If you’re using the Chromium-derived Brave browser, you’ll see that brave://components includes a bunch of extra components, including “Ad Blocker”, “Tor Client”, “PDF Viewer”, “HTTPS Everywhere”, and “Local Data Updater.”

If you’re using Chrome on Android, you might notice that it’s only using three components instead of thirteen; the missing components simply aren’t used (for various reasons) on the mobile platform. As noted in the developer documentation, “The primary drawback [to building a feature using a Component] is that [Components] require Chrome to tolerate their absence in a sane way.”

Case Study: Fast Protection via Component Update

Let’s take a closer look at my favorite component, the File Type Policies component.

When the browser downloads a file, it must make a number of decisions for security reasons. In particular, it needs to know whether the file type is potentially harmful to the user’s device. If the file type is innocuous (e.g. plaintext), then the file may be downloaded without any prompts. If the type is potentially dangerous, the user should be warned before the download completes, and security features like SafeBrowsing/SmartScreen should scan the completed download for malicious content.

In the past, this sort of “What File Types are Dangerous?” list was hardcoded into various products. If a file type was later found to be dangerous, patching these products with updated threat information took weeks to months.

In contrast, Chrome delivers this file type policy information using the File Type Policies component. The component lets Chrome engineers specify which types are dangerous, which types may be configured to automatically open, which types are archives that contain other files that may require scanning, and so on.

How does this work in the real world? Here’s an example.

Around a year ago, it was discovered that files with the .SettingContent-ms file extension could be used to compromise the security of the user’s device. Browsers previously didn’t take any special care when such files were downloaded, and most users had no idea what the files were or what would happen if they were opened. Everyone was caught flat-footed.

In less than a day after this threat came to light, a Chrome engineer simply updated a single file to mark the .SettingContent-ms file type as potentially dangerous. The change was picked up by the component builder, and Chrome users across all versions and channels were protected as their browsers automatically pulled down the updated component in the background.
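If you’re curious what such a change looks like, the policies are maintained as a protobuf-based data file in the Chromium source tree, and an entry marking an extension as dangerous looks roughly like the sketch below (the field names and values are reproduced from memory and simplified, so treat this as an approximation rather than the literal file contents):

file_types {
  extension: "settingcontent-ms"
  ping_setting: FULL_PING
  platform_settings {
    danger_level: DANGEROUS
    auto_open_hint: DISALLOW_AUTO_OPEN
  }
}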

 

Ever faster!

-Eric

You should enable “2-Step Verification” for logins to your Google account.

Google Authenticator is an app that runs on your iOS or Android phone and gives out six-digit codes that must be entered when you log in on a device. This can’t really prevent phishing (a phishing page will just ask you for a code, and if you’re fooled, you’ll give it up), but it does prevent attacks where a bad guy has only your password. Authenticator is free and simple to use, and is supported by many sites, including GitHub. Microsoft offers a nearly identical Authenticator app too. How TOTP works.

YubiKeys (and similar) are small USB keys that you can configure your accounts to require. They are cheapish (~$18) and cannot be phished (even if you tap your key while on a phishing site, the attacker cannot use it due to how the crypto works). These are the best protection for your accounts (Googlers all use them) and are highly recommended for Chrome extension developers, journalists, activists, etc, etc.

One of my final projects on the Chrome team was writing an internal document outlining Best Practices for Secure URL Display. Yesterday, it got checked into the public Chromium repo, so if this is a topic that interests you, please have a look!

Additionally, at Enigma 2019, the Chrome team released Trickuri (pronounced “trickery”), a tool for manual testing of URL displays against tricky attacks.

In Windows 10 RS5 (aka the “October 2018 Update”), the venerable XSS Filter, first introduced in 2008 with IE8, was removed from Microsoft Edge. The XSS Filter debuted in a time before Content Security Policy as part of a basket of new mitigations designed to blunt the growing exploitation of cross-site scripting attacks, joining older features like HTTPOnly cookies and the sandbox attribute for IFRAMEs.

The XSS Filter feature was a difficult one to land: only through the sheer brilliance and dogged persistence of its creator (David Ross) did the IE team accept the proposal that a client-side filtering approach could be effective with a reasonable false positive rate and good-enough performance to ship on-by-default. The filter was carefully tuned, firing only on cross-site navigations, and needed frequent updates as security researchers inside and outside the company found tricks to bypass it. One of the most significant technical challenges for the filter concerned how it was layered into the page download pipeline, intercepting documents as they were received as raw text from the network. The filter relied on evaluating dynamically-generated regular expressions to look for potentially executable markup in the response body that could have been reflected from the request URL or POST body. Evaluating the regular expressions could prove extremely expensive in degenerate cases (multiple seconds of CPU time in the worst cases) and required ongoing tweaks to keep the performance costs in check.

In 2010, the Chrome team shipped their similar XSS Auditor feature, which had the luxury of injecting its detection logic after the HTML parser runs, detecting and blocking reflections as they entered the script engine. By operating closer to the point of vulnerability, its performance and accuracy were significantly improved over the XSS Filter’s.

Unfortunately, no matter how you implement it, clientside XSS filtration is inherently limited: of the four classes of XSS attack, only one is even potentially mitigated by such filtering. Attackers have the luxury of tuning their attacks to bypass filters before they deploy them to the world, and the relatively slow ship cycles of browsers (six weeks for Chrome, and at least a few months for IE of the era) meant that bypasses remained exploitable for a long time.

False positives were an ever-present concern as well: the filters had to be somewhat conservative, leading to false-negative bypasses (e.g. multi-stage exploits that performed a same-site navigation) and pronouncements that certain attack patterns were simply out-of-scope (e.g. attacks encoded in anything but the most popular encoding formats).

Early attempts to mitigate the impact of false positives (by default, neutering exploits rather than blocking navigation entirely) proved bypassable and were later abused to introduce XSS exploits into sites that would otherwise have been free of exploits (!!!). As a consequence, browsers were forced to offer options that would allow a site to block navigation entirely upon detection of a reflection, or to disable the XSS filter altogether.

Surprisingly, even in the ideal case, best-of-class XSS filters can introduce information disclosure exploits into sites that are free of XSS vulnerabilities. XSS filters work by matching attacker-controlled request data to text in a victim response page, which may be cross-origin. Clientside filters cannot really determine whether a given string from the request was truly reflected into the response, or whether the string is naturally present in the response. This shortcoming creates the possibility that such a filter may be abused by an attacker to determine the content of a cross-origin page, a violation of Same Origin Policy. In a canonical attack, the attacker frames a victim page with a string of interest in it, then attempts to determine that string by making a series of successive guesses until it detects blocking by the XSS filter. For instance, xoSubframe.contentWindow.length exposes the count of subframes of a frame, even cross-origin. If the XSS filter blocks the loading of a frame, its subframe count is zero and the attacker can conclude that their guess was correct.

In Windows 10 RS4 (April 2018 update), Edge shipped its implementation of the Fetch standard, which redefines how the browser downloads content for page loads. As a part of this massive architectural shift, a regression was introduced in Edge’s XSS Filter that caused it to incorrectly determine whether a navigation was cross-origin. As a result, the XSS Filter began running its logic on same-origin navigations and skipping processing of cross-origin navigations, leading to a predictable flood of bug reports.

In the process of triaging these reports and working to address the regression, we concluded that the XSS Filter had long been on the wrong side of the cost/benefit equation and we elected to remove the XSS Filter from Edge entirely, matching Firefox (which never shipped a filter to begin with).

We encourage sites that are concerned about XSS attacks to use the client-side platform features available to them (Content-Security-Policy, HTTPOnly cookies, sandboxing) and the server-side patterns and frameworks that are designed to mitigate script injection attacks.

-Eric Lawrence

My oldest supported Windows application is a launcher app named SlickRun, and it’s ~24 years old this year. I haven’t done much to maintain it over the last few years, although it’s now available in 64-bit and runs great on Windows 10. (Thanks go to Embarcadero, who now offer a free “Community” edition of Delphi, the language/platform I ported SlickRun to circa 1994).

I still fix bugs in SlickRun from time to time, and as I was playing with Rust a few days ago I was reminded of one of the oldest limitations in my code: if you update your system’s %PATH% variable, applications and consoles spawned by SlickRun don’t see the change until you restart SlickRun. This is particularly annoying because it’s so unexpected: users expect that command consoles launched via Win+R,cmd.exe,Enter will behave the same way as those launched via Win+Q,cmd,Enter, but the former consoles get the updated %PATH% while the latter do not.

While ShellExecute() sounds like it’s an API that causes the shell (aka Explorer) to execute something, in fact it does nothing of the sort.

Updating the Environment Block

The root cause of the “outdated path” problem is that processes launched via ShellExecute inherit the environment variables of their spawning process, and those environment variables (typically) are assigned as the process launches and never touched again. Because SlickRun starts with Windows, the %PATH% when it starts is the %PATH% that every process it launches inherits. (You can easily view a process’ environment block using the Properties > Environment tab in Process Explorer).

So, how does Explorer detect the change? That part I figured out ages ago– after updating an environment variable, the System Properties > Environment Variables Control Panel UI (or the SetX.exe console tool) broadcasts a WM_SETTINGCHANGE message to all top-level windows with an lparam containing the string “Environment”. I could easily add code to SlickRun to detect that the variables had changed, but for decades I didn’t really know what to do next… I didn’t know how to read the updated variables (without doing something hacky like restarting the process) nor ensure that they were passed to the applications spawned by ShellExecute.

Yesterday, I got fed up and started Googling. A few posts on StackOverflow mentioned a promising-sounding function, RegenerateUserEnvironment. While that function appears to be undocumented, there’s an amazing issue filed in an open-source tracker that explains exactly how Windows Explorer uses it: basically, just wait for the WM_SETTINGCHANGE event, then call the API. RegenerateUserEnvironment replaces the calling process’s current environment block with the latest values.
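Translated into a bare-bones C sketch (SlickRun itself is Delphi, and since RegenerateUserEnvironment is undocumented, the signature below is the one commonly reported in those StackOverflow posts rather than an official contract), the approach looks something like this:

#include <windows.h>

/* Undocumented shell32.dll export; signature per public write-ups, not an official contract. */
typedef BOOL (WINAPI *PFN_RegenerateUserEnvironment)(void** ppNewEnv, BOOL bSetCurrentEnv);

/* Call this when the main window receives WM_SETTINGCHANGE with lParam pointing at "Environment". */
void RefreshEnvironmentBlock(void)
{
    HMODULE hShell32 = LoadLibraryW(L"shell32.dll");
    if (!hShell32) return;

    PFN_RegenerateUserEnvironment pfn = (PFN_RegenerateUserEnvironment)
        GetProcAddress(hShell32, "RegenerateUserEnvironment");
    if (pfn)
    {
        void* pNewEnv = NULL;
        /* TRUE asks the function to install the rebuilt block as this process'
           environment, so children spawned afterwards (e.g. via ShellExecute)
           inherit the updated %PATH%. */
        pfn(&pNewEnv, TRUE);
    }
    FreeLibrary(hShell32);
}

/* In the window procedure:
     case WM_SETTINGCHANGE:
       if (lParam && lstrcmpiW((LPCWSTR)lParam, L"Environment") == 0)
         RefreshEnvironmentBlock();
       break;
*/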

Launching at Medium Integrity

While we’re on the topic of executing applications “like the shell”, another scenario came up twelve years ago when Windows Vista was first introduced. The SlickRun installer, written in NSIS, launches SlickRun when installation completes. Unfortunately, the installer runs with Admin rights (High integrity), which means that, by default, all of the programs it launches inherit that integrity. For SlickRun, this is especially bad because it means that any programs that it, in turn, launches during that first session (e.g. your browser!) will run at High integrity too. Not good.

While you can easily use the “Runas” verb with ShellExecute to launch a High integrity application from a Medium integrity application, there (depressingly) isn’t a way to do the opposite. For years, the official recommendation was to do some fancy coding to clone Explorer’s tokens and use those. Unfortunately, this is quite complicated to implement, especially within an NSIS script.

As it turns out, however, there’s a trivial workaround which works quite well– while ShellExecute doesn’t run things as the shell, applications can easily get Explorer to launch anything they like at Explorer’s integrity. The trick is to simply invoke explorer.exe and pass the filename to be executed as the first command line argument:
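A minimal sketch of the idea, with a placeholder install path (from an NSIS script this can be issued with Exec; from code, with ShellExecute or CreateProcess):

explorer.exe "C:\Program Files\SlickRun\sr.exe"

Because the launch is ultimately handled by the already-running Explorer, which runs at Medium integrity, the spawned program inherits Explorer’s integrity rather than the installer’s High integrity.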

While this approach isn’t technically supported, I expect it is likely to continue to work for the foreseeable future.

 

It’s depressing that together these tricks have taken me almost twenty years to discover, but I’m happy that I have. I hope they help you out.

-Eric

In order to be eligible for the HSTS Preload list, your site must usually serve a Strict-Transport-Security header with an includeSubdomains directive.

Unfortunately, some sites do not follow the recommended best practices and instead just set a one-year preload header with includeSubdomains and then immediately request addition to the HSTS Preload list. The result is that any problems will likely be discovered too late to be rapidly fixed: removals from the preload list may take months.
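Concretely, the risky pattern is to jump straight to the final policy on day one and submit the domain for preloading the same day:

Strict-Transport-Security: max-age=31536000; includeSubDomains; preload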

The Mistakes

In running the HSTS preload list, we’ve seen two common mistakes for sites using includeSubdomains:

Mistake: Forgetting Intranet Hosts

Some sites are set up with a public site (example.com) and an internal site only accessible inside the firewall (app.corp.example.com). When includeSubdomains is set, all sites underneath the specified domain must be accessible over HTTPS, including in this case app.corp.example.com. Some corporations have different teams building internal and external applications, and must take care that the security directives applied to the registrable domain are compatible with all of the internal sites running beneath it in the DNS hierarchy. Following the best practices of staged rollout (with gradually escalating max-age directives) will help uncover problems before you brick your internal sites and applications.
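A gentler path, sketched below with illustrative values, is to ramp up max-age over successive deployments and add includeSubdomains (and eventually preload) only after everything, including the intranet hosts, has proven healthy:

Strict-Transport-Security: max-age=300
Strict-Transport-Security: max-age=86400; includeSubDomains
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload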

Of course, you absolutely should be using HTTPS on all of your internal sites as well, but HTTPS deployments are typically smoother when they haven’t been forced by Priority-Zero downtime.

Mistake: Forgetting Delegated Hosts

Some popular sites and services use third-party companies for advertising, analytics, and marketing purposes. To simplify deployments, they’ll delegate handling of a subdomain under their main domain to the vendor providing the service. For instance, http://www.example.com will point to the company’s own servers, while the hostname mail-tracking.example.com will point to Experian or Marketo servers. An HSTS rule with includeSubdomains applied to example.com will also apply to those delegated domains. If your service provider has not enabled HTTPS support on their servers, all requests to those domains will fail when upgraded to HTTPS. You may need to change service providers entirely in order to unbrick your marketing emails!

Of course, you absolutely should be using HTTPS on all of your third-party apps as well, but HTTPS deployments are typically smoother when they haven’t been forced by Priority-Zero downtime.

Recovery

If you do find yourself in the unfortunate situation of having preloaded a TLD whose subdomains were not quite ready, you can apply for removal from the preload list, but, as noted previously, the removal can be expected to take a very long time to propagate. For cases where you only have a few domains out of compliance, you should be able to quickly move them to HTTPS. You might also consider putting an HTTPS front-end in front of your server (e.g. Cloudflare’s free Flexible SSL option) to allow it to be accessed over HTTPS before the backend server is secured.

Deploy safely out there!

-Eric