Edge/Chrome Policy Registry Entries

One of the more common problems reported by Enterprises is that certain Edge/Chrome policies do not seem to work properly when the values are written to the registry.

For instance, when using the about:policy page to examine the browser’s view of the applied policy, the customer might complain that a policy value they’ve entered in the registry isn’t being picked up:

A quick look at the Microsoft documentation for the ExemptDomainFileTypePairsFromFileTypeDownloadWarnings policy shows JSON syntax that looks almost right, but in one example the value is wrapped in square brackets, while in another it is not. What’s going on here?

A curious and determined administrator might notice that by either adding the square brackets:

…or by changing the Exempt…Warnings registry entry from a REG_SZ into a key containing values:

…the policy works as expected:

What’s going on?

As the Chromium policy_templates.json file explains, each browser policy is implemented as a particular type, depending on what sort of data it needs to hold. For the purposes of our discussion, the two relevant types are list and dict. Either of these types can be used to hold a set of per-site rules:

* 'list' - a list of string values. Using this for a list of JSON strings is now discouraged, because the 'dict' is better for JSON.
* 'dict' - perhaps should be named JSON. An arbitrarily complex object or array, nested objects/arrays, etc. The user defines the value with JSON.

When serializing these policies to the registry, dict policies use a single REG_SZ registry string, while list policies are meant to be stored as values within a subkey. That convention isn’t technically enforced, however, and you may specify the entire list using a single string. If you do represent the entire JSON list as a single string value, you must wrap the value in [] (square brackets) to indicate that you’re supplying a whole array of values.

In contrast, if you encode the individual rules as numbered string values within a key (this is what we recommend), then you must omit the square brackets because each string value represents a single rule (not an array of rules).
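
To make this concrete, here’s a sketch of both encodings in .reg form. The rule contents are illustrative only; consult the policy’s documentation for the exact schema:

Windows Registry Editor Version 5.00

; Option 1: the whole list in a single REG_SZ value. Note the square brackets.
[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Edge]
"ExemptDomainFileTypePairsFromFileTypeDownloadWarnings"="[{\"file_extension\":\"xml\",\"domains\":[\"contoso.com\"]}]"

; Option 2 (recommended): one numbered value per rule. No square brackets.
[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Edge\ExemptDomainFileTypePairsFromFileTypeDownloadWarnings]
"1"="{\"file_extension\":\"xml\",\"domains\":[\"contoso.com\"]}"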

Group Policy Editor

If you use the Group Policy Editor rather than editing the registry directly, each list-based policy has a Show... button that spawns a standalone list editor:

In contrast, when editing a dict, there’s only a small text field into which the entire JSON string should be pasted:

To ensure that a JSON policy string is formatted correctly, consider using a JSON validator tool.

Bonus Policy Trivia

Encoding

While JavaScript allows wrapping string values in ‘single’ quotes, JSON and thus the policy code requires that you use “double” quotes.
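
You can sanity-check a candidate string right in the F12 Console; a minimal sketch:

// JSON demands double quotes around strings and property names
JSON.parse('{"domains": ["contoso.com"]}');  // parses fine
JSON.parse("{'domains': ['contoso.com']}");  // throws a SyntaxError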

Non-Enterprise Use

The vast majority of policies will work on any computer, even if it’s just your home PC and you’re poking the policy into the registry directly. However, to limit abuse by other software, there are a small set of “protected” policies whose values are only respected if Chromium detects that a machine is “managed” (via Domain membership or Intune, for example).

The kSensitivePolicies list can be found in the Chromium source and encompasses most, but not all, of these restrictions (e.g. putting an Application Protocol on the URLAllowlist only works for managed machines).

You can visit about:management on a device to see whether Chromium considers it managed.

Case-Sensitivity

Chromium treats policy names in a case-sensitive fashion. If you try to use a lowercase character where an uppercase character is required (or vice-versa), your policy will be ignored. Double-check the case of all of your policy names if the about:policy page complains about an Unknown Policy.

Refresh

You might wonder when Edge reads the policy entries from the registry. Chromium’s policy code does not subscribe to registry change event notifications (Update: See below). That means that it will not notice that a given policy key in the registry has changed until:

  1. The browser restarts, or
  2. Fifteen minutes pass, or
  3. You push the Reload Policies button on the about://policy page, or
  4. A Group Policy update notice is sent by Windows, which happens when the policy was applied via the normal Group Policy deployment mechanism.

    Chromium and Edge rely upon an event from the RegisterGPNotification function to determine when to re-read the registry.

Update: Edge/Chrome v103+ now watch the Windows Registry for change notifications under the HKLM/HKCU Policy keys and will reload policy if a change is observed. Note that this observation only works if the base Policies\vendor\BrowserName registry key already existed; if it did not, there’s nothing for the observer to watch. For Dev/Canary channels, a registry key can be set to disable the observer.

Update-to-the-Update: The watcher was backed out shortly after I wrote this; it turned out to cause bugs because of the way Group Policy updates work: the old registry keys are deleted, some non-zero time passes, and then the new keys are written. With the watcher in place, the policies were reapplied in the middle of that window, turning the policies off for some time. This caused side-effects like the removal and reinstallation of browser extensions.

Note that not all policies support being updated at runtime; the Edge Policy documentation notes whether each policy supports updates with the Dynamic Policy Refresh value (visualizing the dynamic_refresh flag in the underlying source code).

-Eric

Smarter Defaults by Paying Attention

As a part of every page load, browsers have to make dozens, hundreds, or even thousands of decisions of varying levels of importance: should a particular API be available? Should a resource load be permitted? Should script be allowed to run? Should video be allowed to start playing automatically? Should cookies or credentials be sent on network requests? The list is long.

In Chromium, most of these decisions are controlled by per-site settings that must be manually1 configured by either the user, or administrative policy.

However, manual configuration of settings is tedious, and, for some low-impact decisions, probably not worth the effort.

Wouldn’t it be cool if each user’s individual browser were smart enough to consider clues about what behavior that specific user is likely to want, and then use those clues in picking a default behavior?

User Activation / Gestures

The first, and simplest, mechanism used to make smarter decisions is called user gestures. Certain Web APIs and browser features (e.g. the popup blocker, file download experience, full-screen API, etc) require that the user has interacted with the page before the feature can be used.

This unblocking signal is called a User Gesture or (formally) User Activation.
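
In Chromium-based browsers, you can even observe the activation state from script; a quick sketch:

// Transient activation is true only briefly after a user interaction;
// sticky activation remains true for the lifetime of the page.
console.log(navigator.userActivation.isActive);
console.log(navigator.userActivation.hasBeenActive);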

Requiring a User Gesture can help prevent (or throttle) simple and unwanted “drive by” behaviors, where a website uses (abuses?) a powerful API without any indication that a user wants to allow a site to use it.

Unfortunately, User Gestures are a pretty low hurdle against abuse– sites can perform a variety of trickery to induce the user to click and unlock the protected feature.

Enter Site Engagement

Chromium supports a feature called Site Engagement, which works similarly to User Activation, but stretched over time. Instead of allowing a single gesture to unblock a single API call that occurs within the subsequent 5 seconds, Site Engagement calculates a score that grows with user interactions and decays over inactive time. In this way, sites that you visit often and engage with heavily offer a streamlined experience vs. a site you’ve only visited once (e.g. while you’re clicking around in search results). If you stop engaging with a site for a while, its engagement score decays and it loses any “credit” it had accrued.

You can easily see your browser’s unique Engagement scores by visiting the url: about://site-engagement/. Here’s what mine looks like:

If you like, you can use the textboxes next to each site to manually adjust its score (e.g. for debugging).

A separate page, about://media-engagement/, tracks your engagement with Media content (e.g. video players).

The Site Engagement primitive can be used in many different features; for instance, it can weigh into things like:

  1. May audio/video automatically start playback without a user gesture?
  2. Should Tracking Prevention’s “Balanced mode” block a potential tracker?
  3. Should a permission prompt be presented as a balloon, or a more subtle icon in the toolbar?

Site Engagement is a more robust mechanism than User Activation, but it’s still just a heuristic that can suffer from both false negatives (e.g. I’m using InPrivate or have just cleared my history, and am now visiting a trusted site) and false positives (a site has tricked me into engaging over time and now starts abusive behavior). As such, each feature’s respect for Site Engagement must be carefully considered, and recovery from false-negatives and false-positives must be simple.

Bespoke Mechanisms

Beyond User Activation and User Gesture requirements, various other signals have been proposed or used as clues into what behavior the user wants.

In determining whether a given file download is likely to be desired, for instance, the Downloads code uses a Familiar Initiator heuristic, which treats downloads as less suspicious if the originator of the Download request is a site that the user has visited before the current date.

Other features have considered such signals as:

  1. Is the site one that the user visited by navigating via the address bar (as opposed to navigations triggered by script)?
  2. Is the site’s origin amongst the user’s Bookmarks/Favorites?
  3. Is the site an installed PWA?
  4. Do other users of a given site often respond to a particular permission decision in a particular way (aka “Cloud Consent”)? This approach is used in Adaptive Notifications.

Impact on Debugging

One downside of all of these mechanisms is that they can make debugging harder for folks like me– what you saw on your browser might not be what I see on mine, and what you experienced yesterday might not be what you experience tomorrow.

Tools like the about:site-engagement page can allow me to mimic some of your configuration, but some settings (e.g. the Familiar Initiator heuristic, or the timing of your User Gestures) are harder to account for.

That said, while smarter browsers are somewhat harder to debug, they can be much more friendly for end-users.

-Eric

1 A few settings inherit from Windows Security Zones.

Mid-February Checkin

tl;dr: On track.

Back in January, I wrote about my New Years’ Resolutions. I’m now 45 days in, and things are going pretty well.

  • Health and Finance: A dry January. Dry January has turned into dry February. Beyond idle thoughts “What should I do right now? In the old days, I’d pour myself a drink” and ten tough seconds (someone opened a delicious-smelling bottle of wine), I haven’t missed alcohol at all.
  • Health: Track my weight and other metrics. Done. I’m weighing in on my smart scale twice a week. I have a blood pressure cuff now; I haven’t been using it very regularly though. All metrics are headed in the right direction.
  • Health: Find sustainable fitness habits. Going great. I’ve bought a fancy treadmill and started using it, along with the exercise bike I bought in August 2020 and haven’t used until now. I’m working out at least 5 days a week for an hour or more. I also signed up for the Austin Capitol 10K in April, although I expect to walk a lot of it.
  • Travel: Haven’t yet booked an Alaska cruise, but it’s still on the back burner. I did book a near-repeat of the Christmas cruise for me and the kids over Spring break, and I’ve made some progress on bigger travel plans.
  • Finance: Spend more intentionally. Between the treadmill and the cruise, I’ve spent a lot of money so far in 2022, but for the most part it doesn’t feel wasted. The stock market hasn’t been doing too well, so I’m not feeling super-duper secure here.
  • Life: Produce more. I haven’t done a ton of this, beyond blogging and working on tools. I’m trying Hello Fresh, which has been educational and interesting– I’ve not really cooked anything remotely fancy before. I’ve also made some progress on delayed and on-going house projects though.

Sadly, work has not been going great, but everything else in life seems to be considerably better than a few months ago.

MHTML in Chromium

The MHTML file format (aka “Webpage, single file”) allows a single file to contain the multiple resources that are used to load a webpage (script, css, images, etc).

Edge (Chromium) provides only limited support for MHTML, though it retains an option to use the format when saving the current page via Ctrl+S or the Save page as... menu command:

Saving MHTML from Save Page As…

Restriction: No Script

Reloading a saved MHTML file in Edge/Chrome/Chromium/etc will disable script.

Interestingly, when Chromium saves an MHTML file, it omits the <script> and <noscript> blocks entirely. If you load an MHTML file that was saved by another tool and includes script, the script is not executed and a notice is shown in the Developer Tools Console:

Restriction: Disabled Forms

When loading a MHTML file, form controls like text fields and buttons are disabled, preventing the user from filling or submitting a form:

Restriction: Resources May Not Load

Chromium uses very restrictive rules for Same-Origin-Policy evaluation that can often prevent embedded resources (including images and stylesheets) from loading properly, leading to missing content and console warnings:

Limitation: Encodings

Internet Explorer’s MHTML component supported a variety of content-encodings that are not supported in Chromium. I fixed one bug but there are numerous other limitations in MHTML support.

Workaround: IEMode

If you need legacy MHTML content to load in Edge, your best bet is to configure the file to load in IEMode.

Edge includes some code which attempts to automatically detect whether a given MHTML file is compatible with Edge mode, e.g. checking for a Saved by Blink marker:

-Eric

Adding Protocol Schemes to Chromium

Previously, I’ve written a lot about Application Protocols, which are a simple and popular mechanism for browsers to send a short string of data out to an external application for handling. For instance, mailto is a common example of a scheme treated as an Application Protocol; if you invoke mailto:someone@somewhere.com, the browser will convert this to an OS execution of e.g.:

outlook.exe mailto:someone@somewhere.com

Application Protocols are popular because they are simple, work across most browsers on most operating systems, and because they can be added by 3rd parties without changes to the browser.
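
For reference, on Windows an Application Protocol is typically registered with a pair of registry keys like the following sketch (the scheme name and application path are placeholders):

Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\myScheme]
@="URL:myScheme Protocol"
"URL Protocol"=""

[HKEY_CLASSES_ROOT\myScheme\shell\open\command]
@="\"C:\\Program Files\\MyApp\\myapp.exe\" \"%1\""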

However, Application Protocols have one crucial shortcoming– they cannot directly return any data to the browser itself. If you wanted to do something like:

<img src='myScheme://photos/mypic.png' />

… there’s no straightforward way for your application protocol to send data back into the browser to render in that image tag.

You might be thinking: “Why can’t I, a third-party, simply provide a full implementation of a protocol scheme, such that my object gets a URL, and it returns a stream of bytes representing the data from that URL, just like HTTP and HTTPS do?”

Asynchronous Pluggable Protocols

Back in the early days of Internet Explorer (1990s), the team didn’t know what protocols would turn out to be important. So, they built a richly extensible system of “Asynchronous Pluggable Protocols” (APP) which allowed COM objects to supply a full implementation of a protocol. The browser, upon seeing a URL (“moniker”) would parse the URL Scheme out, then “bind” to the APP object and send/receive data from that object. This allowed Internet Explorer to handle URLs in an abstract way and support a broad range of protocols (e.g. ftp, file, gopher, http, https, about, mailto, etc).

In many cases, we think only about receiving data from a protocol, but it’s important to remember that you can also send data (beyond the url) to a protocol; consider a file upload form that uses the POST method to send a form over HTTPS, for example.

Writing an APP was extremely challenging, and very risky– because APPs are exposed to the web, a buggy APP could be exploited by any webpage, and thanks to the lack of sandboxing in early IE, would usually result in full Remote Code Execution and compromise of the system. Beyond the security concerns, there were reliability challenges as well– writing code that would properly handle the complex threading model of a browser downloading content for a web page was very difficult, and many APP implementations would crash or hang the browser when conditions weren’t as the developer expected.

Despite the complexity and risk, writing APPs provided Internet Explorer with unprecedented extensibility power. “Back in the day” I was able to do some fun things, like add support for data: URLs to IE7 before the browser itself got around to supporting such URLs.

Understanding Custom Schemes

Sending a URL into an APP object and getting bytes back from a stream is only half of the implementation challenge, however.

The other half is figuring out how the rest of the browser and web platform should handle these URLs. For Internet Explorer, we had a mechanism that allowed the host (browser) to query the protocol about how its URLs should be handled. The IInternetProtocolInfo interface allowed the APP’s code to handle the comparison and combination of URLs using its scheme, and allowed the code to answer questions about how web content returned from the URL should behave.

For instance, to fully support a scheme, the browser needs to be able to answer questions like:

  1. Is this scheme “standard” (allowing default canonicalization behaviors like removing \..\ sequences), or “opaque” (in which other components cannot safely touch the URL)?
  2. Is this scheme “Secure” (allow in HTTPS pages without mixed content warnings, allow WebAPIs that require a secure context, etc)?
  3. Does this scheme participate in CORS?
  4. Does this scheme get sent as a referrer?
  5. Is this scheme allowed from Sandboxed frames?
  6. Can top-level frames be navigated to this scheme?
  7. Can such navigations only occur from trusted contexts (app/omnibox) or is JavaScript allowed to invoke such navigations?
  8. How do navigations to these urls interact with all of the other WebNavigation/WebRequest extensibility APIs?
  9. How does the scheme interact with the sandbox? What process isolation is used?
  10. What origin is returned to web content running from the scheme?
  11. How does content from the scheme interact with the cookie store?
  12. How does it interact with CSP?
  13. How does it interact with WebStorage?

Implementing Protocols in Chromium

Unlike Internet Explorer, Chromium does not offer a mechanism for third-party extensibility of its protocols; the browser itself must have support for a new protocol compiled in. A subclassed URLLoaderFactory (e.g. about, blob) invokes the correct url_loader implementation that returns the data for the response.

Chromium doesn’t have an analogue for the IInternetProtocolInfo interface for protocol implementors to implement; instead, the scheme must be manually added to each of the per-behavior lists of schemes hardcoded into Chromium.

Debugging Compatibility in Edge

Background

By moving from our old codebase to Chromium, the Microsoft Edge team significantly modernized our codebase and improved our compatibility with websites. As we now share the vast majority of our web platform code with the market-leading browser, it’s rare to find websites that behave differently in Edge when compared to Chrome, Brave, Opera, Vivaldi, etc. Any time we find a behavioral delta, we are keen to dive in and understand why Edge behaves differently. Sometimes, our investigation will reveal a behavior gap caused by a known and deliberate difference (e.g. we add an Edg/ token to our User-Agent string), but in the most interesting cases, something unexpected is found.

Yesterday, I came across an interesting case. In this post, I’ll share both how I approached root causing the difference, and explore how it can be avoided.

The Customer’s Issue

The Microsoft Edge team works hard to ensure that Enterprises can move fearlessly to Edge, both from legacy IE/Spartan, and from other browsers.

One such Enterprise recently contacted us to report that one of their workflows did not work as expected in Edge 96, noting that the flow in question works fine in both Google Chrome and Apple Safari.

Common causes (e.g. a different User-Agent header, Tracking Prevention set to defaults) were quickly ruled out as root causes: you can easily run Edge with a different User-Agent header (via the --user-agent command line argument or via the Emulation feature in the F12 Developer Tools) and Tracking Prevention reports any blocked storage or network requests to the F12 Console.

The customer’s engineers noted that the problem seemed to be that Edge was failing to send a cross-origin fetch request– NetLogs of the flow revealed that the fetch request began, but never completed. As a result of failing to send the request, a WebAPI was not invoked, and the user was unable to change the state of the web application.

This was a super-interesting finding because Edge shares essentially all of its code in this area with upstream Chromium.

The customer shared their NetLogs with us, and I took a shot at analyzing the traffic captured within. Unfortunately, the results were inconclusive: it was easy to see that the request had indeed not been sent, but it was not clear why.

Here’s a snippet of the log for the attempt to call the API. We see that the request was a POST to an invokeAPI page on a different server, and because of the request’s Content-Type (application/json) the browser was required to perform a CORS preflight request before sending the POST to the remote server.

145: URL_REQUEST https://logic.azure.com/invokeAPI
t=9118 [st=0] +CORS_REQUEST  [dt=175]
  --> cors_preflight_policy = "consider_preflight"
  --> headers = "...content-type: application/json\r\n..."
  --> is_external_request = false
  --> is_revalidating = false
  --> method = "POST"
  --> url = https://logic.azure.com/invokeAPI
t=9118 [st=0]  CHECK_CORS_PREFLIGHT_REQUIRED
             --> preflight_required = true
             --> preflight_required_reason ="disallowed_header"
t=9118 [st=0]    CHECK_CORS_PREFLIGHT_CACHE
             --> status = "miss"
t=9118 [st=0]    CORS_PREFLIGHT_URL_REQUEST
             --> source_dependency = 146 (URL_REQUEST)
t=9293 [st=175] -CORS_REQUEST
t=9411 [st=293]  CORS_PREFLIGHT_RESULT
   --> access-control-allow-headers = "content-type"
   --> access-control-allow-methods = "GET,HEAD,OPTIONS,POST"

Crucially, we see that at 9293 milliseconds (175ms after the request process began), the request was aborted (-CORS_REQUEST). Not until 118 milliseconds later did the response to the preflight (CORS_PREFLIGHT_RESULT) come back saying, in effect, “Sure, I’m okay with accepting a request with that Content-Type and the POST method.” But it was too late– the request was already cancelled. The question is: “Why did the request get cancelled?”

A quick look at the webapp’s JavaScript didn’t show any explicit code that would cancel the fetch(). My hunch was that the request got cancelled because the fetch() call was in a frame that got navigated elsewhere before the request completed, but in this complicated application, network requests were constantly firing. That made it very difficult to prove or disprove my theory from looking at the Network Traffic alone. I tried looking at an edge://tracing log supplied by the customer, but the application was so complicated that I couldn’t make any sense of it.

And the larger question loomed: Even if teardown due to navigation is the culprit, why doesn’t this repro in Chrome?

The overall workflow involves several servers, including a SharePoint site, Microsoft Outlook Online, and custom logic running on Microsoft Azure. There didn’t seem to be any good way for us to emulate this workflow, but fortunately, the customer was willing to provide us with credentials to their test environment.

I started by confirming the repro in both Edge Stable and Canary, and confirming that the problem did not repro in either Chrome Stable or Canary. As this web application used a ServiceWorker, I also confirmed that unregistering the worker (using the Application tab of the F12 DevTools) did not impact the repro.

With this confirmation in hand, I then aimed to test my hunch that navigation was the root cause of the WebAPI fetch request not being sent.

A New Debugging Tool

One significant limitation of NetLogs is that, while the URL_REQUEST entries in the log have tons of information about the content and network behavior of the request, the NetLogs don’t have much context about why a given request was made, or where its result will be used by the web application.

Fortunately, Chromium’s rich extensibility model surfaces events for both Navigations and Network Requests, including (crucially) information about which browser tab and frame generated the event.

As a part of my recent NativeMessaging Debugger project, I built a simple sample extension which watches for navigations and sends information out to a log window. For this exercise, I updated it to also watch for WebRequests. With this new logging in place, I reproduced the scenario, and it was plain that the fetch() call to the WebAPI was being cancelled due to navigation:

14:26:40:7164 - {"event":"navigation","destination":"https://customer/MyTasks.aspx","tabId":60,"frameId":0,"parentFrameId":-1,"timeStamp":1642624000710.5059}

14:26:40:7194 - {"event":"webRequest","method":"POST","url":"https://logic.azure.com/invokeAPI","tabId":60,"frameId":0,"parentFrameId":-1,"timeStamp":1642624000716.684,"type":"xmlhttprequest"}

14:26:40:7254 - {"event":"webRequest","method":"GET","url":"https://customer/MyTasks.aspx","tabId":60,"frameId":0,"parentFrameId":-1,"timeStamp":1642624000722.692,"type":"main_frame"}

The first event shows a top-level navigation (tabId:60, frameId:0) begin to MyTasks.aspx, although the navigation’s request hasn’t hit the network yet.

Then, six milliseconds later, we see the WebAPI’s fetch() call begin.

Six milliseconds after that, we see the network request for the navigation get underway, effectively racing against the outstanding WebAPI call.

From our NetLog, we know the navigation wins– the WebAPI’s fetch() preflight alone took 293ms to return, and the overall fetch request was aborted after just 175ms, right after the navigation’s request got back a set of HTTP/200 response headers.

So, my hunch is confirmed: The API call was aborted because the script context in which it was requested got torn down.

I was feeling pretty smug about my psychic debugging success but for one niggling thought: Why was this only happening in Edge?

Maybe this is happening because Chrome has some experiment running that changes the behavior of CORS or the JavaScript scheduler? I checked chrome://version/?show-variations-cmd and searched for any likely experiment names. Nothing seemed very promising, although I tried using the --enable-features command line argument to turn on a few flags in Edge to see if they mattered. No dice — Edge still repro’d the issue. I tried performing the repro in my local version of Chromium, which doesn’t run with any server-controlled flags. No dice — Chromium didn’t repro the issue. I even tried running a bisect against Chromium to see if any older Chromium build has this problem. Nope, going all the way back to Chrome 60, the workflow ran just fine.

I wanted to try bisecting against Edge, but unfortunately this isn’t simple to do without being set up with an Edge development environment (I develop directly against upstream Chromium). However, I did have a standalone copy of Edge 91 handy, and the problem repro’d there as well, so at least this wasn’t a super-recent regression.

I don’t like unsolved mysteries, but by this point I at least knew how the customer could fix this on their side. Their code for submitting the form looked like this:

  t.prototype.Submit = function() {
   this.callAPI(this.email, t),
   alert("Success"),
   setTimeout(function() {console.log("Wait 1sec")},1000),
   location.href = n + "/MyTasks.aspx"
 };

 t.prototype.callAPI = function(e, t) {
  var n = JSON.stringify({data: e}), o = new Headers;
  o.append("Content-type", "application/json");
  var i = {body: n,headers: o};
  this.post("https://logic.azure.com/invokeAPI",i).then(function(e)
         {return console.log("API call complete"), e.json()})
 };

As you can see, the Submit() function calls the callAPI() function to send the POST request, then shows an alert saying “Success”. Script execution pauses while the alert dialog is showing. When the user clicks “OK” in the alert(), a setTimeout call queues a callback with a one second delay, then navigates the top-level page to MyTasks.aspx.

This looks like a mistake– the location.href= call was probably meant to be inside the setTimeout callback; otherwise, that callback probably will never run because the page will have been torn down by the navigation before the callback’s console.log statement runs.

But more broadly, this code seems fundamentally wrong: We’re telling the user that the operation yielded “Success”, but we never actually waited for the WebAPI call to succeed— we just fired it off and then immediately showed the “Success” popup. The alert() and location.href='...' lines should probably be inside the then block on the .post call, and there should probably be error handling code for the case where that post call returned an error for any reason.
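
A sketch of that restructuring, assuming callAPI is also changed to return the promise from this.post:

t.prototype.Submit = function() {
  this.callAPI(this.email, t).then(function() {
    // Only declare victory (and navigate) after the API call completes
    alert("Success");
    location.href = n + "/MyTasks.aspx";
  }).catch(function(e) {
    alert("Submitting your task failed: " + e);
  });
};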

I wrote up this suggestion and sent it off to the customer– this should resolve their issue in Edge, and it’s more logical and correct in every browser.

And yet… why is Edge different?

I went back to look at the event logs from my repro again. And I noticed something crucial: in Chrome, the wrBeforeRequest event signaling the start of the WebAPI call appeared before the alert dialog box, and the wrResponseHeaders event appeared while the alert box was up (3 seconds later). Only after I clicked OK on the dialog box (32 seconds after that) did the navigation event occur:

In contrast, in Edge, the alert dialog box appears before the wrBeforeRequest event– in fact, the wrBeforeRequest event isn’t even seen until after clicking OK to dismiss the dialog. At that point, the WebAPI and Navigation requests race, and the WebAPI request will almost always lose the race.

Ah ha! So we’re getting closer. Edge is failing because its fetch call is getting blocked until dismissal of the modal alert. That’s weird. Maybe it’s something about this site’s JavaScript? I tried creating a minimal repro where a fetch() would get kicked off asynchronously and then an alert() would immediately follow. In Chrome, the fetch ran as expected, while in Edge, the fetch blocked on the user clicking OK.
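
The reduced test case was just a few lines; the URL below is a placeholder for any cross-origin endpoint that triggers a CORS preflight:

// Kick off an async cross-origin fetch, then immediately show a modal alert
fetch('https://example.com/invokeAPI', {
  method: 'POST',
  headers: { 'Content-type': 'application/json' },
  body: JSON.stringify({ data: 'test' })
}).then(function() { console.log('fetch completed'); });
alert('Did the fetch complete yet?');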

Now we’re cooking with gas. A working minimal repro dramatically increases the odds of solving any mystery, because it makes testing theories much simpler, and roping in other engineers much cheaper– instead of asking them to perform complicated repro steps that may take tens of minutes, you can say “Click this link. Push the button. See the bug. Weird, right?”

There’s been quite a lot of recent discussion about the Chrome team’s concerns about the 3 Web Platform modal dialog primitives (alert(), prompt(), and confirm()) and how they interact with the JavaScript event loop. Furious webdevs complain “If it ain’t broke, don’t fix it” and the web platform folks retort: “Oh, it’s very much broke, and we need to fix it!”.

Perhaps there’s some related change that upstream is experimenting with? I spent another fruitless hour looking into their experimental configuration and ours, and bothering Edge’s alert() owners to find out if perhaps we might have changed anything about our alert() implementation for accessibility reasons or the like. No dice.

I wrote up my latest findings and sent them off to my engineers. With a reduced repro in hand, our Dev Manager popped up the edge://tracing tool and had a look around. He replied back: I suspect it has something to do with an Edge-specific throttle. Specifically, I see a WebURLLoader::Context::Start hitting a mojom.SafeBrowsingUrlChecker call that I don’t see in Chrome.

For context: Chromium supports the notion of throttles which allow a developer to react to events as the user browses around. For instance, throttles for resource loaders allow you to block and modify requests, which browser developers use to perform reputation scans for SafeBrowsing or SmartScreen, for example. NavigationThrottles allow developers to redirect or cancel navigations, and so on.

Microsoft Edge uses a throttle to implement our Tracking Prevention feature.

Way back in December 2019, one of our engineers noted that a test case behaved differently in Edge vs. Chrome. That test case was a page with three XHRs: two asynchronous and one synchronous. In Chrome, the two async calls ran in parallel with the sync call, while in Edge, the two async calls were blocked upon completion of the one sync call. The root cause was related to the Tracking Prevention throttle needing to perform a cross-process communication to check whether the XHRs’ url targets were potentially tracking servers. Even though synchronous XHRs are evil and need to die, we still fixed the testcase by updating the throttle so that it doesn’t run on same-origin requests (because they cannot be, by definition, 3rd party trackers).

In this week’s repro, the fetch call is async, but alert() is not– it’s a synchronous API that blocks JavaScript. And the WebAPI is to a 3rd-party URL, so the 2019 fix doesn’t prevent the throttle from running.

One interesting finding during the 2019 investigation was that turning off Tracking Prevention in edge://settings does not actually disable the throttle– it simply changes the throttle to not block any requests. With this recollection, I used the command line argument to disable the throttle entirely:
msedge.exe --disable-features=msEnhancedTrackingPreventionEnabled

…and confirmed that the minimal test page now behaves identically in Edge vs. Chrome.

We’ve filed a bug against the throttle to figure out how to address this case, but the recommendation to the customer stands: the current logic popping the alert() box ought to change such that either there’s no alert(), or it only shows success after the fetch() call actually succeeds.

-Eric

Recognizing Edge Windows

Yesterday, we had a customer reach out to us for help on an issue they’d encountered while writing code to interact with Microsoft Edge windows. Their script enumerated every window in the system, looking for those with Microsoft Edge in the titlebar. They were surprised to discover that the script didn’t recognize any of their browser windows, despite the fact that they could plainly see the product’s name in several windows on the taskbar and ALT+Tab overlay.

Weird, right?

After investigating further, the customer realized that the Edge window titles contained a Zero Width Space (U+200B) Unicode character immediately after the word Microsoft and before the regular space character preceding the word Edge.

“What possible use could that have?” the customer wondered.

When I started looking into this, I assumed it was simply a mistake, whereby someone had accidentally copied the invisible space into the IDS_BROWSER_WINDOW_TITLE_FORMAT resource within Edge’s version of Chromium. After all, if regular whitespace is a menace, invisible whitespace is at least doubly-so.

However, when I saw the source code, I realized that the developer definitely put it there on purpose:

As you can see, the zero-width space is fully visible, HTML-encoded as the constant value &#8203;.

Q: Why on earth would we do that?

A: For the same reason we do almost every wacky, weird, or inexplicable thing: Compatibility.

Investigation revealed that this character was added precisely to cause existing 3rd-party software not to recognize Microsoft Edge windows. It turns out that there’s a very popular touchpad driver that applies special scrolling behavior for the (now defunct) Microsoft Edge Legacy (Spartan) browser, and this code doesn’t behave properly in the new Chromium-based Microsoft Edge. The touchpad’s software wasn’t doing any additional validation of the window’s owning process executable name or similar to limit its scope. So, the only straightforward way to prevent it from breaking Edge was to apply this trick. We filed a bug to eventually remove the character after the touchpad’s code is fixed.

If you’re writing an AutoHotkey script or other code to try to interact with Edge’s windows based on their window title, you’ll need to account for this invisible space.
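
For example, a title test written in JavaScript needs to treat the zero-width space as optional; a sketch:

// Match "Microsoft Edge" with or without the zero-width space (U+200B)
const isEdgeTitle = (title) => /Microsoft\u200B? Edge/.test(title);
isEdgeTitle('Microsoft\u200B Edge');  // true
isEdgeTitle('Microsoft Edge');        // also true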

-Eric

Trim Your Whitespace

Leading and trailing whitespace are generally invisible. Humans are bad at dealing with things they can’t see.

If your system accepts textual codes, or any other human-generated or human-mediated input, you should trim whitespace, whether it’s leading, trailing, or inline (if not meaningful).

// Trim leading and trailing whitespace from the user's input
const inputCode = document.getElementById('inputCode');
inputCode.value = inputCode.value.trim();

It’s downright silly that web-first companies with market capitalizations in the $Billions have not yet figured out this simple trick for improving their applications. Instead, we end up with garbage error messages like this one:

Or this one, from the most valuable company in history:

Related: Browsers can do better here too. On paste into a length-limited control, we should probably trim leading whitespace first if needed to respect the limit.
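
Until browsers do, page authors can handle it themselves; a sketch (the element name is illustrative):

const input = document.getElementById('inputCode');
input.addEventListener('paste', (e) => {
  e.preventDefault();
  const pasted = e.clipboardData.getData('text').trim();
  const limit = input.maxLength > 0 ? input.maxLength : Infinity;
  // Simplified: appends rather than inserting at the caret position
  input.value = (input.value + pasted).slice(0, limit);
});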

Improve the world: Trim harmful whitespace!

-Eric

Debug Native Messaging

Prelude

Last month, an Enterprise customer reached out to report that a 3rd-party browser extension they use wasn’t working properly. Investigation of the extension revealed that the browser extension relied upon a NativeMessaging Host (NMH) companion that runs outside of the browser’s sandbox. In reviewing a Process Monitor log provided by the customer, the Support Engineer and I observed that the Native Host executable was unexpectedly exiting tens of minutes after it started. After that unexpected exit, the next time the in-browser extension tried to call it, the browser-to-native call failed, and the browser extension was unable to provide its intended functionality.

Unfortunately, I don’t have either the source (or even the binary) for the NMH executable, and there are no obvious clues in the Process Monitor log (e.g. a failed registry read or write) that reveal the underlying problem. I lamented to the Support Engineer that I really wished we could see the JSON messages being exchanged between the browser extension and the NMH to see if they might reveal the root cause.

“We need, like, Fiddler, but for NMH messages instead of HTTPS messages.”

How Hard Could It Be?

Technically, I don’t really own anything related to browser extensions, so after ruling out what few possible problems I could imagine as root causes, I moved on to other tasks.

But that vision stuck with me throughout the day and the evening that followed: Fiddler, but for Native Messaging.

How hard could it be to build that? How useful would it be?

I haven’t written much C# code since leaving Fiddler and Telerik at the end of 2015, and the few exceptions (e.g. the NetLog Importer) have mostly been plugins to Fiddler rather than standalone applications. Still, Native Messaging is far less complicated than HTTPS, so it shouldn’t be too hard, right?
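
Fortunately, the wire format really is trivial: each message is a 4-byte length in native byte order (little-endian on Windows), followed by that many bytes of UTF-8 JSON. A decoding sketch in JavaScript:

// Decode one NativeMessaging message from a byte buffer
function decodeMessage(buf) {
  const length = buf.readUInt32LE(0);  // 4-byte little-endian length prefix
  const json = buf.slice(4, 4 + length).toString('utf8');
  return JSON.parse(json);
}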

We want the following features in a debugger:

  1. Show messages from any Browser Extension to any Native Host
  2. Enable logging these messages to a file
  3. Allow injecting arbitrary messages in either direction
  4. (Stretch goal) Allow modification of messages

Over the following few evenings, I dusted off my Visual Studio IDE and struggled to remember how C# async programming works in modern times (Fiddler’s implementation was heavily threaded and mostly predated more modern alternatives).

Introducing the NativeMessaging Meddler

The source and (soon) compiled code for the NativeMessaging Meddler can be downloaded from GitHub.

The NativeMessaging Meddler (NMM) is a Windows application that requires .NET 4.8. A row of tabs across the bottom enables you to switch between the tool’s views; by default, running the .exe directly just shows help text:

The NMM tool can respond to NativeMessages from a browser extension itself, or it can proxy messages between an existing extension and an existing NMH executable.

Configure the Demo

To test the basic functionality of the tool, you can install the Demo Extension.

  1. Visit about://extensions in Chrome or Edge
  2. Enable the Developer Mode toggle
  3. Push the Load Unpacked button
  4. Select the sample-ext folder
  5. A new “N” icon appears in the toolbar

After the demo extension is installed, you must now register the demo Native Host app. To do so, update its manifest to reflect where you placed it:

  1. Open the manifest.json file using Notepad or a similar editor
  2. Set the path field to the full path to the .exe. Be sure that every backslash is doubled up.
  3. Set the allowed_origins field to contain the ID value of the extension from the about:extensions page.
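
After those edits, the host manifest should look something like this sketch (the name, path, and extension ID shown are placeholders):

{
  "name": "com.example.nmm",
  "description": "NativeMessaging Meddler demo host",
  "path": "C:\\tools\\nmf-view.exe",
  "type": "stdio",
  "allowed_origins": [ "chrome-extension://abcdefghijklmnopabcdefghijklmnop/" ]
}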

Next, update the registry so that the browser can find your Host:

  1. Edit the InstallRegKeys.reg file in Notepad, updating the file path to point to the location of the manifest.json file. Be sure that each backslash is doubled up.
  2. Double-click the InstallRegKeys.reg file to import it to the registry.

Run the Demo

With both the host and extension installed, you can now test out the tool. Click the “N” icon from the extension in the toolbar to navigate to its demo page. An instance of the NMM should automatically open.

Type Hello world! in the Outgoing Messages box and click Post Message to port. The message should appear on the Monitor tab inside the NMM app:

If you tick the Reflect to extension option at the top right and then send the message again, you should see the NMM tool receive the message and then send it back to the extension page, where it’s shown in the Incoming Messages section:

“Reflect to extension” copies inbound messages back to the sender
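
Under the covers, the demo page drives all of this using the standard extension messaging API; a sketch (the host name is a placeholder):

// Connect to the Native Host, log inbound messages, and post one outbound
const port = chrome.runtime.connectNative('com.example.nmm');
port.onMessage.addListener((msg) => console.log('Incoming:', msg));
port.postMessage({ text: 'Hello world!' });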

What if we want to inject a new message of our choosing from NMM?

Go to the Injector tab in NMM and type a simple JSON message in the bottom box. Then click the Send to Browser/Extension button. You’ll see the message appear inside the browser in the Incoming Messages section:

Note: Your message must be well-formed JSON, or it will never arrive.

At this point, we’ve now successfully used the NMM tool to receive and send messages from our Demo extension.

Proxying Messages

While our demo is nice for testing out Native Messaging, and it might help as a mock if we’re developing a new extension that uses Native Messaging, the point of this exercise is to spy on communications with an existing extension and host.

Let’s do that.

First, go to the Configure Hosts tab, which grovels the registry to find all of the currently-registered Native Hosts on your PC:

The plan is to eventually make intercepting any Native Host a point-and-click experience, but for now, we’re just using this tab to find the file system location of the Native Host we wish to intercept. If an entry appears multiple times, pick the instance with the lowest Priority score.

For example, say we’re interested in the BrowserCore Host which is used in some Windows-to-Web authentication scenarios in Chrome. We see the location of the manifest file, as well as the name of the EXE extracted from the manifest file:

In some cases, you might find that the Exe field shows ??? as in the vidyo entry above. This happens if the manifest file fails to parse as legal JSON. Chromium uses a bespoke JSON parser in lax mode for parsing manifests, and it permits JavaScript-style comments. The NMM tool uses a strict JSON parser and fails to parse those comments. It doesn’t really matter for our purposes.

Note the location of the manifest file and open it in your editor of choice. Note: If the file is in a privileged location, you may need to open your editor elevated (as Administrator).

Tip: You can Alt+DblClick an item or hit Alt+Enter with it selected to open Windows Explorer to the manifest’s location.

Within the manifest, change the path field by introducing the word .proxy before the .exe at the end of the filename:

Save the file.

Note: In some cases, not even an Administrator will be able to write the file by default. In such cases, you’ll need to use Administrator permissions to take ownership of the file to grant yourself permission to modify it:

There are other approaches that do not require changing filesystem permissions, but we won’t cover those here.

Next, copy the nmf-view.exe file into the folder containing the Native Host and rename it to the filename you wrote to the manifest:

At this point, you’ve successfully installed the NMM proxy. Whenever the browser extension next tries to launch the Native Host, it will instead activate our NMM debugger, which will in turn spawn the original Native Host (in this example, BrowserCore.exe) and proxy all messages between the two.

Now, visit a site where you can log in, like https://office.microsoft.com. Click the login button at the top-right and observe that our debugger spawns, collects a request from the Windows 10 Accounts extension, passes it to BrowserCore.exe, reads the Host’s reply, and passes that back to the extension. Our debugger allows us to read the full text of the JSON messages in both directions:

Note: This screenshot is redacted because it contains secret tokens.

Pretty neat, huh?

Tampering with Messages

When I got all of this working, I was excited. But I was also disappointed… plaintext rendering of JSON isn’t super readable, and building a UI to edit messages was going to be a ton of extra work. I lamented sheesh… I already wrote all of the code I want fifteen years ago for Fiddler. It has both JSON rendering and message editing… and I briefly bemoaned the fact that I no longer own Fiddler and can’t just copy the source over.

And then I had the epiphany. I don’t need to reimplement parts of Fiddler in NMM. The tools can simply work together! NMM can pass the messages it receives from the Browser Extension and Native Host up to Fiddler as they’re received, and if Fiddler modifies the message, NMM can substitute the modified message.

Eureka!

Configure Tampering

First, re-edit the manifest.json file to add a .fiddler component to the path, and rename the .proxy.exe file to .proxy.fiddler.exe, like so:

This new text signals that you want NMM to start with the Tamper using Fiddler option set. To debug “single-shot” Native Hosts like BrowserCore.exe, we can’t simply use the checkbox at the top-right of NMM’s Monitor tab, because the debugger and Native Host spawn and complete their transaction much faster than we puny humans can click the mouse. Note: You can also specify the string .log. to enable the option that writes the traffic log to your Desktop.

Now, start Fiddler, perhaps using the -noattach command line argument so that it does not register as the system proxy. Type bpu ToApp in the QuickExec box beneath the Web Sessions list and hit Enter.

This creates a request breakpoint which will fire for all requests whose urls contain the string ToApp, which NMM uses to record requests sent to the original Native Host:

Using Fiddler’s Inspectors, we can examine the JSON of the message using the JSON treeview, or the TextView or SyntaxView Inspectors:

If we are satisfied with the message, click the Run to Completion button, and our NMM app will send the original, unmodified message to the original Native Host. However, if we want to tamper with the message, instead pick a success response like 200_SimpleHTML.dat from the dropdown:

A template response will appear in the Response TextView:

Overwrite that template text with the modified text you’d like to use instead:

… then push the green Run to Completion button. Fiddler will return the modified text to the NMM proxy, and the NMM proxy will then pass that modified message to the original Native Host:

In this case, the original Native Host doesn’t know what to do with the GetFiddledCookies request and returns an error which is passed back to the browser.

Tip: If your goal is to instead tamper with messages sent from the Native Host to the extension, enter bpu ToExt in Fiddler’s QuickExec box. Alternatively, you can also use any of Fiddler’s richer tampering features, such that it breaks only on messages containing certain text, automatically rewrites certain messages, etc.

Happy Meddling!

-Eric

Lock down web browsing using Kiosk Mode

Browsers get used in many different environments. Today, I take a look at scenarios where there’s either no interactive user (digital signage) or a potentially malicious user (internet kiosks).

Digital Signage (fullscreen) Requirements

In the Digital Signage scenario, there’s a full-screen webpage rendering and there are no user-accessible input devices– the canonical example here would be an airport’s signage displaying arriving and departing flights and their associated gates.

Supporting this use-case is relatively easy– the browser must be full-screened, and it must avoid showing any sort of prompt, tip, hint, or feature that requires dismissal because there’s no guarantee that a mouse or keyboard is even plugged into the device.

In this scenario, the browser is typically used to load only a specific website, which itself must be carefully coded not to prompt the user for any input. Additionally, either the webapp must request a wakelock, or the OS must be configured not to let the computer sleep or hibernate. Similarly, the OS must be configured not to prompt the user for input or show modal dialogs (OS update prompts, etc).

Kiosk (public-browsing) Requirements

While supporting digital signage is reasonably straightforward, providing a true internet kiosk is considerably harder. The set of potential customer requirements is much broader– some kiosk owners want to allow the user to browse anywhere and download any files, etc, while other kiosk owners want to tightly lock down the experience to a small number of supported web pages. Making matters far more complicated, in some kiosk scenarios we cannot assume that the user is well-intentioned– they might want to abuse their access or even hack the kiosk itself. Computers are relatively less protected against malicious local users.

Generally, an interactive kiosk aims to offer a few capabilities:

  • Allow the user to load one or more webpages, filling out forms or performing search queries
  • Offer most of the “digital signage” behaviors (e.g. avoid prompting the user with announcements, requesting that they explore new features, log into the browser itself)
  • Prevent the user from navigating to arbitrary sites
  • Prevent the user from tampering with loaded web app(s) using the Developer Tools
  • Prevent the user from exiting the browser or modifying its persistent state
  • Prevent the user from gaining access to the underlying OS to run other programs or modify persistent state

Of these, preventing access to the underlying OS is the most critical, because if a malicious local user can execute commands in the OS, they can typically defeat all of the other restrictions intended for the kiosk.

Way back in my past life, I was the Security PM for Internet Explorer. At the 2008 Hack-in-the-Box security conference, my session on IE security improvements was preceded by a packed session wherein the presenter walked through two dozen popular “Kiosk browsing” software packages, breaking out of each to get access to the underlying system in under two minutes. Applause ranging from enthusiastic (for clever hacks) to bemused (for silly hacks) followed each attack.

Edge’s Kiosk Mode

Microsoft Edge offers a kiosk mode which can be simply activated by starting msedge.exe with the --kiosk command line argument. By default, this starts Edge with a full-screen InPrivate window with no address bar, no context menus, various hotkeys (like F12) disabled and so on. It’s a fine approach for something as simple as digital signage. But if you want to build a true kiosk, you’ll want to set some more options.

There’s a great documentation page on Configuring Edge Kiosk Mode that explains the various scenarios and configuration options. As explained on that page, one of the key things you’ll want to do is enable the Windows 10 “Assigned Access” feature so that Windows is locked down to limit the user to only the designated scenario.

You’ll likely also want to set a bunch of other Microsoft Edge policies to tighten things down.

Start with the Kiosk Mode Settings policies, then look at more general policies.

For instance, you almost certainly want to pass the --no-first-run command line argument or set the HideFirstRunExperience policy.
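
Putting the basics together, a kiosk launch might look something like the following; the --edge-kiosk-type argument (which selects between the fullscreen and public-browsing experiences) and the URL are illustrative:

msedge.exe --kiosk https://example.com/app --edge-kiosk-type=public-browsing --no-first-run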

You probably want to use the URLBlocklist policy to block all URLs (e.g. a rule of *) and then use the URLAllowlist policy to exempt only those URL patterns (e.g. https://example.com/app) that you wish to support. This helps prevent users from using the browser to browse the local file system (file:///c:/), from viewing web page source code (e.g. via CTRL+U), and from launching installed applications via App Protocols. Similarly, you may wish to restrict what a user can download, or configure downloaded files to be deleted on exit.
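
Because these are list policies, they’re encoded in the registry as numbered values within a subkey (the same encoding described in the policy registry discussion above); a sketch:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Edge\URLBlocklist]
"1"="*"

[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Edge\URLAllowlist]
"1"="https://example.com/app"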

One very common vector for abusing kiosks is to use the File Picker dialog shown when the user hits Ctrl+O or pushes the Choose file button on a file upload control. The File Picker dialog is provided by Windows and by default exposes the ability to download URLs, navigate the local file system, and even launch files. This dialog can be blocked by disabling the AllowFileSelectionDialogs policy, with the obvious caveat that doing so will block any web app scenario that requires the user upload a file.

In some cases, you might want to prevent the user from using a Microsoft Edge hotkey that is not otherwise restricted. To implement such a restriction, you can use a Windows Keyboard Filter, with the caveat that the restriction will block the hotkey(s) across all of Windows.

Extreme Lockdown

In extreme cases, you might decide that you don’t want a browser at all. In such cases, building a simple Win32, .NET, or UWP app atop the Microsoft Edge WebView2 control might be your best bet, because you’ll have more complete control of the behavior of the application, with the Edge engine rendering your content under the hood.

-Eric