I wrote some blog posts back in my IEBlog days and they keep getting lost. So I’m linking them here. I’ll probably add some more new content here in the future.
I’ve written some more about CORS since then.
The Web Platform offers a great deal of power, and unfortunately evil websites go to great lengths to abuse it. One of the weakest (but simplest to implement) protections against such abuse is to block actions that were not preceded by a “User Gesture.” Such gestures (sometimes more precisely called User Activations) include a variety of simple actions, from clicking the mouse to typing a key; each interpreted as “The user tried to do something in this web content.”
A single user gesture can unlock any of a surprisingly wide array of privileged (“gated”) actions:
So, when you see a site show a UI like this:
…chances are good that what they’re really trying to do is trick you into performing a gesture (a mouse click) so they can perform a privileged action: in this case, opening a popup ad in a new tab.
Some gestures are considered “consumable”, meaning that a single user action allows only one privileged action; subsequent privileged actions require another gesture.
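To make this concrete, here’s a minimal sketch (element IDs and URLs are hypothetical) of a gated action succeeding inside a click handler, getting consumed, and failing when no gesture is present:

// Runs with a user gesture: the click handler may open one popup.
document.getElementById("go")?.addEventListener("click", () => {
  window.open("https://example.com/one");   // allowed: a fresh gesture is present
  window.open("https://example.com/two");   // typically blocked: the gesture was already consumed
});

// Runs without a user gesture: the popup is blocked by default.
setTimeout(() => {
  window.open("https://example.com/late");  // blocked: no recent user activation
}, 5000);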
For the first few years of the web, developers pretty much coded whatever they thought was cool and shipped it. Specifications, if written at all, were an afterthought.
Then, for the next two decades, spec authors drafted increasingly elaborate specifications with optional features and extensibility points meant to be used to enable future work.
Unfortunately, browser and server developers often only implemented enough of the specs to ensure interoperability, and rarely tested that their code worked properly in the face of features and data allowed by the specs but not implemented in the popular clients.
Over the years, web builders started to notice that specs’ extensibility points were rusting shut– if a new or upgraded client tried to make use of a new feature, or otherwise change what it sent as allowed by the specs, existing servers would fail when they encountered the new values.
In light of this, spec authors came up with a clever idea: clients should send random dummy values allowed by the spec, causing spec-non-compliant servers that fail to properly ignore those values to fail immediately. This concept is called GREASE (with the backronym “Generate Random Extensions And Sustain Extensibility“), and was first implemented for the values sent by the TLS handshake. When connecting to servers, clients would claim to support new ciphersuites and handshake extensions, and intolerant servers would fail. Users would holler, and engineers could follow up with the broken site’s authors and developers about how to fix their code. To avoid “breaking the web” too broadly, GREASE is typically enabled experimentally at first, in Canary and Dev channels. Only after the scope of the breakages is better understood does the change get enabled for most users.
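As an illustrative sketch of the idea (not actual browser code): RFC 8701 reserves TLS values of the form 0xNANA for GREASE, and a client simply advertises one of them at random alongside the values it really supports:

// RFC 8701 GREASE values: 0x0A0A, 0x1A1A, ... 0xFAFA.
function pickTlsGreaseValue(): number {
  const n = Math.floor(Math.random() * 16);  // 0..15
  const b = (n << 4) | 0x0a;                 // 0x0A, 0x1A, ... 0xFA
  return (b << 8) | b;                       // e.g. 0x3A3A
}
// A compliant server must ignore the unknown value; an intolerant one fails fast.
console.log("0x" + pickTlsGreaseValue().toString(16));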
GREASE has proven such a success for TLS handshakes that the idea has started to appear in new places. Last week, the Chromium project turned on GREASE for HTTP2 in Canary/Dev for 50% of users, causing connection failures to many popular sites, including some run by Microsoft. These sites will need to be fixed in order to properly load in the new builds of Chromium.
// Enable "greasing" HTTP/2, that is, sending SETTINGS parameters with reserved identifiers and sending frames of reserved types, respectively. If greasing Frame types, an HTTP/2 frame with a reserved frame type will be sent after every HEADERS and SETTINGS frame. The same frame will be sent out on all connections to prevent the retry logic from hiding broken servers.
NETWORK_SWITCH(kHttp2GreaseSettings, "http2-grease-settings") NETWORK_SWITCH(kHttp2GreaseFrameType, "http2-grease-frame-type")
One interesting consequence of sending GREASE H2 frames is that it requires moving the END_STREAM flag (recorded as fin=true in the netlog) from the HTTP2_SESSION_SEND_HEADERS frame into an empty HTTP2_SESSION_SEND_DATA frame; unfortunately, the intervening GREASE frame is not presently recorded in the netlog.
You can try H2 GREASE in Chrome Stable using command line flags that enable GREASE settings values and GREASE frames respectively:
chrome.exe --http2-grease-settings bing.com
chrome.exe --http2-grease-frame-type bing.com
Alternatively, you can disable the experiment in Dev/Canary:
GREASE is baked into the new HTTP3 protocol (Cloudflare does it by default) and the new UA Client Hints specification (where it’s blowing up a fair number of sites). I expect we’ll see GREASE-like mechanisms appearing in most new web specs where there are optional or extensible features.
Previously, I’ve described how to capture a network traffic log from Microsoft Edge, Google Chrome, and applications based on Chromium or Electron.
In this post, I aim to catalog some guidance for looking at these logs to help find the root cause of captured problems and otherwise make sense of the data collected.
Last Update: April 24, 2020 – I expect to update this post over time as I continue to gain experience in analyzing network logs.
After you’ve collected the net-export-log.json file using the about:net-export page in the browser, you’ll need to decide how to analyze it.
The NetLog file format consists of a JSON-encoded stream of event objects that are logged as interesting things happen in the network layer. At the start of the file there are dictionaries mapping integer IDs to symbolic constants, followed by event objects that make use of those IDs. As a consequence, it’s very rare that a human will be able to read anything interesting from a NetLog.json file using just a plaintext editor or even a JSON parser.
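If you do want to poke at one programmatically, a minimal sketch (assuming the usual layout of a constants dictionary followed by an events array; exact field names can vary across Chromium versions) might look like this:

import { readFileSync } from "fs";

const log = JSON.parse(readFileSync("net-export-log.json", "utf8"));

// Invert the constants table so numeric event types become readable names.
const eventName = new Map<number, string>();
for (const [name, id] of Object.entries(log.constants.logEventTypes)) {
  eventName.set(id as number, name);
}

// Dump the symbolic name of each logged event.
for (const ev of log.events) {
  console.log(eventName.get(ev.type) ?? `UNKNOWN(${ev.type})`);
}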
An alternative approach is to use the NetLog Importer for Telerik Fiddler.
For Windows users who are familiar with Fiddler, the NetLog Importer extension for Fiddler is easy-to-use and it enables you to quickly visualize HTTP/HTTPS requests and responses. The steps are easy:
In seconds, all of the HTTP/HTTPS traffic found in the capture will be presented for your review. If the log was compressed before it was sent to you, the importer will automatically extract the first JSON file from a chosen .ZIP or .GZ file, saving you a step.
In addition to the requests and responses parsed from the log, there are a number of pseudo-Sessions with a fake host of NETLOG that represent metadata extracted from the log:
These pseudo-sessions include:
You can then use Fiddler’s UI to examine each of the Web Sessions.
The NetLog format currently does not store request body bytes, so those will always be missing (e.g. on POST requests).
Unless the Include Raw Bytes option was selected by the user collecting the capture, all of the response bytes will be missing as well. Fiddler will show a “dropped” notice when the body bytes are missing:
If the user did not select the Include Cookies and Credentials option, any Cookie or Authorization headers will be stripped down to help protect private data:
You can use Fiddler’s full text search feature to look for URLs of interest if the traffic capture includes raw bytes. Otherwise, you can search the Request URLs and headers alone.
On any session, you can use Fiddler’s “P” keystroke (or the Select > Parent Request context menu command) to attempt to walk back to the request’s creator (e.g. referring HTML page).
You can find the traffic_annotation value that reflects why a resource was requested by looking for the X-Netlog-Traffic_Annotation Session Flag.
If Fiddler sees that cookies were not set or sent due to features like SameSiteByDefault cookies, it will make a note of that in the Session using a pseudo $NETLOG-CookieNotSet header on the request or response:
While the Fiddler Importer is very convenient for analyzing many types of problems, for others, you need to go deeper and look at the raw events in the log using the Catapult Viewer.
Opening NetLogs with the Catapult NetLog Viewer is even simpler:
If you find yourself opening NetLogs routinely, you might consider using a shortcut to launch the Viewer in an “App Mode” browser instance (e.g. a shortcut whose target is something like msedge.exe --app=https://netlog-viewer.appspot.com/):
The App Mode instance is a standalone window which doesn’t contain tabs or other UI:
Note that the Catapult Viewer is a standalone HTML application. If you like, you can save it as a .HTML file on your local computer and use it even when completely disconnected from the Internet. The only advantage to loading it from appspot.com is that the version hosted there is updated from time to time.
Along the left side of the window are tabs that offer different views of the data– most of the action takes place on the Events tab.
If the problem only exists on one browser instance, check the Command Line parameters and Active Field trials sections on the Import tab to see if there’s an experimental flag that may have caused the breakage. Similarly, check the Modules tab to see if there are any browser extensions that might explain the problem.
Each URL Request has a traffic_annotation value, which is a hash you can look up in annotations.xml. That annotation will help you find what part of Chromium generated the network request: most requests generated by web content will carry blink_resource_loader, navigations will have navigation_url_loader, and requests from features running in the browser process are likely to have other sources.
Look at the DNS tab on the left, and at the HOST_RESOLVER_IMPL_JOB entries in the Events tab.
One interesting fact: The DNS Error page performs an asynchronous probe to see whether the configured DNS provider is working generally. The Error page also has automatic retry logic; you’ll see a duplicate URL_REQUEST sent shortly after the failed one, with the VALIDATE_CACHE load flag added to it. In this way, you might see a DNS_PROBE_FINISHED_NXDOMAIN error magically disappear if the user’s router’s DNS flakes.
Look for COOKIE_INCLUSION_STATUS events for details about each candidate cookie that was considered for sending or setting on a given URL Request. In particular, watch for cookies that were excluded due to SameSite or similar problems.
For problems with HTTPS connections, look at the CERT_VERIFIER_JOB entries, and look at the raw TLS messages on the socket entries.
Note: While NetLogs are great for capturing certs, you can also get the site’s certificate from the browser’s certificate error page.
The NetLog includes the certificates used for each HTTPS connection in base64-encoded form. You can just copy/paste each certificate (including the -----BEGIN CERTIFICATE----- and -----END CERTIFICATE----- lines) out to a text file, name it log.cer, and use the OS certificate viewer to view it. Or you can use Fiddler’s Inspector, as noted above.
If you’ve got Chromium’s repo, you can instead use the script at \src\net\tools\print_certificates.py to decode the certificates. There’s also a cert_verify_tool in the Chromium source you might build and try. For Mac, using verify-cert to check the cert and dump-trust-settings to check the state of the Root Trust Store might be useful.
In some cases, running the certificate through an analyzer like https://crt.sh/lintcert can flag relevant problems.
For authentication problems, look for HTTP_AUTH_CONTROLLER events, and for responses with the status codes 401 and 407. For instance, you might find that authentication fails with ERR_INVALID_AUTH_CREDENTIALS unless you enable the browser’s DisableAuthNegotiateCnameLookup policy (Kerberos has long been very tricky).
Got a great NetLog debugging tip I should include here? Please leave a comment and teach me!
Last update: June 18, 2020
I started building browser extensions more than 22 years ago, and I started building browsers directly just over 16 years ago. At this point, I think it’s fair to say that I’m entering the grizzled veteran phase of my career.
With the Edge team continuing to grow with bright young minds from college and industry, I’m increasingly often asked “Where do I learn about browsers?” and I haven’t had a ready answer for that question.
This post aims to answer it.
First, a few prerequisites for developing expertise in browsers:
Now, how do you apply these prerequisites and grow to become a master of browsers? Read on.
Over the years, a variety of broad resources have been developed that will give you a good foundation in the fundamentals of how browsers work. Taking advantage of these will help you more effectively explore and learn on your own.
If you prefer to learn from books, I can only recommend a few. Sadly, there are few on browsers themselves (largely because they tend to evolve too quickly), but there are good books on web technologies.
One of the best ways to learn how browsers work is simply to use tools to watch what’s going on as you use your favorite websites.
The fact that all of the major browsers are built atop open-source projects is a wonderful thing. No longer do you need to be a reverse-engineering ninja with a low-level debugger to figure out how things are meant to work (although sometimes such approaches can still be super-valuable).
Source code locations:
While simply perusing a browser’s source code might give you a good feel for the project, browsers tend to be enormous. Chromium is over 10 million lines of code, for example.
If you need to find something in particular, an often-effective technique is to search for a string shown in the browser UI near the feature of interest. (Or, if you’re searching for a DOM function name or HTML attribute name, try searching for that.) We might call this method string chasing.
By way of example, today I encountered an unexpected behavior in the handling of the “Go to <url>” command on Chromium’s context menu:
So, to find the code that implements this feature, I first try searching for that string:
…but there are a gazillion hits, which makes it hard to find what I need. So I instead search for a string that’s elsewhere in the context menu, and find only one hit in the Chromium “grd” (resources) file:
When I go look at that grd file, I quickly find the identifier I’m really looking for just below my search result:
So, we now know that we’re looking for usages of IDS_CONTENT_CONTEXT_GOTOURL, probably in a .CC file, and we find that almost immediately:
From here, we see that the menu item has the command identifier IDC_CONTENT_CONTEXT_GOTOURL, which we can then continue to chase down through the source until we find the code that handles the command. That command makes use of a variable selection_navigation_url_, which is filled elsewhere by some pretty complicated logic.
After you gain experience in the Chromium code, you might learn “Oh, yeah, all of the context menu stuff is easy to find, it’s in the renderer_context_menu directory” and limit your searches to that area, but after four years of working on Chrome, I still usually start my searches broadly.
If you’d actually like to compile the code of a major browser, things are a bit more involved, but if you follow the configuration instructions to the letter, your first build will succeed. Back in 2015, Monica Dinculescu created an amazing illustrated guide to contributing to Chromium.
You can compile Chromium or Firefox on a mid-range machine from 2016, but it will take quite a long time. A beefy PC will speed things up a bunch, but until we have cloud compilers available to the public, it’s always going to be pretty slow.
All browsers except Microsoft Edge have a public bug tracker where you can search for known issues and file new bugs if you encounter them.
I’ve doubtless forgotten some, see who I follow.
Public data reveals each point of marketshare in the browser market is worth at least $100,000,000 USD annually (most directly in the form of payments from the browser’s configured search engine).
Remembering this fact will help you understand many other things, from how browsers pay their large teams of expensive software engineers, to how they manage to give browsers away for free, to why certain features behave the way that they do.
Browsers are hugely complicated beasts, and tons of fun. If the resources above leave you feeling both overwhelmed and excited, maybe you should become a browser builder.
Want to change the world? Come join the new Microsoft Edge team today!
In recent posts, I’ve explored mechanisms to communicate from web content to local (native) apps, and I explained how web apps can use the HTML5 registerProtocolHandler API to allow launching them from either local apps or other websites.
In today’s post, we’ll explore how local apps can launch web apps in the browser.
In most cases, it’s trivial for an app to launch a web app and send data to it. The app simply invokes the operating system’s “launch” API and passes it the desired URL for the web app.
Any data to be communicated to the web app is passed in the URL query string or the fragment component of the URL.
On Windows, such an invocation might look like this:
ShellExecute(hwnd, "open", "https://bayden.com/echo.aspx?DataTo=Pass#GoesHere", 0, 0, SW_SHOW);
Calling this API results in the user’s default browser being opened and a new tab navigated to the target URL.
This same simple approach works great on most operating systems and with virtually any browser a user might have configured as their default.
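On the receiving end, the web app can pull the data back out of the URL; a tiny sketch using the example URL above:

// e.g. for https://bayden.com/echo.aspx?DataTo=Pass#GoesHere
const params = new URLSearchParams(location.search);
console.log(params.get("DataTo"));        // "Pass"
console.log(location.hash.substring(1));  // "GoesHere"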
Unfortunately, this well-lit path adjoins a complexity cliff— if your scenario has requirements beyond the basic [Launch the default browser to this URL], things get much more challenging. The problem is that there is no API contract that provides a richer feature set and works across different browsers.
For instance, consider the case where you’d like your app to direct the browser to POST a form to a target server. Today, popular operating systems have no such concept– they know how to open a browser by passing it a URL, but they expose no API that says “Open the User’s browser to the following URL, sending the navigation request via the HTTP POST method and containing the following POST body data.”
For instance, if the target webservice simply requires an HTTP POST and you cannot change it, your app could launch the browser to a webpage you control, passing the required data in the querystring component of an HTTP GET. Your web server could then reformat the data into the required POST body format and either proxy that request (server-side) to the target webservice, or it could return a web page with an auto-submitting form element with a method of POST and an action attribute pointed at the target webservice. The user’s browser will submit the form, posting the data to the target server.
Similarly, a more common approach involves having the app write a local HTML file in a temporary folder, then direct the Operating System to open that file using the appropriate API (again ShellExecute, in the case of Windows). Presuming that the user’s default HTML handler is also their default HTTPS protocol handler, opening the file will result in the default browser opening, and the HTML/script in the file will automatically submit the included form element to the target server. This “bounce through a local temporary form” approach has the advantage of making it possible to submit a sizable amount of data to the server (e.g. the contents of a local file), unlike using a GET request’s size-limited querystring.
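The script inside that temporary HTML file can be as small as the following sketch (the field name and target URL are placeholders for whatever the real service expects):

// Build a POST form and submit it as soon as the page loads.
const form = document.createElement("form");
form.method = "POST";
form.action = "https://example.com/target-service";  // the real target webservice
const field = document.createElement("input");
field.type = "hidden";
field.name = "payload";
field.value = "data gathered by the native app";      // e.g. file contents
form.appendChild(field);
document.body.appendChild(form);
form.submit();  // auto-submit; the browser navigates to the target with a POST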
If your scenario requires uploading files, an alternative approach is to:
Back in the Windows 7 days, the IE8 team created a very cool feature called Accelerators that would allow users to invoke web services in their browser from any other application. Interestingly, the API contract supported web services that required POST requests.
Because there was no API in Windows that supported launching the default browser with anything other than a URL, a different approach was needed. A browser that wished to participate as a handler for Accelerators could implement an IOpenServiceActivityOutputContext::Navigate function, which was expected to launch the browser and pass the data. The example implementation provided by our documentation called into Internet Explorer’s Navigate2() COM API, which accepted as a parameter the POST body to be sent in the navigation. As far as I know, no other browser ever implemented IOpenServiceActivityOutputContext.
These days, Accelerators are long dead, and no one should be using Internet Explorer anymore. In the intervening years, no browser-agnostic mechanism to transfer a POST request from an app to a browser has been created.
Perhaps the closest we’ve come is the W3C’s WebDriver Standard, designed for automated testing of websites across arbitrary browsers. Unfortunately, at present, there’s still no way for mainstream apps to take a dependency on WebDriver to deliver a reliable browser-agnostic solution enabling rich transfers from a local app to a web app. Similarly, Puppeteer can be used for some web automation scenarios in Chrome or Edge, and the new Microsoft Playwright enables automated testing in Chromium, WebKit, and Firefox.
While the current picture is bleak, the future is a bit brighter. That’s because a major goal of browsers’ investment in Progressive Web Apps is to make them rich enough to take the place of native apps. Today’s native apps have very rich mechanisms for passing data and files to one another and PWAs will need such capabilities in order to achieve their goals.
Perhaps one day, not too far in the future, your OS and your browser (regardless of vendor) will better interoperate.
While I do most of my work in an office, from time to time I work on code changes to Chromium at home. With the recent deprecation of Jumbo Builds, building the browser on my cheap 2016-era Dell XPS 8900 (i7-6700K) went from unpleasant to impractical. While I pondered buying a high-end Threadripper, I couldn’t justify the high cost, especially given the limited performance characteristics for low-thread workloads (basically, everything other than compilation).
The introduction of the moderately-priced (nominally $750), 16 Core Ryzen 3950X hit the sweet spot, so I plunked down my credit card and got a new machine from a system builder. Disappointingly, it took almost two months to arrive in a working state, but things seem to be good now.
The AMD Ryzen 3950X has 16 cores with two threads each, and runs at around 3.95 GHz when all cores are fully loaded; it’s cooled by a CyberPowerPC DeepCool Castle 360EX liquid cooler. An Intel Optane 905P 480GB system drive holds the OS, compilers, and Chromium code. The key advantage of the Optane over more affordable SSDs is that it has a much higher random read rate (~400% as fast as the Samsung 970 Pro I originally planned to use):
Following the Chromium build instructions, I configured my environment and set up a 32bit component build with reduced symbols:
is_component_build = true
enable_nacl = false
target_cpu = "x86"
blink_symbol_level = 0
symbol_level = 1
Atop Windows 10 1909, I disabled Windows Defender entirely, and didn’t do anything too taxing with the PC while the build was underway.
Ultimately, a clean build of the “chrome” target took just under 53 minutes, achieving 33.3x parallelism.
While this isn’t a fast result by any stretch of the definition, it’s still faster than my non-jumbo local build times back when I worked at Google in 2016/2017 and used a $6000 Xeon 48 thread workstation to build Chrome, at somewhere around half of the cost.
When I first joined Google, I learned about the seemingly magical engineering systems available to Googlers, quickly followed by the crushing revelation that most of those magic tools were not available to those of us working on the Chromium open-source project.
The one significant exception was that Google Chrome engineers had access to a distributed build system called “Goma” which would allow compiling Chrome using servers in the Google cloud. My queries around the team suggested that only a minority of engineers took advantage of it, partly because (at the time) it didn’t generate very debuggable Windows builds. Nevertheless, I eventually gave it a shot and found that it cut perhaps five minutes off my forty-five minute jumbo build times on my Xeon workstation. I rationalized this by concluding that the build must not be very parallelizable, and the fact that I worked remotely from Austin, so any build-artifacts from the Goma cloud would be much further away than from my colleagues in Mountain View.
Given the complexity of the configuration, I stopped using Goma, and spent perhaps half of my tenure on Chrome with forty-five minute build times. Then, one day I needed to do some development on my MacBook, and I figured its puny specs would benefit from Goma in a way my Xeon workstation never would. So I went back to read the Goma documentation and found a different reference than the one I had seen originally. This one mentioned a “-j” command line argument, previously unknown to me, that tells the build system how many cloud cores to use.
This new, better, documentation noted that by default the build system would just match your local core count, but when using Goma you should instead demand ~20x your local core count– so -j 960 for my workstation. With one command line argument, my typical compiles dropped from 45 minutes to around 6.
I returned to Microsoft as a Program Manager on the Edge team in mid-2018, unaware that replatforming atop Chromium was even a possibility until the day before I started. Just before I began, a lead sent me a 27 page PDF file containing the Edge-on-Chromium proposal. “What do you think?” he asked. I had a lot of thoughts (most of the form “OMG, yes!“) but one thing I told everyone who would listen is that we would never be able to keep up without having a cloud-compilation system akin to Goma. The Google team had recently open-sourced the Goma client, but hadn’t yet open-sourced the cloud server component. I figured the Edge team had engineering years worth of work ahead of us to replicate that piece.
When an engineer on the team announced two weeks later that he had “MSGoma” building Chromium using an Azure cloud backend, it was the first strong sign that this crazy bet could actually pay off.
And pay off it has. While I still build locally from time to time, I typically build Chromium using MSGoma from my late 2018 Lenovo X1 Extreme laptop, with build times hovering just over ten minutes. Cloud compilation is a game changer.
The Chrome team has since released a Goma Server implementation, and several other major Chromium contributors are using distributed build systems of their own design.
I haven’t yet tried using MSGoma from my new Ryzen workstation, but I’ve been told that the Optane drive is especially helpful when performing distributed builds, due to the high incidence of small random reads.
This experience recalled a much earlier one: my family’s move to Michigan shortly after I turned 11. Our new house featured a huge yard. My dad bought a self-propelled lawn mower and my brother and I took turns mowing the yard weekly. The self-propelled mower was perhaps fifteen pounds heavier than our last mower, and the self-propelling system didn’t really seem to do much of anything.
After two years of weekly mows from my brother and me, my dad took a turn mowing. He pushed the lawn mower perhaps five feet before he said “That isn’t right,” reached under the control panel, and flipped a switch. My brother and I watched in amazement and dismay as the mower began pulling him across the yard.
Moral of the story: Knowledge is power.
Problems in accessing websites can often be found and fixed if the network traffic between the browser and the website is captured as the problem occurs. This short post explains how to capture such logs.
If someone asked you to read this post, chances are good that you were asked to capture a web traffic log to track down a bug in a website or your web browser.
Fortunately, in Google Chrome or the new Microsoft Edge (version 76+), capturing traffic is simple:
In some cases, especially when you’re dealing with a problem logging into a website, you may need to set either the Include cookies and credentials or Include raw bytes option before you click the Start Logging button.
Note that there are important security & privacy implications to selecting these options– if you do so, your capture file will almost certainly contain private data that would allow a bad actor to steal your accounts or perform other malicious actions. Share the capture only with a person you trust and do not post it on the Internet in a public forum.
If you’re more of a visual learner, here’s a short video demonstrating the traffic capture process.
In a followup post, I explore how developers can analyze captured traffic.
In rare cases, you may need to capture network data early (e.g. to capture proxy script downloads and the like). To do that, close Edge, then run
msedge.exe --log-net-log=C:\some_path\some_file_name.json --net-log-capture-mode=IncludeSocketBytes
Note: This approach also works for Electron JS applications like Microsoft Teams:
I suspect that this is only going to capture the network traffic from the Chromium layer of Electron apps (e.g. web requests from the nodeJS side will not be captured) but it still may be very useful.
If you’ve built an application using the old Web Browser Control (mshtml, aka Internet Explorer), you might notice that by default it does not support HTTP/2. For instance, a trivial WebOC host loading Akamai’s HTTP2 test page:
When your program is running on any build of Windows 10, you can set a Feature Control Key with your process’ name to opt-in to using HTTP/2.
For applications running at the OS-native bitness, write the key here:
For 32-bit applications running on 64-bit Windows, write it here:
After you make this change and restart the application, it will use HTTP/2 if the server and network path supports it.
Using HTTP/2 should help improve performance and can even resolve functional bugs.
Update: Windows’ Internet Control Panel has a “HTTP2” checkbox, but it only controls web platform apps (IE and Legacy Edge), and unfortunately, the setting does not work properly for AppContainer/LowIL processes, which enable HTTP2 by default. This means that the checkbox, as of Windows 10 version 1909, is pretty much useless for its intended purpose (as only Intranet Zone sites outside of Protected Mode run at MediumIL).
A bug has been filed.
Update: Users of the new Chromium-based Edge browser can launch an instance with HTTP2 disabled using the disable-http2 command line argument, e.g.
msedge.exe --disable-http2
I’m not aware of a straightforward way to disable HTTP2 for the new Chromium-Edge-based WebView2 control, which has HTTP2 enabled by default.
While there are many different ways for servers to stream data to clients, the Server-sent Events / EventSource Interface is one of the simplest. Your code simply creates an EventSource and then subscribes to its onmessage callback:
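A minimal sketch (the /events endpoint name is hypothetical):

// Open a long-lived stream and log each message the server pushes.
const source = new EventSource("/events");
source.onmessage = (e) => {
  console.log("got:", e.data);  // fires once per "data:" payload
};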
Implementing the server side is almost as simple: your handler just prefaces each piece of data it wants to send to the client with the string data: and ends it with a double line-ending (\n\n). Easy peasy. You can see the API in action in this simple demo.
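For illustration, a bare-bones Node handler that speaks this format might look like the following sketch (the port and payload are arbitrary):

import { createServer } from "http";

createServer((req, res) => {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
  });
  // Each message is the string "data: <payload>" followed by a blank line.
  let n = 0;
  const timer = setInterval(() => res.write(`data: tick ${n++}\n\n`), 1000);
  req.on("close", () => clearInterval(timer));
}).listen(8080);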
I’ve long been sad that we didn’t manage to get this API into Internet Explorer or the Legacy Edge browser. While many polyfills for the API exist, I was happy that we finally have EventSource in the new Edge.
Alas, I wouldn’t be writing this post if I hadn’t learned something new yesterday.
Last week, a customer reached out to complain that the new Edge and Chrome didn’t work well with their webmail application. After they used the webmail site for some indeterminate amount of time, they noticed that its performance slowed to a crawl– switching between messages would take tens of seconds or longer, and the problem reproduced regardless of the speed of the network. The only way to reliably resolve the problem was to either close the tabs they’d opened from the main app (e.g. the individual email messages could be opened in their own tabs) or to restart the browser entirely.
As the networking PM, I was called in to figure out what was going wrong. Over a video conference, I instructed the user to open the F12 Developer Tools and we looked at the network console together. Each time the user clicked on a message, new requests were created and sat in the (pending) state for a long time, meaning that the requests were getting queued and weren’t even going to the network promptly.
But why? Diagnosing this remotely wasn’t going to be trivial, so I had the user generate a Network Export log that I could examine later.
In examining the log using the online viewer, the problem became immediately clear. On the Sockets tab, the webmail’s server showed 19 requests in the Pending state, and 6 Active connections to the server, none of which were idle. The fact that there were six connections strongly suggested that the server was using HTTP/1.1 rather than HTTP/2, and a quick look at the HTTP/2 tab confirmed it. Looking at the Events tab, we see five outstanding URLRequests to a URL that strongly suggests that it’s being used as an EventSource:
Each of these sockets is in the READING_RESPONSE state, and each has returned just ten bytes of body data to its EventSource. The web application uses one EventSource per instance of the app, and the user has five tabs open to the app.
And now everything falls into place. Browsers limit themselves to 6 concurrent connections per server. When the server supports HTTP/2, browsers typically need just one connection because HTTP/2 supports multiplexing many (typically 100) streams onto a single connection. HTTP/1.1 doesn’t afford that luxury, so every long-lived connection used by a page decrements the available connections by one. So, for this user, all of their network traffic was going down a single HTTP/1.1 connection, and because HTTP/1.1 doesn’t allow multiplexing, it means that every action in the UI was blocked on a very narrow head-of-line-blocking pipe.
Looking in the Chrome bug tracker, we find this core problem (“SSE connections can starve other requests”) resolved “By Design” six years ago.
Now, I’m always skeptical when reading old bugs, because many issues are fixed over time, and it’s often the case that an old resolution is no longer accurate in the modern world. So I built a simple repro script for Meddler. The script returns one of four responses:
And sure enough, when we load the page we see that only six frames are getting events from the EventSource, and the images that are supposed to load at the bottom of the frames never load at all:
Similarly, if we attempt to load the page in another tab, we find that it doesn’t even load, with a status message of “Waiting for available socket…”
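If you’d like to reproduce the starvation yourself without Meddler, a minimal client-side sketch against any HTTP/1.1 origin shows the same effect (the endpoint paths are placeholders):

// Tie up all six HTTP/1.1 connections with long-lived streams...
for (let i = 0; i < 6; i++) {
  new EventSource(`/stream?i=${i}`);
}
// ...then watch this request sit in the "pending" state until a stream closes.
fetch("/api/data").then((r) => console.log("fetch completed:", r.status));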
The web app owners should definitely enable HTTP/2 on their server, which will make this problem disappear for almost all of their users.
However, even HTTP/2 is not a panacea, because the user might be behind a “break-and-inspect” proxy that downgrades connections to HTTP/1.1, or the browser might conceivably limit parallel requests on HTTP/2 connections for slow networks. As noted in the By Design issue, a web app that depends on EventSource in multiple tabs might use a BroadcastChannel or a SharedWorker to share a single EventSource connection across all of the tabs of the web application.
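One possible sketch of that sharing approach uses the Web Locks API for leader election and a BroadcastChannel for fan-out (the endpoint, lock, and channel names below are made up):

const channel = new BroadcastChannel("app-events");
channel.onmessage = (e) => handleEvent(e.data);  // every other tab hears rebroadcasts

// Only the tab holding the lock opens the real EventSource; when that tab
// closes, the lock is released and another tab takes over.
navigator.locks.request("sse-owner", () => {
  const source = new EventSource("/events");     // hypothetical endpoint
  source.onmessage = (e) => {
    handleEvent(e.data);           // handle locally
    channel.postMessage(e.data);   // fan out to the other tabs
  };
  return new Promise(() => {});    // never resolve: hold the lock for the tab's lifetime
});

function handleEvent(data: string) {
  console.log("event:", data);
}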
Alternatively, swapping an EventSource architecture for one based on WebSocket (even one that exposes itself as an EventSource polyfill) will also likely resolve the problem. That’s because, even if the client or server doesn’t support routing WebSockets over HTTP/2, the WebSockets-Per-Host limit is 255 in Chromium and 200 in Firefox.
Stay responsive out there!