http/2

For the first few years of the web, developers pretty much coded whatever they thought was cool and shipped it. Specifications, if written at all, were an after thought.

Then, for the next two decades, spec authors drafted increasingly elaborate specifications with optional features and extensibility points meant to be used to enable future work.

Unfortunately, browser and server developers often only implemented enough of the specs to ensure interoperability, and rarely tested that their code worked properly in the face of features and data allowed by the specs but not implemented in the popular clients.

Over the years, the web builders started to notice that specs’ extensibility points were rusting shut– if a new or upgraded client tried to make use of a new feature, or otherwise change what it sent as allowed by the specs, existing servers would fail when they saw the encountered the new values.

In light of this, spec authors came up with a clever idea: clients should send random dummy values allowed by the spec, causing spec-non-compliant servers that failed to properly ignore those values to fail immediately. This concept is called GREASE (with the backronym “Generate Random Extensions And Sustain Extensibility“), and was first implemented for the values sent by the TLS handshake. When connecting to servers, clients would claim to support new ciphersuites and handshake extensions, and intolerant servers would fail. Users would holler, and engineers could follow up with the broken site’s authors and developers about how to fix their code. To avoid “breaking the web” too broadly, GREASE is typically enabled experimentally at first, in Canary and Dev channels. Only after the scope of the breakages is better understood does the change get enabled for most users.

GREASE has proven such a success for TLS handshakes that the idea has started to appear in new places. Last week, the Chromium project turned on GREASE for HTTP2 in Canary/Dev for 50% of users, causing connection failures to many popular sites, including some run by Microsoft. These sites will need to be fixed in order to properly load in the new builds of Chromium.

// Enable "greasing" HTTP/2, that is, sending SETTINGS parameters with reserved identifiers and sending frames of reserved types, respectively. If greasing Frame types, an HTTP/2 frame with a reserved frame type will be sent after every HEADERS and SETTINGS frame. The same frame will be sent out on all connections to prevent the retry logic from hiding broken servers.
NETWORK_SWITCH(kHttp2GreaseSettings, "http2-grease-settings") NETWORK_SWITCH(kHttp2GreaseFrameType, "http2-grease-frame-type")

One interesting consequence of sending GREASE Frames is that it requires moving the END_STREAM flag (recorded as fin=true in the netlog) from the HTTP2_SESSION_SEND_HEADERS frame into an empty (size=0) HTTP2_SESSION_SEND_DATA frame; unfortunately, the intervening GREASE Frame is not presently recorded in the netlog.

You can try H2 GREASE in Chrome Stable using command line flags that enable GREASE settings values and GREASE frames respectively:

chrome.exe --http2-grease-settings bing.com
chrome.exe --http2-grease-frame-type bing.com

Alternatively, you can disable the experiment in Dev/Canary:

chrome.exe --disable-features=Http2Grease

GREASE is baked into the new HTTP3 protocol (Cloudflare does it by default) and the new UA Client Hints specification. I expect we’ll see GREASE-like mechanisms appearing in most new web specs where there are optional or extensible features.

-Eric

While there are many different ways for servers to stream data to clients, the Server-sent Events / EventSource Interface is one of the simplest. Your code simply creates an EventSource and then subscribes to its onmessage callback:

Implementing the server side is almost as simple: your handler just prefaces each piece of data it wants to send to the client with the string data: and ends it with a double line-ending (\n\n). Easy peasy. You can see the API in action in this simple demo.

I’ve long been sad that we didn’t manage to get this API into Internet Explorer or the Legacy Edge browser. While many polyfills for the API exist, I was happy that we finally have EventSource in the new Edge.

Yay! \o/

Alas, I wouldn’t be writing this post if I hadn’t learned something new yesterday.

Last week, a customer reached out to complain that the new Edge and Chrome didn’t work well with their webmail application. After they used the webmail site for a some indeterminate amount time, they noticed that its performance slowed to a crawl– switching between messages would take tens of seconds or longer, and the problem reproduced regardless of the speed of the network. The only way to reliably resolve the problem was to either close the tabs they’d opened from the main app (e.g. the individual email messages could be opened in their own tabs) or to restart the browser entirely.

As the networking PM, I was called in to figure out what was going wrong over video conference. I instructed the user to open the F12 Developer Tools and we looked at the network console together. Each time the user clicked on a message, new requests were created and sat in the (pending) state for a long time, meaning that the requests were getting queued and weren’t even going to the network promptly.

But why? Diagnosing this remotely wasn’t going to be trivial, so I had the user generate a Network Export log that I could examine later.

In examining the log using the online viewer, the problem became immediately clear. On the Sockets tab, the webmail’s server showed 19 requests in the Pending state, and 6 Active connections to the server, none of which were idle. The fact that there were six connections strongly suggested that the server was using HTTP/1.1 rather than HTTP/2, and a quick look at the HTTP/2 tab confirmed it. Looking at the Events tab, we see five outstanding URLRequests to a URL that strongly suggests that it’s being used as an EventSource:

Each of these sockets is in the READING_RESPONSE state, and each has returned just ten bytes of body data to each EventSource. The web application is using one EventSource instance of the app, and the user has five tabs open to the app.

And now everything falls into place. Browsers limit themselves to 6 concurrent connections per server. When the server supports HTTP/2, browsers typically need just one connection because HTTP/2 supports multiplexing many (typically 100) streams onto a single connection. HTTP/1.1 doesn’t afford that luxury, so every long-lived connection used by a page decrements the available connections by one. So, for this user, all of their network traffic was going down a single HTTP/1.1 connection, and because HTTP/1.1 doesn’t allow multiplexing, it means that every action in the UI was blocked on a very narrow head-of-line-blocking pipe.

Looking in the Chrome bug tracker, we find this core problem (“SSE connections can starve other requests”) resolved “By Design” six years ago.

Now, I’m always skeptical when reading old bugs, because many issues are fixed over time, and it’s often the case that an old resolution is no longer accurate in the modern world. So I built a simple repro script for Meddler. The script returns one of four responses:

  • An HTML page that consumes an EventSource
  • An HTML page containing 15 frames pointed at the previous HTML page
  • An event source endpoint (text/event-stream)
  • A JPEG file (to test whether connection limits apply across both EventSources and other downloads)

And sure enough, when we load the page we see that only six frames are getting events from the EventSource, and the images that are supposed to load at the bottom of the frames never load at all:

Similarly, if we attempt to load the page in another tab, we find that it doesn’t even load, with a status message of “Waiting for available socket…”

The web app owners should definitely enable HTTP/2 on their server, which will make this problem disappear for almost all of their users.

However, even HTTP/2 is not a panacea, because the user might be behind a “break-and-inspect” proxy that downgrades connections to HTTP/1.1, or the browser might conceivably limit parallel requests on HTTP/2 connections for slow networks. As noted in the By Design issue, a server depending on EventSource in multiple tabs might use a BroadcastChannel or a SharedWorker to share a single EventSource connection with all of the tabs of the web application.

Alternatively, swapping an EventSource architecture with one based on WebSocket (even one that exposes itself as a EventSource polyfill) will also likely resolve the problem. That’s because, even if the client or server doesn’t support routing WebSockets over HTTP/2, the WebSockets-Per-Host limit is 255 in Chromium and 200 in Firefox.

Stay responsive out there!

-Eric