browsers, dev

Cookies and Concurrency, Redux

In yesterday’s episode, I shared the root cause of a bug that can cause document.cookie to incorrectly return an empty string if the cookie is over 1kb and the cookie grows in the middle of a DOM document.cookie getter operation.

Unfortunately, that simple bug wasn’t the root cause of the compatibility problem that I was investigating when my code-review uncovered it. The observed compatibility bug was slightly different– in the repro case, only one of the document’s cookies goes missing, and it goes missing even when only one page is setting the cookie.

After the brain-melting exercise of annotating the site’s minified framework libraries (console.log(‘…’) ftw!) via Fiddler’s AutoResponder, I found that the site uses the document.cookie API to save the same cookie (named “ld“) three times in a row, adding some information to the cookie each time. However, the ld cookie mysteriously disappears between 0.4 and 6 milliseconds after it gets set the third time. I painstakingly verified that the cookie wasn’t getting manipulated from any other context when it disappeared.

Hmm…

As I wrote up the investigation notes, I idly noted that due to a trivial typo in the website’s source code, the ld cookie was set first as a Persistent cookie, then (accidentally) as a Session cookie, then as a Persistent cookie.

In re-reading the notes an hour later, again my memory got tickled. Hadn’t I seen something like this before?

Indeed, I had. Just about five years ago, a user reported a similar bug where a HTTP response contained two Set-Cookie calls for the same cookie name and Internet Explorer didn’t store either cookie. I built a reduced test case and reported it to the engineering team.

Pushing Cookies

The root cause of the cookie disappearance relates to the Internet Explorer and Edge “loosely-coupled architecture.”

In IE and Edge, each browser tab process runs its own networking stack, in-process1. For persistent cookies, this poses no problem, because every browser process hits the same WinINET cookie storage area and gets back the latest value of the persistent cookie. In contrast, for session cookies, there’s a challenge. Session cookies are stored in local (per-process) variables in the networking code, but a browser session may include multiple tab processes. A Session cookie set in a tab process needs to be available in all other tab processes in that browser session.

As a consequence, when a tab writes a Session cookie, Edge must send an interprocess communication (IPC) message to every other process in the browser session, telling each to update its internal variables with the new value of the Session cookie. This Cookie Pushing IPC is asynchronous, and if the named cookie were later modified in a process before the IPC announcing the earlier update to the cookie is received, that later update is obliterated.

The Duplicate Set-Cookie header version of this bug got fixed in the Fall 2017 Update (RS3) to Windows 10 and thus my old Set-Cookie test case case no longer reproduces the problem.

Unfortunately, it turns out that the RS3 fix only corrected the behavior of the network stack when it encounters this pattern– if the cookie-setting calls are made via document.cookie, the problem reappears, as in this document.cookie test case.

BadBehavior

Playing with the repro page, you’ll notice that manually pushing “Set HOT as a Session cookie” or “Set as a Persistent cookie” works fine, because your puny human reflexes aren’t faster than the cookie-pushing IPC. But when you push the “Set twice” button that sets the cookie twice in fast succession, the HOT cookie disappears in Edge (and in IE11, if you have more than one tab open).

Until this bug is fixed, avoid using document.cookie to change a persistent cookie to a session cookie.

-Eric

In contrast, in Chrome, all networking occurs in the browser process (or a networking-only process), and if a tab process wants to get the current document.cookie, it must perform an IPC to ask the browser process for the cookie value. We call this “cookie pulling.”

Standard
browsers, tech

File the Bug

Two experiences this week reminded me of a very important principle for improving the quality of software… if you see something, say something. And the best way to do that is to file a bug.

Something Weird? File a bug!

The first case was last Thursday, when a user filed a bug in Chrome’s tracker noting that Chrome’s window border icons often got “stuck” in a hover state after being moused over. It was a clear, simple bug report and it was easily reproduced. I’ve probably hit this a hundred times over the years and didn’t think much of it… “probably some weird thing in my graphics card or some utility I’m running.” It never occurred to me that everybody else might be seeing this, or that it was exhibited only by Chrome.

Fortunately, the bug report showed that this issue was something others were hitting too, so I took a look. The problem proved to be almost unique to Chrome (not occurring in other Windows applications), and has existed for at least seven years, reproducing on every version from Windows Vista to Windows 10.

A scan of the bug tracker suggests that Thursday’s report was the first time in those seven years that this bug was filed; less than a week later, the simple fix is checked in and on the way to Chrome 54. Obviously, this is only a minor cosmetic issue, but we want our browser looking good at all times!

Animation of the fixed bug

Another cool aspect of this fix is that it will fix other applications too… the Opera and Vivaldi browsers are based on Chromium open-source roots and inherited this problem; they’ll probably pick up this fix shortly too.

th;df – Say Something Anyway

Even if you don’t file a bug, you should still say something. Recently, Ana Tudor noted on Twitter that her system was in a state after restart where neither Chrome nor Brave could render web content; both browsers showed the “crashed tab” experience, even after restarting and reinstalling the browsers. Running with the no-sandbox flag worked, and rebooting the system fully solved the problem. Her report sounded suspiciously similar to a problem I’d encountered back in April; fortunately, I’d filed a bug.

At the time, that bug was deemed unreproducible and I’d dismissed it as some wonkiness on my specific system, but Ana’s complaint brought this back to my attention. She’d also added another piece of data I didn’t have in my original report—the problem also occurred in Brave, but not Firefox or IE.

Even more fortunately, I hit this problem again after a system reboot yesterday, and because of Ana’s report, I was no longer convinced that this bug was some weird quirk on just my system. Playing with the repro, I found that neither Opera nor Vivaldi reproduced the problem; both of those browsers are architecturally similar to Brave, but importantly, both are 32-bit. So this was a great clue that the problem was specific to 64-bit. And I confirmed this, finding that the bug repro’d only in 64-bit Chrome Canary but not in 32-bit Canary. Now we’re cooking with gas!

I built Chromium and ran it through WinDBG, seeing that when the sandboxed content renderer process was starting up, it was hitting three debug breakpoints before dying. The breakpoints were in sandbox::InterceptionAgent::OnDllLoad, a function Chrome uses to thunk certain Windows APIs to inject security filters. At this point, and with a reliable repro in hand, my smarter colleague took over and quickly found that the code to allocate memory for the thunk was failing, due to some logic bugs. Thunks must be located at a particular place in memory – within 2gb of the thunked function – and the code to place our thunks was failing when ASLR randomly loaded the kernel32, gdi32, and user32 DLLs at the very top of the address space, leaving no room for our thunks. When the allocation failed, Chrome refused to allow the DLL to be loaded into the sandbox, and the renderer necessarily died. After the user rebooted the system again, ASLR again moved the DLLs to some other location and (usually) this location gives us room to place our thunks. With 20/20 hindsight, the root cause of this bug (and the upcoming fix) are obvious.

But we only knew to look for the problem because Ana took the time to say something.

Final Thoughts

  • Browser telemetry is great—we catch crashes and all sorts of problems with it. But debugging via telemetry can be really challenging— more akin to solving a mystery than following a checklist. For instance, in the case of the sandbox bug, the fact that the problem reproduced in Brave was a huge clue, and not something we’d ever know from telemetry.
  • Well-run projects love bug reports. Back when I was building Fiddler, a lot of users I talked to said things like: “Well, it’s free and pretty good so I didn’t want to bother you with a complaint about some bug.” This is exactly backwards. For most of Fiddler’s lifetime, bug reports from the community were the only compensation I received from making the tool available to everyone for free. Getting bug reports meant I could improve the product without having to pay for test machines and devices, hire a test organization, etc, etc. When I eventually sold Fiddler to Telerik, a large part of the value they were buying was the knowledge that the tool had been battle-tested by millions of users over 9 years and that I’d fixed thousands of bugs from that community.
  • Filing bugs is generally easy, and it’s especially easy for Chrome.
    • First, simply search for an existing bug on crbug.com
    • If you find it’s a known issue, star it so you get updates
    • If it’s not a known issue, click the New Issue button at the top-left
    • Tell us as much as you can about the problem. Try to put yourself in the reader’s shoes—we don’t know much about your system or scenario, so the more details you can provide, the better.
  • Screenshots and URLs that reproduce problems are invaluable.
  • Find a bug in another browser? Report it!

 

Thanks for your help in improving our digital world!

 

-Eric

Standard
browsers

Microsoft Edge Bugs and Omissions

I tweet about the new Microsoft Edge browser quite a lot. I wanted to have a blog post to collect some of the feedback I’ve provided so I have it in one place and can update as needed.

Note: This post mostly focuses on the bad parts of Edge; there are plenty of good parts, including much improved standards support and a safer default security posture.

Last Update: November 2015 Update Most of the trivial issues are fixed; the bigger problems are mostly unfixed

Bugs

1. The “Should I trust this site” link in the HTTPS trust badge goes to page that doesn’t even attempt to answer that question. Update: Sorta fixed.

2. The hover “tooltip” on that site doesn’t do escaping of & properly and also has a text-truncation bug:

image Update: Fixed.

3. The RichText tests at www.browserscope.org hang the browser.

4. When Windows UAC is set to “Don’t dim my desktop”, launching a download (e.g. setup.exe) that requires elevation causes the consent window to appear behind the Edge window, effectively causing a denial-of-service condition that hangs the tab.

5. No, not that star, the other one!

image

6. Remember focus rectangles that show which button is active? Yeah, I miss those.

7. Adding a folder silently fails if the name chosen contains any “special filesystem characters” like ?, :, *, etc.

image

8. HTML5 Drag/Drop — You can’t drag/drop files into the browser (e.g. on OneDrive.com). Update: Fixed.

9. Microsoft Edge fixed the longstanding (and amusing, due to its root cause) bug whereby it exported HTTP Archive (HAR) files as XML instead of JSON. Unfortunately, the new JSON exporter omits the required encoding=”base64″ attribute when including binary bodies. Also unfortunate, F12 doesn’t write the creator version field in the JSON; a proper version number here would allow tools like Fiddler to better accommodate the buggy output.

10. CSS Animations that have been offloaded to the GPU (“independent animations”) cannot be stopped. The only workaround is to prevent them from being independently animated.

Omitted

1. Windows 7 Support – After strongly hinting that IE11’s successor would run on Windows 7, the team changed course and said that Edge wouldn’t appear on Windows 7 at release but they’d promise to “watch customer demand” for a Windows 7 version. From both mind-share and market-share perspectives, I think this is a very risky move.

2. Extensions – Edge was expected to contain a new Chrome-like extension model, but this slipped from the original release. There’s currently no ETA for its arrival. Update: Delayed to 2016.

3. Tracking Protection Lists –  A Tweet from an IE engineer implies that these will not be coming back to Edge and the future extension model is expected to serve as a replacement. This is unfortunate, as a good TPL dramatically improves the speed at which pages load and significantly reduces the number of pages that can cause the browser to hang or crash.

4. AddSearchProvider – Edge makes it quite cumbersome to add search providers, having removed the AddSearchProvider API supported by IE7-IE11, Chrome and Firefox.

5. Click-to-Play – There’s no way to configure the built-in Flash object to operate in a “click-to-play” manner.

6. Report Phishing – The old “SmartScreen > Report this Site” experience has been removed and replaced with a “Feedback and Reporting” widget that accepts all sorts of feedback about both the browser and the site. It is likely that this experience does not collect the same level of data as the old experience, meaning that some reported phish may escape.

7. Menus & Chords – When Microsoft Office dumped the menus in favor of the ribbon system, they ensured that the old accelerator keys and keyboard chords (e.g. Alt+F,C to “Close tab”) continued to work. Edge makes no such attempt, and thus my muscle memory built up for over a decade now fails.

8. JavaScript Uncontrollable — Unlike nearly every browser, Edge offers no way to disable JavaScript on a per-site or global basis, even to test <noscript> tags.

9. Certificate Inspection — There’s no way to inspect the certificate presented by a HTTPS site.

Bonus Gripes: Windows 10 Issues

1. At 125% Zoom, the Window Title bar is one pixel too short. (Fixed in August)

Embedded image permalink

2. There’s no visual distinction between the title bar and the menu bar in some apps (like Notepad). As a consequence there’s no way to tell whether click & drag will drag the window or do nothing at all.

3. A background licensing service frequently crashes when resuming from sleep; it takes down the WiFi service which runs in the same service host which means you can’t access WiFi after resume. Update: fixed by the July 20th update.

4. The experience for making applications default has changed again in Windows 10. While the Windows 8/8.1 experience wasn’t awesome, the Windows 10 experience is a slap in the face to the user. Mozilla is complaining, justifiably.

5. Win10/.NET4.6 carries over the Shell/.NET bug whereby double-clicking any label control copies its text to the clipboard. The behavior change in the comctl32 label control was checked in during Windows Vista by a rogue dev without a spec or an explanation.

6. Windows 10 carries over the Windows 8 regression whereby proxy-change calls are ignored during shutdown.

-Eric

Standard