debugging

I’ve written about Browser Proxy Configuration a few times over the years, and I’m delighted that Chromium has accurate & up-to-date documentation for its proxy support.

One thing I’d like to call out is that Microsoft Edge’s new Chromium foundation introduces a convenient new debugging feature for debugging the behavior of Proxy AutoConfiguration (PAC) scripts.

To use it, simply add alert() calls to your PAC script, like so:

alert("!!!!!!!!! PAC script start parse !!!!!!!!");
function FindProxyForURL(url, host) {
alert("Got request for (" + url+ " with host: " + host + ")");
return "PROXY 127.0.0.1:8888";
}
alert("!!!!!!!!! PAC script done parse !!!!!!!!");

Then, collect a NetLog trace from the browser:

msedge.exe --log-net-log=C:\temp\logFull.json --net-log-capture-mode=IncludeSocketBytes

…and reproduce the problem.

Save the NetLog JSON file and reload it into the NetLog viewer. Search in the Events tab for PAC_JAVASCRIPT_ALERT events:

Even without adding new alert() calls, you can also look for HTTP_STREAM_JOB_CONTROLLER_PROXY_SERVER_RESOLVED events to see what proxy the proxy resolution process determined should be used.

One current limitation of the current logging is that if the V8 Proxy Resolver process…

… crashes (e.g. because Citrix injected a DLL into it), there’s no mention of that crash in the NetLog; it will just show DIRECT. Until the logging is enhanced, users can hit SHIFT+ESC to launch the browser’s task manager and check to see whether the utility process is alive.

Try using the System Resolver

In some cases (e.g. when using DirectAccess), you might want to try using Windows’ proxy resolution code rather than the code within Chromium.

The --winhttp-proxy-resolver command line argument will direct Chrome/Edge to call out to Windows’ WinHTTP Proxy Service for PAC processing.

Differences in PAC Processing

  • Internet Explorer/WinINET/Edge Legacy call the PAC script’s FindProxyForURLEx function (introduced to unlock IPv6 support), if present, and FindProxyForURL if not.
  • Chrome/Edge/Firefox only call the FindProxyForURL function and do not call the Ex version.
  • Internet Explorer/WinINET/Edge Legacy expose a getClientVersion API that is not defined in other PAC environments.

Notes for Other Browsers

  • Prior to Windows 8, IE showed PAC alert() notices in a modal dialog box. It no longer does so and alert() is a no-op.
  • Firefox shows alert() messages in the Browser Console (hit Ctrl+Shift+J); note that Firefox’s Browser Console is not the Web Console where web pages’ console.log statements are shown.

-Eric

Problems in accessing websites can often be found and fixed if the network traffic between the browser and the website is captured as the problem occurs. This short post explains how to capture such logs.

Capturing Network Traffic Logs

If someone asked you to read this post, chances are good that you were asked to capture a web traffic log to track down a bug in a website or your web browser.

Fortunately, in Google Chrome or the new Microsoft Edge (version 76+), capturing traffic is simple:

  1. Optional but helpful: Close all browser tabs but one.
  2. Navigate the tab to chrome://net-export
  3. In the UI that appears, press the Start Logging to Disk button.
  4. Choose a filename to save the traffic to. Tip: Pick a location you can easily find later, like your Desktop.
  5. Reproduce the networking problem in a new tab. If you close or navigate the //net-export tab, the logging will stop automatically.
  6. After reproducing the problem, press the Stop Logging button.
  7. Share the Net-Export-Log.json file with whomever will be looking at it. Optional: If the resulting file is very large, you can compress it to a ZIP file.
Network Capture UI

Privacy-Impacting Options

In some cases, especially when you dealing with a problem in logging into a website, you may need to set either the Include cookies and credentials or Include raw bytes options before you click the Start Logging button.

Note that there are important security & privacy implications to selecting these options– if you do so, your capture file will almost certainly contain private data that would allow a bad actor to steal your accounts or perform other malicious actions. Share the capture only with a person you trust and do not post it on the Internet in a public forum.

Tutorial Video

If you’re more of a visual learner, here’s a short video demonstrating the traffic capture process.

In a future post, I’ll explore how developers can use the NetLog Viewer to analyze captured traffic.

-Eric

Appendix A: Capture on Startup

In rare cases, you may need to capture network data early (e.g. to capture proxy script downloads and the like. To do that, close Edge, then run

msedge.exe --log-net-log=C:\some_path\some_file_name.json --net-log-capture-mode=IncludeSocketBytes

Note: This approach also works for Electron JS applications like Microsoft Teams:

%LOCALAPPDATA%\Microsoft\Teams\current\Teams.exe --log-net-log=C:\temp\TeamsNetLog.json

I suspect that this is only going to capture the network traffic from the Chromium layer of Electron apps (e.g. web requests from the nodeJS side will not be captured) but it still may be very useful.

Appendix B: References

For a small number of users of Chromium-based browsers (including Chrome and the new Microsoft Edge) on Windows 10, after updating to 78.0.3875.0, every new tab crashes immediately when the browser starts.

Impacted users can open as many new tabs as they like, but each will instantly crash:

EveryTabCrashes

EdgeHavingAProblem

As of Chrome 81.0.3992, the page will show the string Error Code: STATUS_INVALID_IMAGE_HASH.

What’s going wrong?

This problem relates to a security/reliability improvement made to Chromium’s sandboxing. Chromium runs each of the tabs (and extensions) within locked down (“sandboxed”) processes:

JAIL

In Chrome 78, a change was made to prevent 3rd-party code from injecting itself into these sandboxed processes. 3rd-party code is a top source of browser reliability and performance problems, and it has been a longstanding goal for browser vendors to get this code out of the web platform engine.

This new feature relies on setting a Windows 10 Process Mitigation policy that instructs the OS loader to refuse to load binaries that aren’t signed by Microsoft. Edge 13 enabled this mitigation in 2015, and the Chromium change brings parity to the new Edge 78+. Notably, Chrome’s own DLLs aren’t signed by Microsoft so they are specially exempted by the Chromium sandboxing code.

Unfortunately, the impact of this change is that the renderer is killed (resulting in the “Aw snap” page) if any disallowed DLL attempts to load, for instance, if your antivirus software attempts to inject its DLLs into the renderer processes. For example, Symantec Endpoint Protection versions before 14.2 are known to trigger this problem.

If you encounter this problem, you should follow the following steps:

Update any security software you have to the latest version.

Other than malware, security software is the other likely cause of code being unexpectedly injected into the renderers.

Temporarily disable the new protection

You can temporarily launch the browser without this sandbox feature to verify that it’s the source of the crashes.

  1. Close all browser instances (verify that there are no hidden chrome.exe or msedge.exe processes using Task Manager)
  2. Use Windows+R to launch the browser with the command line override:
msedge.exe --disable-features=RendererCodeIntegrity

or

chrome.exe --disable-features=RendererCodeIntegrity

Ensure that the tab processes work properly when code integrity checks are disabled.

If so, you’ve proven that code integrity checks are causing the crashes.

Hunt down the culprit

Visit chrome://conflicts#R to show the list of modules loaded by the client. Look for any files that are not Signed By Microsoft or Google.

If you see any, they are suspects. (There will likely be a few listed as “Shell Extension”s; e.g. 7-Zip.dll, that do not cause this problem)– check for an R in the Process types column to find modules loading in the Renderers.

You should install any available updates for any of your suspects to see if doing so fixes the problem.

Check the Event Log

The Windows Event Log will contain information about modules denied loading. Open Event Viewer. Expand Applications and Services Logs > Microsoft > Windows > CodeIntegrity > Operational and look for events with ID 3033. The detail information will indicate the name and location of the DLL that caused the crash:CodeIntegrity

Optional: Use Enterprise Policy to disable the new protection

If needed, IT Adminstrators can disable the new protection using the RendererCodeIntegrity policy for Chrome and Edge. You should outreach to the software vendors responsible for the problematic applications and request that they update them.

Other possible causes

Note that it’s possible that you could have a PC that encounters symptoms like this (all subprocesses crash) but not a result of the new code integrity check. In such cases, the Error Code on the crash page will be something other than STATUS_INVALID_IMAGE_HASH.

  • For instance, Chromium once had an obscure bug in its sandboxing code that caused all sandboxes to crash depending on the random memory mapping of Address Space Layout Randomization.
  • Similarly, Chrome and Edge still have an active bug where all renderers crash on startup if the PC has AppLocker enabled and the browser is launched elevated (as Administrator).

-Eric

In yesterday’s episode, I shared the root cause of a bug that can cause document.cookie to incorrectly return an empty string if the cookie is over 1kb and the cookie grows in the middle of a DOM document.cookie getter operation.

Unfortunately, that simple bug wasn’t the root cause of the compatibility problem that I was investigating when my code-review uncovered it. The observed compatibility bug was slightly different– in the repro case, only one of the document’s cookies goes missing, and it goes missing even when only one page is setting the cookie.

After the brain-melting exercise of annotating the site’s minified framework libraries (console.log(‘…’) ftw!) via Fiddler’s AutoResponder, I found that the site uses the document.cookie API to save the same cookie (named “ld“) three times in a row, adding some information to the cookie each time. However, the ld cookie mysteriously disappears between 0.4 and 6 milliseconds after it gets set the third time. I painstakingly verified that the cookie wasn’t getting manipulated from any other context when it disappeared.

Hmm…

As I wrote up the investigation notes, I idly noted that due to a trivial typo in the website’s source code, the ld cookie was set first as a Persistent cookie, then (accidentally) as a Session cookie, then as a Persistent cookie.

In re-reading the notes an hour later, again my memory got tickled. Hadn’t I seen something like this before?

Indeed, I had. Just about five years ago, a user reported a similar bug where a HTTP response contained two Set-Cookie calls for the same cookie name and Internet Explorer didn’t store either cookie. I built a reduced test case and reported it to the engineering team.

Pushing Cookies

The root cause of the cookie disappearance relates to the Internet Explorer and Edge “loosely-coupled architecture.”

In IE and Edge, each browser tab process runs its own networking stack, in-process1. For persistent cookies, this poses no problem, because every browser process hits the same WinINET cookie storage area and gets back the latest value of the persistent cookie. In contrast, for session cookies, there’s a challenge. Session cookies are stored in local (per-process) variables in the networking code, but a browser session may include multiple tab processes. A Session cookie set in a tab process needs to be available in all other tab processes in that browser session.

As a consequence, when a tab writes a Session cookie, Edge must send an interprocess communication (IPC) message to every other process in the browser session, telling each to update its internal variables with the new value of the Session cookie. This Cookie Pushing IPC is asynchronous, and if the named cookie were later modified in a process before the IPC announcing the earlier update to the cookie is received, that later update is obliterated.

The Duplicate Set-Cookie header version of this bug got fixed in the Fall 2017 Update (RS3) to Windows 10 and thus my old Set-Cookie test case case no longer reproduces the problem.

Unfortunately, it turns out that the RS3 fix only corrected the behavior of the network stack when it encounters this pattern– if the cookie-setting calls are made via document.cookie, the problem reappears, as in this document.cookie test case.

BadBehavior

Playing with the repro page, you’ll notice that manually pushing “Set HOT as a Session cookie” or “Set as a Persistent cookie” works fine, because your puny human reflexes aren’t faster than the cookie-pushing IPC. But when you push the “Set twice” button that sets the cookie twice in fast succession, the HOT cookie disappears in Edge (and in IE11, if you have more than one tab open).

Until this bug is fixed, avoid using document.cookie to change a persistent cookie to a session cookie.

-Eric

In contrast, in Chrome, all networking occurs in the browser process (or a networking-only process), and if a tab process wants to get the current document.cookie, it must perform an IPC to ask the browser process for the cookie value. We call this “cookie pulling.”

I recently bought a Dell XPS 8900 desktop system with Windows 10. It ran okay for a while, but after enabling Hyper-V, every few minutes the system would freeze for a few seconds and then reboot with no explanation. Looking at the Event Viewer’s Windows Logs > System revealed that the system had bugchecked (blue screened):

Event Viewer - BugcheckBugcheck 0x1a indicates a problem with “Memory Management” .

Run WinDbg as Administrator. File > Open Crash Dump:

WinDBG open crash dump 

Open C:\Windows\memory.dmp. Wait for symbols to download:

Debuggee not connected; symbols downloading

If symbols aren’t downloaded automatically, try typing .symfix and then .reload in the command prompt at the bottom.

Use !analyze -v says WinDBG

Then, follow the tool’s advice and run !analyze -v to have the debugger analyze the crash. WinDBG presents a surprisingly readable explanation:

WinDBG notes driver memory corruption

So a driver’s at fault, but which one?

Stack trace points at WiFi

It looks like bcmwl63a, for which symbols aren’t loaded, one clue that this isn’t Microsoft’s code. Let’s find out more about it using lm vm bcmw163a:

 

Debugger points at Wifi driver

Pop over to the listed path to examine the file’s properties, and see that it’s the WiFi driver:

Driver details

The Dell 1560 802.11ac card is the same type as found in my Dell XPS 13” notebook PC, where it was responsible for a flurry of bluescreens last year. The driver appears to have improved (the XPS 13 doesn’t crash anymore), but it looks like some corner cases got missed, likely related to the Hyper-V virtual networking code. Rather than waiting for an updated driver, the experts on Twitter suggested I simply upgrade to the Intel 7265 and install the latest Intel PROSet wireless driver. At $20 on Amazon, this seemed like a fine approach.

The upgrade was straightforward and would’ve taken less than 5 minutes to install except one of the nearly microscopic sockets broke off as I removed the Dell card’s antenna cables:

BrokenSocket

I used a needle to remove the broken pieces from the antenna’s connector before it would fit on the new card’s socket. After connecting the antenna, the new card easily slid into the slot and Windows recognized it on next boot. I used Device Manager to ensure the drivers loaded for the new card’s Bluetooth support, and installed the latest PROSet driver. Everything’s been working great since.

While WinDBG is one of the more inscrutable tools I use, it worked great in this situation and would point even a novice in the right direction.

 

-Eric