debugging

Previously, I’ve described how to capture a network traffic log from Microsoft Edge, Google Chrome, and applications based on Chromium or Electron.

In this post, I aim to catalog some guidance for looking at these logs to help find the root cause of captured problems and otherwise make sense of the data collected.

Last Update: April 24, 2020I expect to update this post over time as I continue to gain experience in analyzing network logs.

Choose A Viewer – Fiddler or Catapult

After you’ve collected the net-export-log.json file using the about:net-export page in the browser, you’ll need to decide how to analyze it.

The NetLog file format consists of a JSON-encoded stream of event objects that are logged as interesting things happen in the network layer. At the start of the file there are dictionaries mapping integer IDs to symbolic constants, followed by event objects that make use of those IDs. As a consequence, it’s very rare that a human will be able to read anything interesting from a NetLog.json file using just a plaintext editor or even a JSON parser.

The most common (by far) approach to reading NetLogs is to use the Catapult NetLog Viewer, a HTML/JavaScript application which loads the JSON file and parses it into a much more readable set of events.

An alternative approach is to use the NetLog Importer for Telerik Fiddler.

Importing NetLogs to Fiddler

For Windows users who are familiar with Fiddler, the NetLog Importer extension for Fiddler is easy-to-use and it enables you to quickly visualize HTTP/HTTPS requests and responses. The steps are easy:

  1. Install the NetLog Importer,
  2. Open Fiddler, ideally in Viewer mode fiddler.exe -viewer
  3. Click File > Import > NetLog JSON
  4. Select the JSON file to import

In seconds, all of the HTTP/HTTPS traffic found in the capture will be presented for your review. If the log was compressed before it was sent to you, the importer will automatically extract the first JSON file from a chosen .ZIP or .GZ file, saving you a step.

In addition to the requests and responses parsed from the log, there are a number of pseudo-Sessions with a fake host of NETLOG that represent metadata extracted from the log:

These pseudo-sessions include:

  • RAW_JSON contains the raw constants and event data. You probably will never want to examine this view.
  • CAPTURE_INFO contains basic data about the date/time of the capture, what browser and OS version were used, and the command line arguments to the browser.
  • ENABLED_EXTENSIONS contains the list of extensions that are enabled in this browser instance. This entry will be missing if the log was captured using the –log-net-log command line argument.
  • URL_REQUESTS contains a dictionary mapping every event related to URL_REQUEST back to the URL Requests to which it belongs. This provides a different view of the events that were used in the parsing of the Web Sessions added to the traffic list.
  • SECURE_SOCKETS contains a list of all of the HTTPS sockets that were established for network requests, including the certificates sent by the server and the parameters requested of any client certificates. The Server certificates can be viewed by saving the contents of a –BEGIN CERTIFICATE– entry to a file named something.cer. Alternatively, select the line, hit CTRL+C, click Edit > Paste As Sessions, select the Certificates Inspector and press its Content Certificates button.

You can then use Fiddler’s UI to examine each of the Web Sessions.

Limitations

The NetLog format currently does not store request body bytes, so those will always be missing (e.g. on POST requests).

Unless the Include Raw Bytes option was selected by the user collecting the capture, all of the response bytes will be missing as well. Fiddler will show a “dropped” notice when the body bytes are missing:

If the user did not select the Include Cookies and Credentials option, any Cookie or Authorization headers will be stripped down to help protect private data:

Scenario: Finding URLs

You can use Fiddler’s full text search feature to look for URLs of interest if the traffic capture includes raw bytes. Otherwise, you can search the Request URLs and headers alone.

On any session, you can use Fiddler’s “P” keystroke (or the Select > Parent Request context menu command) to attempt to walk back to the request’s creator (e.g. referring HTML page).

You can look for the traffic_annotation value that reflects why a resource was requested by looking for the X-Netlog-Traffic_Annotation Session Flag.

Scenario: Cookie Issues

If Fiddler sees that cookies were not set or sent due to features like SameSiteByDefault cookies, it will make a note of that in the Session using a psuedo $NETLOG-CookieNotSent or $NETLOG-CookieNotSet header on the request or response:

Closing Notes

If you’re interested in learning more about this extension, see announcement blog post and the open-source code.

While the Fiddler Importer is very convenient for analyzing many types of problems, for others, you need to go deeper and look at the raw events in the log using the Catapult Viewer.


Viewing NetLogs with the Catapult Viewer

Opening NetLogs with the Catapult NetLog Viewer is even simpler:

  1. Navigate to the web viewer
  2. Select the JSON file to view

If you find yourself opening NetLogs routinely, you might consider using a shortcut to launch the Viewer in an “App Mode” browser instance: msedge.exe --app=https://netlog-viewer.appspot.com/#import

The App Mode instance is a standalone window which doesn’t contain tabs or other UI:

Note that the Catapult Viewer is a standalone HTML application. If you like, you can save it as a .HTML file on your local computer and use it even when completely disconnected from the Internet. The only advantage to loading it from appspot.com is that the version hosted there is updated from time to time.

Along the left side of the window are tabs that offer different views of the data– most of the action takes place on the Events tab.

Tips

If the problem only exists on one browser instance, check the Command Line parameters and Active Field trials sections on the Import tab to see if there’s an experimental flag that may have caused the breakage. Similarly, check the Modules tab to see if there are any browser extensions that might explain the problem.

Each URL Request has a traffic_annotation value which is a hash you can look up in annotations.xml. That annotation will help you find what part of Chromium generated the network request:

Most requests generated by web content will be generated by the blink_resource_loader, navigations will have navigation_url_loader and requests from features running in the browser process are likely to have other sources.

Scenario: DNS Issues

Look at the DNS tab on the left, and HOST_RESOLVER_IMPL_JOB entries in the Events tab.

One interesting fact: The DNS Error page performs an asynchronous probe to see whether the configured DNS provider is working generally. The Error page also has automatic retry logic; you’ll see a duplicate URL_REQUEST sent shortly after the failed one, with the VALIDATE_CACHE load flag added to it. In this way, you might see a DNS_PROBE_FINISHED_NXDOMAIN error magically disappear if the user’s router’s DNS flakes.

Scenario: Cookie Issues

Look for COOKIE_INCLUSION_STATUS events for details about each candidate cookie that was considered for sending or setting on a given URL Request. In particular, watch for cookies that were excluded due to SameSite or similar problems.

Scenario: HTTPS Handshaking Issues

Look for SSL_CONNECT_JOB and CERT_VERIFIER_JOB entries. Look at the raw TLS messages on the SOCKET entries.

Scenario: HTTPS Certificate Issues

Note: While NetLogs are great for capturing certs, you can also get the site’s certificate from the browser’s certificate error page.

The NetLog includes the certificates used for each HTTPS connection in the base64-encoded SSL_CERTIFICATES_RECEIVED events.

You can just copy paste each certificate (including ---BEGIN to END---) out to a text file, name it log.cer, and use the OS certificate viewer to view it. Or you can use Fiddler’s Inspector, as noted above.

If you’ve got Chromium’s repo, you can instead use the script at \src\net\tools\print_certificates.py to decode the certificates. There’s also a cert_verify_tool in the Chromium source you might build and try. For Mac, using verify-cert to check the cert and dump-trust-settings to check the state of the Root Trust Store might be useful.

In some cases, running the certificate through an analyzer like https://crt.sh/lintcert can flag relevant problems.

Scenario: Authentication Issues

Look for HTTP_AUTH_CONTROLLER events, and responses with the status codes 401, 403, and 407.

For instance, you might find that the authentication fails with ERR_INVALID_AUTH_CREDENTIALS unless you enable the browser’s DisableAuthNegotiateCnameLookup policy (Kerberos has long been very tricky).

Scenario: Debugging Proxy Configuration Issues

See Debugging the behavior of Proxy Configuration Scripts.


Got a great NetLog debugging tip I should include here? Please leave a comment and teach me!

-Eric

I’ve written about Browser Proxy Configuration a few times over the years, and I’m delighted that Chromium has accurate & up-to-date documentation for its proxy support.

One thing I’d like to call out is that Microsoft Edge’s new Chromium foundation introduces a convenient new debugging feature for debugging the behavior of Proxy AutoConfiguration (PAC) scripts.

To use it, simply add alert() calls to your PAC script, like so:

alert("!!!!!!!!! PAC script start parse !!!!!!!!");
function FindProxyForURL(url, host) {
alert("Got request for (" + url+ " with host: " + host + ")");
return "PROXY 127.0.0.1:8888";
}
alert("!!!!!!!!! PAC script done parse !!!!!!!!");

Then, collect a NetLog trace from the browser:

msedge.exe --log-net-log=C:\temp\logFull.json --net-log-capture-mode=IncludeSocketBytes

…and reproduce the problem.

Save the NetLog JSON file and reload it into the NetLog viewer. Search in the Events tab for PAC_JAVASCRIPT_ALERT events:

Even without adding new alert() calls, you can also look for HTTP_STREAM_JOB_CONTROLLER_PROXY_SERVER_RESOLVED events to see what proxy the proxy resolution process determined should be used.

One current limitation of the current logging is that if the V8 Proxy Resolver process…

… crashes (e.g. because Citrix injected a DLL into it), there’s no mention of that crash in the NetLog; it will just show DIRECT. Until the logging is enhanced, users can hit SHIFT+ESC to launch the browser’s task manager and check to see whether the utility process is alive.

Try using the System Resolver

In some cases (e.g. when using DirectAccess), you might want to try using Windows’ proxy resolution code rather than the code within Chromium.

The --winhttp-proxy-resolver command line argument will direct Chrome/Edge to call out to Windows’ WinHTTP Proxy Service for PAC processing.

Differences in WPAD/PAC Processing

  • The WinHTTP Proxy Service caches proxy authentication credentials and reuses them across browser launches; Chromium does not.
  • The WinHTTP Proxy Service caches WPAD determination across process launches. Chromium does not and will need to redetect the proxy each time the browser reopens.
  • Internet Explorer/WinINET/Edge Legacy call the PAC script’s FindProxyForURLEx function (introduced to unlock IPv6 support), if present, and FindProxyForURL if not.
  • Chrome/Edge/Firefox only call the FindProxyForURL function and do not call the Ex version.
  • Internet Explorer/WinINET/Edge Legacy expose a getClientVersion API that is not defined in other PAC environments.
  • Chrome/Edge may return different results than IE/WinINET/EdgeLegacy from the myIpAddress function when connected to a VPN.

Notes for Other Browsers

  • Prior to Windows 8, IE showed PAC alert() notices in a modal dialog box. It no longer does so and alert() is a no-op.
  • Firefox shows alert() messages in the Browser Console (hit Ctrl+Shift+J); note that Firefox’s Browser Console is not the Web Console where web pages’ console.log statements are shown.

-Eric

Problems in accessing websites can often be found and fixed if the network traffic between the browser and the website is captured as the problem occurs. This short post explains how to capture such logs.

Capturing Network Traffic Logs

If someone asked you to read this post, chances are good that you were asked to capture a web traffic log to track down a bug in a website or your web browser.

Fortunately, in Google Chrome or the new Microsoft Edge (version 76+), capturing traffic is simple:

  1. Optional but helpful: Close all browser tabs but one.
  2. Navigate the tab to chrome://net-export
  3. In the UI that appears, press the Start Logging to Disk button.
  4. Choose a filename to save the traffic to. Tip: Pick a location you can easily find later, like your Desktop.
  5. Reproduce the networking problem in a new tab. If you close or navigate the //net-export tab, the logging will stop automatically.
  6. After reproducing the problem, press the Stop Logging button.
  7. Share the Net-Export-Log.json file with whomever will be looking at it. Optional: If the resulting file is very large, you can compress it to a ZIP file.
Network Capture UI

Privacy-Impacting Options

In some cases, especially when you dealing with a problem in logging into a website, you may need to set either the Include cookies and credentials or Include raw bytes options before you click the Start Logging button.

Note that there are important security & privacy implications to selecting these options– if you do so, your capture file will almost certainly contain private data that would allow a bad actor to steal your accounts or perform other malicious actions. Share the capture only with a person you trust and do not post it on the Internet in a public forum.

Tutorial Video

If you’re more of a visual learner, here’s a short video demonstrating the traffic capture process.

In a followup post, I explore how developers can analyze captured traffic.

-Eric

Appendix A: Capture on Startup

In rare cases, you may need to capture network data early (e.g. to capture proxy script downloads and the like. To do that, close Edge, then run

msedge.exe --log-net-log=C:\some_path\some_file_name.json --net-log-capture-mode=IncludeSocketBytes

Note: This approach also works for Electron JS applications like Microsoft Teams:

%LOCALAPPDATA%\Microsoft\Teams\current\Teams.exe --log-net-log=C:\temp\TeamsNetLog.json

I suspect that this is only going to capture the network traffic from the Chromium layer of Electron apps (e.g. web requests from the nodeJS side will not be captured) but it still may be very useful.

Appendix B: References

For a small number of users of Chromium-based browsers (including Chrome and the new Microsoft Edge) on Windows 10, after updating to 78.0.3875.0, every new tab crashes immediately when the browser starts.

Impacted users can open as many new tabs as they like, but each will instantly crash:

EveryTabCrashes

EdgeHavingAProblem

As of Chrome 81.0.3992, the page will show the string Error Code: STATUS_INVALID_IMAGE_HASH.

What’s going wrong?

This problem relates to a security/reliability improvement made to Chromium’s sandboxing. Chromium runs each of the tabs (and extensions) within locked down (“sandboxed”) processes:

JAIL

In Chrome 78, a change was made to prevent 3rd-party code from injecting itself into these sandboxed processes. 3rd-party code is a top source of browser reliability and performance problems, and it has been a longstanding goal for browser vendors to get this code out of the web platform engine.

This new feature relies on setting a Windows 10 Process Mitigation policy that instructs the OS loader to refuse to load binaries that aren’t signed by Microsoft. Edge 13 enabled this mitigation in 2015, and the Chromium change brings parity to the new Edge 78+. Notably, Chrome’s own DLLs aren’t signed by Microsoft so they are specially exempted by the Chromium sandboxing code.

Unfortunately, the impact of this change is that the renderer is killed (resulting in the “Aw snap” page) if any disallowed DLL attempts to load, for instance, if your antivirus software attempts to inject its DLLs into the renderer processes. For example, Symantec Endpoint Protection versions before 14.2 are known to trigger this problem.

If you encounter this problem, you should follow the following steps:

Update any security software you have to the latest version.

Other than malware, security software is the other likely cause of code being unexpectedly injected into the renderers.

Temporarily disable the new protection

You can temporarily launch the browser without this sandbox feature to verify that it’s the source of the crashes.

  1. Close all browser instances (verify that there are no hidden chrome.exe or msedge.exe processes using Task Manager)
  2. Use Windows+R to launch the browser with the command line override:
  msedge.exe --disable-features=RendererCodeIntegrity

or

  chrome.exe --disable-features=RendererCodeIntegrity

Ensure that the tab processes work properly when code integrity checks are disabled.

If so, you’ve proven that code integrity checks are causing the crashes.

Hunt down the culprit

Navigate your browser to the URL chrome://conflicts#R to show the list of modules loaded by the client. Look for any files that are not Signed By Microsoft or Google.

If you see any, they are suspects. (There will likely be a few listed as Shell Extensions; e.g. 7-Zip.dll, that do not cause this problem)– check for an R in the Process types column to find modules loading in the Renderers.

You should install any available updates for any of your suspects to see if doing so fixes the problem.

Check the Event Log

The Windows Event Log will contain information about modules denied loading. Open Event Viewer. Expand Applications and Services Logs > Microsoft > Windows > CodeIntegrity > Operational and look for events with ID 3033. The detail information will indicate the name and location of the DLL that caused the crash:CodeIntegrity

Optional: Use Enterprise Policy to disable the new protection

If needed, IT Adminstrators can disable the new protection using the RendererCodeIntegrity policy for Chrome and Edge. You should outreach to the software vendors responsible for the problematic applications and request that they update them.

Other possible causes

Note that it’s possible that you could have a PC that encounters symptoms like this (all subprocesses crash) but not a result of the new code integrity check. In such cases, the Error Code on the crash page will be something other than STATUS_INVALID_IMAGE_HASH.

  • For instance, Chromium once had an obscure bug in its sandboxing code that caused all sandboxes to crash depending on the random memory mapping of Address Space Layout Randomization.
  • Similarly, Chrome and Edge still have an active bug where all renderers crash on startup if the PC has AppLocker enabled and the browser is launched elevated (as Administrator).

-Eric

In yesterday’s episode, I shared the root cause of a bug that can cause document.cookie to incorrectly return an empty string if the cookie is over 1kb and the cookie grows in the middle of a DOM document.cookie getter operation.

Unfortunately, that simple bug wasn’t the root cause of the compatibility problem that I was investigating when my code-review uncovered it. The observed compatibility bug was slightly different– in the repro case, only one of the document’s cookies goes missing, and it goes missing even when only one page is setting the cookie.

After the brain-melting exercise of annotating the site’s minified framework libraries (console.log(‘…’) ftw!) via Fiddler’s AutoResponder, I found that the site uses the document.cookie API to save the same cookie (named “ld“) three times in a row, adding some information to the cookie each time. However, the ld cookie mysteriously disappears between 0.4 and 6 milliseconds after it gets set the third time. I painstakingly verified that the cookie wasn’t getting manipulated from any other context when it disappeared.

Hmm…

As I wrote up the investigation notes, I idly noted that due to a trivial typo in the website’s source code, the ld cookie was set first as a Persistent cookie, then (accidentally) as a Session cookie, then as a Persistent cookie.

In re-reading the notes an hour later, again my memory got tickled. Hadn’t I seen something like this before?

Indeed, I had. Just about five years ago, a user reported a similar bug where a HTTP response contained two Set-Cookie calls for the same cookie name and Internet Explorer didn’t store either cookie. I built a reduced test case and reported it to the engineering team.

Pushing Cookies

The root cause of the cookie disappearance relates to the Internet Explorer and Edge “loosely-coupled architecture.”

In IE and Edge, each browser tab process runs its own networking stack, in-process1. For persistent cookies, this poses no problem, because every browser process hits the same WinINET cookie storage area and gets back the latest value of the persistent cookie. In contrast, for session cookies, there’s a challenge. Session cookies are stored in local (per-process) variables in the networking code, but a browser session may include multiple tab processes. A Session cookie set in a tab process needs to be available in all other tab processes in that browser session.

As a consequence, when a tab writes a Session cookie, Edge must send an interprocess communication (IPC) message to every other process in the browser session, telling each to update its internal variables with the new value of the Session cookie. This Cookie Pushing IPC is asynchronous, and if the named cookie were later modified in a process before the IPC announcing the earlier update to the cookie is received, that later update is obliterated.

The Duplicate Set-Cookie header version of this bug got fixed in the Fall 2017 Update (RS3) to Windows 10 and thus my old Set-Cookie test case case no longer reproduces the problem.

Unfortunately, it turns out that the RS3 fix only corrected the behavior of the network stack when it encounters this pattern– if the cookie-setting calls are made via document.cookie, the problem reappears, as in this document.cookie test case.

BadBehavior

Playing with the repro page, you’ll notice that manually pushing “Set HOT as a Session cookie” or “Set as a Persistent cookie” works fine, because your puny human reflexes aren’t faster than the cookie-pushing IPC. But when you push the “Set twice” button that sets the cookie twice in fast succession, the HOT cookie disappears in Edge (and in IE11, if you have more than one tab open).

Until this bug is fixed, avoid using document.cookie to change a persistent cookie to a session cookie.

-Eric

In contrast, in Chrome, all networking occurs in the browser process (or a networking-only process), and if a tab process wants to get the current document.cookie, it must perform an IPC to ask the browser process for the cookie value. We call this “cookie pulling.”

I recently bought a Dell XPS 8900 desktop system with Windows 10. It ran okay for a while, but after enabling Hyper-V, every few minutes the system would freeze for a few seconds and then reboot with no explanation. Looking at the Event Viewer’s Windows Logs > System revealed that the system had bugchecked (blue screened):

Event Viewer - BugcheckBugcheck 0x1a indicates a problem with “Memory Management” .

Run WinDbg as Administrator. File > Open Crash Dump:

WinDBG open crash dump 

Open C:\Windows\memory.dmp. Wait for symbols to download:

Debuggee not connected; symbols downloading

If symbols aren’t downloaded automatically, try typing .symfix and then .reload in the command prompt at the bottom.

Use !analyze -v says WinDBG

Then, follow the tool’s advice and run !analyze -v to have the debugger analyze the crash. WinDBG presents a surprisingly readable explanation:

WinDBG notes driver memory corruption

So a driver’s at fault, but which one?

Stack trace points at WiFi

It looks like bcmwl63a, for which symbols aren’t loaded, one clue that this isn’t Microsoft’s code. Let’s find out more about it using lm vm bcmw163a:

 

Debugger points at Wifi driver

Pop over to the listed path to examine the file’s properties, and see that it’s the WiFi driver:

Driver details

The Dell 1560 802.11ac card is the same type as found in my Dell XPS 13” notebook PC, where it was responsible for a flurry of bluescreens last year. The driver appears to have improved (the XPS 13 doesn’t crash anymore), but it looks like some corner cases got missed, likely related to the Hyper-V virtual networking code. Rather than waiting for an updated driver, the experts on Twitter suggested I simply upgrade to the Intel 7265 and install the latest Intel PROSet wireless driver. At $20 on Amazon, this seemed like a fine approach.

The upgrade was straightforward and would’ve taken less than 5 minutes to install except one of the nearly microscopic sockets broke off as I removed the Dell card’s antenna cables:

BrokenSocket

I used a needle to remove the broken pieces from the antenna’s connector before it would fit on the new card’s socket. After connecting the antenna, the new card easily slid into the slot and Windows recognized it on next boot. I used Device Manager to ensure the drivers loaded for the new card’s Bluetooth support, and installed the latest PROSet driver. Everything’s been working great since.

While WinDBG is one of the more inscrutable tools I use, it worked great in this situation and would point even a novice in the right direction.

 

-Eric