Leaky Abstractions

In the late 1990s, the Windows Shell and Internet Explorer teams introduced a bunch of brilliant and intricate designs that allowed extension of the shell and the browser to handle scenarios beyond those built by Microsoft itself. For instance, Internet Explorer supported the notion of pluggable protocols (“What if some protocol, say, FTPS, becomes as important as HTTP?”) and the Windows Shell offered an extremely flexible set of abstractions for browsing of namespaces, enabling third parties to build browsable “folders” not backed by the file system: everything from WebDAV (“your HTTP server is a folder”) to CAB Folders (“your CAB archive is a folder”). As a PM on the clipart team in 2004, after I built a .NET-based application to browse clipart from the Office web services, I next sketched out an initial design for a Windows Shell extension that would make it look like Microsoft’s enormous web-based clipart archive were installed in a local folder on your system.

Perhaps the most popular (or infamous) example of a shell namespace extension is the Compressed Folders extension, which handles the exploration of ZIP files. First introduced in the Windows 98 Plus Pack and later included with Windows Me+ directly, Compressed Folders allows billions of Windows users to interact with ZIP files without downloading third-party software. Perhaps surprisingly, the feature was itself acquired from two third parties: Microsoft acquired the Explorer integration from Dave Plummer’s “side project”, while a company called InnerMedia claims credit for the “DynaZIP” engine underneath.

Unfortunately, the code hasn’t really been updated in a while. A long while. The timestamp in the module claims it was last updated on Valentine’s Day 1998, and while I suspect there may’ve been a fix here or there since then (and one feature, extract-only Unicode filename support), it’s no secret that the code is, as Raymond Chen says: “stuck at the turn of the century.” That means that it doesn’t support “modern” features like AES encryption, and its performance (runtime, compression ratio) is known to be dramatically inferior to modern 3rd-party implementations.

So, why hasn’t it been updated? Well, “if it ain’t broke, don’t fix it” accounts for part of the thinking: the ZIP Folders implementation has survived in Windows for 23 years without the howling of customers becoming unbearable, so there’s some evidence that users are happy enough.

Unfortunately, there are degenerate cases where the ZIP Folders support really is broken. I ran across one of those yesterday. I had seen an interesting Twitter thread about hex editors that offer annotation (useful for exploring file formats) and decided to try a few out (I decided I like ReHex best). But in the process, I downloaded the portable version of ImHex and tried to move it to my Tools folder.

I did so by double-clicking the 11.5 MB ZIP to open it. I then hit CTRL+A to select all of the files within, then crucially (spoiler alert) CTRL+X to cut the files to my clipboard.

I then created a new subfolder in my C:\Tools folder and hit CTRL+V to paste. And here’s where everything went off the rails– Windows spent well over a minute showing “Calculating…” with no visible progress beyond the creation of a single subfolder with a single 5k file within:

Huh? I knew that the ZIP engine beneath ZIP Folders wasn’t well-optimized, but I’d never seen anything this bad before. After waiting a few more minutes, another file extracted, this one 6.5 MB:

This is bananas. I opened Task Manager, but nothing seemed to be using up much of my 12-thread CPU, my 64 GB of memory, or my NVMe SSD. Finally, I opened up SysInternals’ Process Monitor to try to see what was going on, and the immediate cause of the problem quickly became apparent.

After some small reads from the end of the file (where the ZIP file keeps its index), the entire 11 million byte file was being read from disk a single byte at a time:

Looking more closely, I realized that the reads were almost all a single byte, but every now and then, after a specific 1 byte read, a 15 byte read was issued:

What’s at those interesting offsets (330, 337)? The byte 0x50, aka the letter P.

Having written some trivial ZIP-recovery code in the past, I know what’s special about the character P in ZIP files: it’s the first byte of the ZIP format’s block markers, each of which starts with 0x50 0x4B. So what’s plainly happening here is that the code is reading the file from start to finish looking for a particular block, 16 bytes in size. Each time it hits a P, it looks at the next 15 bytes to see if they match the desired signature, and if not, it continues scanning byte-by-byte, looking for the next P.
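To make the cost of that access pattern concrete, here is a minimal Python sketch. It is not the Windows code; it only imitates the pattern inferred from the Process Monitor trace (one read per byte, plus a 15-byte read after every P) and compares it with a chunked scan. The `CountingFile` class and both function names are hypothetical, and the signature searched for is assumed to be the 16-byte Data Descriptor block.

```python
SIGNATURE = b"PK\x07\x08"  # 0x08074b50 as stored on disk (little-endian)


class CountingFile:
    """In-memory stand-in for a file that counts every read request."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.reads = 0

    def read_at(self, offset: int, size: int) -> bytes:
        self.reads += 1
        return self.data[offset:offset + size]


def scan_bytewise(f: CountingFile) -> list[int]:
    """Imitate the observed pattern: 1-byte reads, 15 more bytes after each 'P'."""
    hits = []
    for pos in range(len(f.data)):
        if f.read_at(pos, 1) == b"P":
            block = b"P" + f.read_at(pos + 1, 15)
            if block.startswith(SIGNATURE):
                hits.append(pos)
    return hits


def scan_buffered(f: CountingFile, chunk_size: int = 64 * 1024) -> list[int]:
    """Read the file in large chunks, then search for the signature in memory."""
    chunks = [f.read_at(off, chunk_size)
              for off in range(0, len(f.data), chunk_size)]
    data = b"".join(chunks)
    hits = []
    pos = data.find(SIGNATURE)
    while pos != -1:
        hits.append(pos)
        pos = data.find(SIGNATURE, pos + 1)
    return hits


if __name__ == "__main__":
    # Synthetic data: a stray 'P', then one 16-byte descriptor-like block.
    sample = b"x" * 100 + b"P" + b"y" * 50 + SIGNATURE + bytes(12) + b"z" * 100
    slow, fast = CountingFile(sample), CountingFile(sample)
    print("bytewise:", scan_bytewise(slow), "reads:", slow.reads)
    print("buffered:", scan_buffered(fast), "reads:", fast.reads)
```

Both scans find the same offsets, but the bytewise version issues at least one read per byte of the file, which for an 11 million byte archive means millions of calls per pass.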

Is there something special about this particular ZIP file? Yes.

The ZIP Format consists of a series of file records, followed by a list (“Central Directory”) of those file records.

Each file record has its own “local file header” which contains information about the file, including its size, compressed size, and CRC-32; the same metadata is repeated in the Central Directory.
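As a rough illustration of that layout, the sketch below locates the End of Central Directory record at the tail of an archive and reads the entry count and the Central Directory offset from it. It is a simplified reading of the ZIP format (no ZIP64 or multi-disk support), and the function name is hypothetical.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"   # End of Central Directory record
EOCD_FORMAT = "<4sHHHHIIH"       # 22 fixed bytes, little-endian
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)


def find_central_directory(data: bytes) -> tuple[int, int]:
    """Return (entry_count, central_directory_offset) for a ZIP held in memory."""
    # The record sits at the end of the file (optionally followed by a
    # comment), so search backwards for its signature.
    pos = data.rfind(EOCD_SIGNATURE)
    if pos == -1 or pos + EOCD_SIZE > len(data):
        raise ValueError("no End of Central Directory record found")
    (_sig, _disk, _cd_disk, _disk_entries, total_entries,
     _cd_size, cd_offset, _comment_len) = struct.unpack_from(EOCD_FORMAT, data, pos)
    return total_entries, cd_offset


if __name__ == "__main__":
    # Build a tiny archive in memory so the example is self-contained.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("a.txt", "hello")
        z.writestr("b.txt", "world")
    data = buf.getvalue()
    count, offset = find_central_directory(data)
    print(count, "entries; central directory starts at byte", offset)
    print("first file record begins with", data[:4])
```

A reader that starts from this index can jump straight to each file record, which is why the small reads at the end of the file are expected and the byte-by-byte crawl is not.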

However, the ZIP format allows the local file headers to omit this metadata and instead write it as a “trailer” after each individual file’s DEFLATE-compressed data, a capability that is useful when streaming compression– you cannot know the final compressed size for each file until you’ve actually finished compressing its data. Most ZIP files probably don’t make use of this option, but my example download does. (The developer reports that this ZIP file was created by the GitHub CI.)

You can see the CRC and sizes are 0‘d in the header and instead appear immediately following the signature 0x08074b50 (Data Descriptor), just before the next file’s local header:

The 0x08 bit in the General Purpose flag indicates this option; users of 7-Zip can find it mentioned as Descriptor in the entry’s Characteristics column:

Based on the read size (1+15 bytes), I assume the code is groveling for the Data Descriptor blocks. Why it does that (vs. just reading the same data from the Central Directory), I do not know.
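If you want to check for this option without a hex editor, the following sketch parses the 30-byte fixed portion of a local file header and reports whether the 0x08 flag is set. The function name and the sample file name are made up for illustration, and the header in the demo is hand-built rather than taken from the ImHex download.

```python
import struct

LOCAL_HEADER_SIGNATURE = b"PK\x03\x04"
LOCAL_HEADER_FORMAT = "<4sHHHHHIIIHH"  # 30-byte fixed portion, little-endian
USES_DATA_DESCRIPTOR = 0x08            # bit 3 of the general purpose flags


def read_local_header(data: bytes, offset: int = 0) -> dict:
    """Decode the fixed part of a local file header plus the file name."""
    (sig, _version, flags, _method, _time, _date,
     crc32, compressed_size, uncompressed_size,
     name_len, _extra_len) = struct.unpack_from(LOCAL_HEADER_FORMAT, data, offset)
    if sig != LOCAL_HEADER_SIGNATURE:
        raise ValueError("not a local file header")
    name_start = offset + struct.calcsize(LOCAL_HEADER_FORMAT)
    name = data[name_start:name_start + name_len].decode("cp437")
    return {
        "name": name,
        "streamed": bool(flags & USES_DATA_DESCRIPTOR),
        "crc32": crc32,
        "compressed_size": compressed_size,
        "uncompressed_size": uncompressed_size,
    }


if __name__ == "__main__":
    # Hand-built header for a streamed entry: flag 0x08 set, CRC and sizes zeroed.
    name = b"example.bin"  # hypothetical file name
    header = struct.pack(LOCAL_HEADER_FORMAT, LOCAL_HEADER_SIGNATURE,
                         20, USES_DATA_DESCRIPTOR, 8, 0, 0, 0, 0, 0,
                         len(name), 0) + name
    print(read_local_header(header))
```

Python's own zipfile module exposes the same flags as `ZipInfo.flag_bits`, which is the easier route when you only need to inspect an archive.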

Making matters worse, this “read the file, byte by byte” crawl through the file doesn’t just happen once: it happens at least once for every file extracted. Worse still, this data is being read with ReadFile rather than fread(), meaning that there’s no caching in userspace and every single-byte read requires a trip to the kernel.

Eventually, after watching about 85 million single byte reads, Process Monitor hangs:

After restarting and configuring Process Monitor with Symbols, we can examine the one-byte reads and get a hint of what’s going on:

The GetSomeBytes function is getting hammered with calls passing a single byte buffer, in a tight loop inside the readzipfile function. But look down the stack and the root cause of the mess becomes clear– this is happening because after each file is “moved” from the ZIP to the target folder, the ZIP file must be updated to remove the file that was “moved.” This deletion process is inherently not fast (because it results in shuffling all of the subsequent bytes of the file and updating the index), and as implemented in the readzipfile function (with its one-byte read buffer) it is atrociously slow.

Back up in my repro steps, note that I hit CTRL+X to “Cut” the files, resulting in a Move operation. Had I instead hit CTRL+C to “Copy” the files, resulting in a Copy operation, the ZIP folder would not have performed a delete operation as each file was extracted. The time required to unpack the ZIP file drops from over thirty minutes to four seconds. For perspective, 7-Zip unpacks the file in under a quarter of a second, although it cheats a little.

And here’s where the abstraction leaks (as all non-trivial abstractions do)– from a user’s point-of-view, copying files out of a ZIP file (then deleting the ZIP) vs. moving the files from a ZIP file seems like it shouldn’t be very different. Unfortunately, the abstraction fails to fully paper over the reality that deleting from certain ZIP files is an extremely slow operation, while deleting a file from a disk is usually trivial. As a consequence, the Compressed Folder abstraction works well for tiny ZIPs, but fails for the larger ZIP files that are becoming increasingly common.

While it’s relatively easy to think of ways to dramatically improve the performance of this scenario, precedent suggests that the code in Windows is unlikely to be improved anytime soon. Perhaps for its 25th Anniversary? 🤞

Update 13-August-2024: Unfortunately, you don’t have to look far to find other places where this abstraction leaks. Users expect to be able to drag/drop files from any Windows Shell view (including a ZIP Folder) into other apps (like Microsoft Paint, or any website that allows uploads). Unfortunately this does not work correctly, because the data placed into the data transfer object when dragging an item from within a ZIP Folder differs from the data put into the object when dragging a “real” file.

– Eric

Offline NetLog Viewing

A while back, I explained how you can use Telerik Fiddler or the Catapult NetLog Viewer to analyze a network log captured from Microsoft Edge, Google Chrome, or another Chromium or Electron-based application.

While Fiddler is a native app that runs locally, the Catapult NetLog Viewer is a JavaScript application that runs in your browser. Because NetLogs can contain sensitive data, some users have worried about the privacy of the viewer– what if someday it started leaking sensitive data from logs, either unintentionally or maliciously?

Fortunately, the NetLog Viewer is a self-contained single page application that doesn’t need a network connection to run. You can use it entirely offline, either from a Virtual Machine with no network connection, or from a browser instance configured to override all network requests.

Your first step is to get a copy of the viewer as a file. You can do that by right-clicking this link and choosing “Save Link As”. Save the HTML file somewhere locally, e.g. C:\temp\NetLogView.html on Windows.

If you want to run it from a disconnected VM, simply copy the file into such a VM and you’re good to go.

If, however, you want the convenience of running the viewer from your Internet-connected PC without worrying about leaks, you can run it from a browser instance that won’t make network connections.

After saving the file, open it in a new browser window thusly:

msedge.exe --user-data-dir=C:\temp\profile --inprivate --host-rules="MAP * 0.0.0.0" --app=C:\temp\NetLogView.html

The command line arguments bear some explanation. In reverse order:

  • The app argument instructs Edge to open the supplied file with a minimal browser UI, as if it were a native app.
  • The host-rules argument tells the browser instance to direct all network requests to an IP address of 0.0.0.0. On Windows, such requests instantly fail. On Mac/Linux, the null IP points back at your own PC.
  • The inprivate argument directs the browser to discard all storage after the app exits (since it’s not needed). For Chrome, use --incognito instead.
  • The user-data-dir instructs the browser to use a temporary browser profile (which prevents the app’s window from being merged into an existing browser process, such that the host-rules argument would’ve been ignored.)
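If you launch the viewer often, the same command line can be assembled from a script. This is only a sketch: the function names are hypothetical, the paths are the example paths used above, and it assumes the browser executable name you pass resolves on your system (otherwise pass a full path).

```python
import subprocess


def build_offline_viewer_command(browser: str, viewer_path: str,
                                 profile_dir: str,
                                 private_flag: str = "--inprivate") -> list[str]:
    """Assemble the arguments described above as a list (no shell quoting needed)."""
    return [
        browser,
        f"--user-data-dir={profile_dir}",   # separate, temporary profile
        private_flag,                       # use "--incognito" for Chrome
        "--host-rules=MAP * 0.0.0.0",       # send every request to the null IP
        f"--app={viewer_path}",             # minimal app-style window
    ]


def launch_offline_viewer(command: list[str]) -> subprocess.Popen:
    """Start the browser; only call this when you actually want a window."""
    return subprocess.Popen(command)


if __name__ == "__main__":
    cmd = build_offline_viewer_command(
        "msedge.exe", r"C:\temp\NetLogView.html", r"C:\temp\profile")
    print(cmd)  # pass cmd to launch_offline_viewer() to open the viewer
```

Passing the arguments as a list avoids having to quote the `MAP * 0.0.0.0` value by hand.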


While none of this is strictly necessary (the NetLog Viewer doesn’t leak data), it’s always nice to be able to discard attack surface wherever possible.

-Eric

Download Blocking by File Type

Last Updated: 20 May 2024

I’ve previously gushed about the magic of the File Type Policies component — a mechanism that allows files to be classified by their level of “dangerousness”, such that harmless files (e.g. .txt files) can be downloaded freely, whilst potentially-dangerous files (e.g. .dll files) are subjected to a higher degree of vetting and a more security-conscious user-experience.

File Type Danger Level

Microsoft Edge inherits its file type policies from the upstream Chromium browser; you can view the current contents of the list here, and documentation of its format here. UPDATE: As of 2024, Edge’s JSON has evolved significantly from upstream Chromium, and it treats many types as more dangerous than Chrome does.

Within the list, you’ll see that each type has a danger_level, which is one of three values: DANGEROUS, NOT_DANGEROUS, or ALLOW_ON_USER_GESTURE.

The first two danger levels are simple: NOT_DANGEROUS means “Safe to download and open, even if the download was accidental. No additional warnings are necessary.” DANGEROUS means “Always¹ warn the user that this file may harm their computer. Let users continue or discard the file. If [SmartScreen or Safe Browsing] returns a SAFE verdict, still warn the user before saving the file.”

The third setting, ALLOW_ON_USER_GESTURE², is more subtle. Such files are potentially dangerous, but likely harmless if the user is familiar with the download site and if the download was intentional. Microsoft Edge will allow such downloads to proceed automatically if two conditions are both met:

  1. User Gesture: There is a user gesture associated with the network request that initiated the download (e.g. the user clicked a link to the download).
  2. Familiar Initiator: There is a recorded prior visit to the referring origin prior to the most recent midnight (i.e. yesterday or earlier). Such a visit implies that the user has at least some history of visiting the site that kicked off the download.

The download will also proceed automatically if the user explicitly initiated it, either by using the Save link as context menu command or by entering the download’s URL directly into the browser’s address bar.

SmartScreen/SafeBrowsing Verdict Overrides

Importantly, if Microsoft Defender SmartScreen (in Edge), or Google Safe Browsing (in Chrome), indicates that the file is known safe, that takes precedence over the ALLOW_ON_USER_GESTURE heuristics.

This override allows the user to avoid spurious/scary warnings, for example, when downloading drivers for their graphics card. Without this override, most users would see a warning because the driver installer .exe is served by their GPU vendor’s website (e.g. ati.com) which is somewhat unlikely to be a domain that passes the Familiar Initiator check. Because SmartScreen reports that the signed ATI drivers are non-malicious, it can return a “Safe” verdict and the download will proceed without warning.

I wrote a short blog post about Reputation Services overriding default warnings.
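The rules described so far can be summarized in a short Python sketch. This is a simplified model of the behavior as described in this post, not the browser's code: the function and parameter names are hypothetical, visit times are treated as naive local datetimes, and the footnoted “Save Link As” exception for DANGEROUS files is deliberately left out.

```python
from datetime import datetime, time


def is_familiar_initiator(prior_visits: list[datetime], now: datetime) -> bool:
    """True if the referring origin was visited before the most recent midnight."""
    last_midnight = datetime.combine(now.date(), time.min)
    return any(visit < last_midnight for visit in prior_visits)


def needs_interruption(danger_level: str, *, has_user_gesture: bool,
                       familiar_initiator: bool,
                       explicit_user_request: bool = False,
                       reputation_says_safe: bool = False) -> bool:
    """Decide whether a download should be interrupted with a warning."""
    if danger_level == "NOT_DANGEROUS":
        return False
    if danger_level == "DANGEROUS":
        return True  # warn even if the reputation service says SAFE
    if danger_level != "ALLOW_ON_USER_GESTURE":
        raise ValueError(f"unknown danger level: {danger_level}")
    # A safe verdict, "Save link as", or a typed URL skips the heuristics.
    if reputation_says_safe or explicit_user_request:
        return False
    return not (has_user_gesture and familiar_initiator)


if __name__ == "__main__":
    now = datetime(2021, 5, 20, 9, 0)
    familiar = is_familiar_initiator([datetime(2021, 5, 19, 23, 59)], now)
    print(needs_interruption("ALLOW_ON_USER_GESTURE",
                             has_user_gesture=True,
                             familiar_initiator=familiar))
```

Note that a visit earlier on the same day does not count toward familiarity in this model, matching the “yesterday or earlier” wording above.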

User Experience for Downloads Lacking Gestures

Within Google Chrome, a download lacking a required gesture shows explicit buttons to allow the user to decide whether to proceed with the download or abandon it:

Starting in version 91, Microsoft Edge joined Google Chrome in interrupting downloads that lack the required gesture. However, from Edge 91-94, Microsoft Edge states that the download “was blocked”, although the same options, titled Keep and Delete, are available from the … menu on the download item.

UPDATE: Edge 95+ was updated with an interruption UX more like Chrome’s, in order to better reflect that the user may choose to continue to download the file.

If you visit edge://downloads, you’ll see the same options:

Enterprise Controls

While users are somewhat unlikely to encounter download interruptions for sites they use every day, they might encounter them for legitimate downloads on sites that they use rarely or on sites that hit the “Corner Cases” described in a section below.

To help streamline the user-experience for Enterprises, a Group Policy is available.

Enterprises can set an ExemptFileTypeDownloadWarnings policy to specify the filetypes that are allowed to download from specific sites without interruption.

[{"file_extension":"xml","domains":["contoso.com", "woodgrovebank.com"]},
{"file_extension":"msg", "domains": ["*"]}]

If the SmartScreenForTrustedDownloadsEnabled (or equivalent policy for Chrome) is set to 0 (disabled), and the file download’s URL is Trusted (on Windows, in the Local Machine, Intranet, or Trusted zone) then the download will proceed without interruption (even without a gesture), regardless of danger_level. (Aside: This seems a bit strange, but feels more logical if you pretend that the file type warnings are a part of SmartScreen).

File Types Requiring a Gesture

File type policies are published in the Chromium source code. As of May 2021, file types with a danger_level of ALLOW_ON_USER_GESTURE on at least one OS platform include:
accda, accdb, accde, accdr, action, ad, ade, adp, apk, app, application, appref-ms, as, asp, asx, bas, bash, bat, caction, cdr, cer, chi, chm, cmd, com, command, configprofile, cpgz, cpi, cpl, crt, crx, csh, dart, dc42, deb, definition, der, desktop, dex, diskcopy42, dmg, dmgpart, dvdr, dylib, efi, eml, exe, fon, fxp, hlp, htt, img, imgpart, inf, ins, internetconnect, inx, isp, isu, job, js, jse, ksh, lnk, mad, maf, mag, mam, maq, mar, mas, mat, mau, mav, maw, mda, mdb, mde, mdt, mdw, mdz, mht, mhtml, mmc, mobileconfig, mpkg, msc, msg, msh, msh1, msh1xml, msh2, msh2xml, mshxml, msi, msp, mst, ndif, networkconnect, ocx, ops, out, oxt, paf, partial, pax, pcd, pet, pif, pkg, pl, plg, prf, prg, ps1, ps1xml, ps2, ps2xml, psc1, psc2, pst, pup, py, pyc, pyo, pyw, rb, reg, rels, rgs, rpm, run, scr, sct, search-ms, service, settingcontent-ms, sh, shar, shb, shs, slk, slp, smi, sparsebundle, sparseimage, svg, tcsh, toast, u3p, udif, vb, vbe, vbs, vbscript, vdx, vsd, vsdm, vsdx, vsmacros, vss, vssm, vssx, vst, vstm, vstx, vsw, vsx, vtx, wflow, workflow, ws, wsc, wsf, wsh, xip, xml, xnk, xrm-ms, xsd, xsl

Note: Microsoft Edge’s file type behaviors may (and as of March 2023, do) diverge from the list of types in upstream Chromium, for security and compatibility reasons.

Other Fields in the File Type Policies

  • You’ll also note that some file types have an auto_open_hint which controls whether the user may configure that type of file to open automatically when the download completes.
  • File type settings sometimes vary depending on the client OS platform (an .exe is not dangerous on a Mac, while an .applescript is harmless on Windows). The platform attribute of an entry specifies on which OS the danger_level applies.
  • The max_file_size_to_analyze field controls how big of a file (.zip, .rar, etc) the browser will be willing to unpack to scan it for dangerous content.

Group Policies

DownloadRestrictions is a policy that makes a complicated browser behavior even more complicated. When you set DownloadRestrictions to 1, Edge won’t just interrupt the download, it will block it.

To make matters even more complicated, if you enable DownloadRestrictions and disable SmartScreen:

…then the file download is blocked silently with no notice — the Download UX does not show, and no warning is emitted to the Developer Tools console.

Enterprises can use ExemptDomainFileTypePairsFromFileTypeDownloadWarnings to specify the filetypes that are allowed to download from specific sites without blocking.

Corner Cases

  • If you put referrerpolicy="no-referrer" on your download link (or otherwise suppress referrers), the Familiar Initiator check fails.
  • Prior to v94, if you initiate the download by dynamically creating an <a> element with a download attribute, then click it from JavaScript, the User Gesture check fails.

As of August 2021, Microsoft Outlook Web Access’ email attachment file downloads encounter both of these issues.

Test cases for these conditions can be found here. (Note that you’ll have to have visited webdbg.com yesterday or earlier for the familiarity check to pass).

Surprise: Zones

File download is one of a handful of places where Chromium-based browsers consider Windows security zones.

Beyond the aforementioned impact when the policy SmartScreenForTrustedDownloadsEnabled is set, if you’ve configured a Zone’s setting for Launching applications and unsafe files to Disable using the Windows Internet Control Panel’s Security tab (or the associated Group Policies), Chromium-based browsers will block file downloads from the Zone in question with a terse note: Couldn't download - Blocked.


Update: For version 105, the Chrome team made several significant changes to the file type policies list and behaviors, with the aim of reducing warnings, as seen in this changelist.

-Eric

Appendix: Comparison to other File Type Danger Systems

Microsoft Office maintains its own list of Dangerous File types used in Outlook, Excel, Word, PowerPoint and OneNote.

A Windows Shell API, AssocIsDangerous allows applications to determine whether a given file extension is dangerous according to the system’s registry configuration, which ISVs can extend to describe the danger level of their own file types.


¹ DANGEROUS level files are still saved without an explicit warning if the user uses the “Save Link As” command on the browser context menu. Entering the URL via the address bar or command line will still show the warning.

² ALLOW_ON_USER_GESTURE_AND_FAMILIAR_INITIATOR would be the accurate name for the setting.

Per-Site Permissions in Edge

Last year, I wrote about how the new Microsoft Edge browser mostly ignores Security Zones (except in very rare circumstances) when making security and permissions decisions. Instead, in Chromium, per-site permissions are controlled by settings and policies expressed using a simple syntax with limited wildcarding support.

Settings Page’s Site Permissions and Group Policy

Internet Explorer offered around 88 URLAction permissions, but the majority (62) of these settings have no equivalent; for instance, there are a dozen that control various features of ActiveX controls, a technology that does not exist in the new Edge.

Unfortunately, there’s no document mapping the old URLActions to the new equivalents (if any) available within the new Edge. 

When users open chrome://settings/content/siteDetails?site=https://example.com, they’ll find a long list of configuration switches and lists for various permissions. Users rarely use the Settings Page directly, instead making choices using various widgets and toggles in the Page Info dropdown (which appears when you click the lock) or via various prompts or buttons at the right-edge of the address bar/omnibox.

Enterprises can use Group Policy to provision site lists for individual policies that control the browser’s behavior. To find these policies, simply open the Edge Group Policy documentation and search for ForUrls to find the policies that allow and block behavior based on the loaded site’s URL. I recently wrote a post about Chromium’s URL Filter syntax, which doesn’t always work like one might expect. Most of the relevant settings are listed within the Group Policy for Content Settings.

There are also a number of policies whose names contain Default that control the default behavior for a given setting.

Here’s a list of Site Settings with information about their policies and behavior:

As you can see, some of these settings are very obscure (WebSerial, WebMIDI) while others will almost never be changed away from their defaults (Images).

-Eric

Specifying Per-Site Policy with Chromium’s URL Filter Format

Chromium-based browsers like Microsoft Edge make very limited use of Windows Security Zones. Instead, most permissions and features that offer administrators per-site configuration via policy rely on lists of rules in the URL Filter Format.

Filters are expressed in a syntax (Chrome Doc, Edge Doc) that is similar to other types of globbing rules, but different enough to cause confusion. For instance, consider a URLBlocklist rule expressed as follows:

These filters don’t work as expected. The HTTPS rule should not include a trailing * in the path component (it won’t match anything), while the data: rule requires a trailing * to function.

The syntax has a few other oddities as well:

  • You do not use a * to represent a part of a hostname: the * character is only used by itself to mean “ALL hosts”. A rule of *xample.com is invalid and does not match example.com.
  • The right way to express “Match example.com and its subdomains” is just example.com. If you want to match only the hostname example.com, and none of its subdomains, use .example.com (note the leading dot).
  • You may specify a path prefix (example.com/foo) but you must not include a wildcard * anywhere in the path.
  • You may specify wildcards in a querystring (example.com?bar=*). You may omit the preceding path component to have that querystring checked on all pages, or include a path to only check the querystring on pages within the path.
  • A rule of blob:* doesn’t seem to match blob URLs, while a rule of data:* does seem to match all data URLs.
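The hostname rules in the list above can be modeled in a few lines of Python. This is a simplified sketch of the behavior as described here, not Chromium's implementation: the function name is hypothetical, and it covers only the host portion of a filter (no scheme, port, path, or querystring handling).

```python
def host_matches(rule_host: str, hostname: str) -> bool:
    """Apply the host-matching rules described above to a single hostname."""
    if rule_host == "*":
        return True  # a lone * means "ALL hosts"
    if "*" in rule_host:
        # Partial wildcards such as *xample.com are not valid rules.
        raise ValueError(f"invalid host rule: {rule_host}")
    rule_host, hostname = rule_host.lower(), hostname.lower()
    if rule_host.startswith("."):
        # Leading dot: match exactly this host and none of its subdomains.
        return hostname == rule_host[1:]
    # No leading dot: match the host itself and any of its subdomains.
    return hostname == rule_host or hostname.endswith("." + rule_host)


if __name__ == "__main__":
    for rule, host in [("example.com", "www.example.com"),
                       (".example.com", "www.example.com"),
                       (".example.com", "example.com"),
                       ("example.com", "badexample.com")]:
        print(f"{rule!r} vs {host!r}: {host_matches(rule, host)}")
```

The subdomain check compares against "." plus the rule host, so that a rule for example.com does not accidentally match an unrelated name that merely ends in the same characters.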

Unfortunately, there’s not a great debugger for figuring out the proper syntax. You can use the chrome://policy page to see whether Chrome finds any glaring error in the policy:

…but short of testing your policy there’s not a great way to verify it does what you hope.

Q: The problem of special-URLs

There are a variety of special URLs (particularly blob and data) that do not directly express a hostname. Instead, the URL exists within a security context that is not included in the URL itself. This can cause problems for Policies if the code implementing the policy does not check the URL of the security context and looks only at the blob/data URL directly. A system administrator might set a policy for downloads from https://example.com/download, but if the download page on that site uses a script-generated file download (e.g. a blob), the policy check might overlook the rule for example.com because it checks just the blob: URL.

An example bug can be found here.

Q: Can I block everything by adding * to the URLBlocklist?

You can add a simple * rule to the URL Blocklist, but then you must add overriding rules to the URLAllowlist to cover every URL that you need to allow to load in the browser. Beyond the https:// sites you expect, this includes, for example, about:, data:, edge:, and other URLs that you probably haven’t thought about.

Q: Can filters match on a site’s IP?

The Permissions system’s “Site Lists” feature does not support specifying an IP-range for allow and block lists. Wildcards are not supported either.

It does support specification of individual IP literals (e.g. http://127.0.0.1/), but such rules are only respected if the user navigates to the site using said literal IP address. If a non-address hostname is used (http://localhost), the IP Literal rule will not be respected even though the resolved IP of the host matches the filter-listed IP.

Aside: Wildcard support for IP-literals might be nice, so that an admin could specify e.g. http://192.168.* to exempt their private network. Unfortunately, Chromium couldn’t implement a syntax that is quite that simple — if it did, an attacker could just name their evil server https://192.168.evil.com and exploit their enhanced permissions.
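The pitfall is easy to demonstrate. The sketch below contrasts a naive string wildcard with a check that first requires the host to parse as an IP literal. Both functions are hypothetical illustrations; the CIDR form shown is not a syntax the URL Filter Format supports.

```python
import fnmatch
import ipaddress


def naive_wildcard_match(pattern: str, hostname: str) -> bool:
    """String globbing: the tempting but unsafe way to match '192.168.*'."""
    return fnmatch.fnmatchcase(hostname, pattern)


def in_network(hostname: str, cidr: str) -> bool:
    """Match only if the host is a real IP literal inside the given network."""
    try:
        address = ipaddress.ip_address(hostname)
    except ValueError:
        return False  # a DNS name, however it is spelled, is not an IP literal
    return address in ipaddress.ip_network(cidr)


if __name__ == "__main__":
    for host in ("192.168.1.20", "192.168.evil.com"):
        print(host,
              "naive:", naive_wildcard_match("192.168.*", host),
              "strict:", in_network(host, "192.168.0.0/16"))
```

The naive matcher accepts the attacker's hostname because it only compares characters, while the stricter check rejects anything that is not actually an address in the range.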

Q: Can filters match just dotless hostnames?

Not today, no. You must individually list each desired hostname, e.g. (https://payroll, https://stock, https://who, etc).

Chromium’s URL Filter Format is convenient if your intranet is structured under one private domain (e.g. *.intranet.example.com) but is much less convenient if your Intranet uses dotless hostnames (http://example) or many disjoint private domains.

The ability to match only hostnames not containing dots would be convenient to accommodate the old IE behavior whereby Windows would map dotless hostnames to the Local Intranet Zone by default. (To my surprise, there’s been no significant demand for this capability in the first year of Edge’s existence, so perhaps corporate intranets are no longer using dotless hostnames very much?)

References