How Microsoft Edge Updates

When you see the update notifier in Edge (a green or red arrow on the … button):

… this means an update is ready for use and you simply need to restart the browser to have it applied.

While you’re in this state, if you open Edge’s application folder, you’ll see the new version sitting side-by-side with the currently-running version:

When you choose to restart:

…either via the prompt or manually, Edge will rename and restart with the new binaries and remove the old ones:

The new instance restarts using Chromium’s session restoration feature, so all of your tabs, windows, cookies, etc, are right where you left them before the update (akin to typing edge://restart in the omnibox).

This design means that the new version is ready to go immediately, without the need to wait for any downloads or other steps that could take a while or go wrong along the way. This is important, because users who don’t restart the browser will continue running the outdated version (even for new tabs or windows) until they restart, and this could expose them to security vulnerabilities.

Three Group Policies give administrators control of the relaunch process, including the ability to force a restart.


Technical Appendix

Chromium’s code for renaming the new_browser.exe binary can be seen here. When Chrome is installed at the machine-wide level, Chromium’s setup.exe is passed the --rename-chrome-exe command line switch, and its code performs the actual rename.

Attack Techniques: Spoofing via UserInfo

I received the following phishing lure by SMS a few days back:

The syntax of URLs is complicated, and even tech-savvy users often misinterpret them. In the case of the URL above, the actual site’s hostname is, and the misleading text is just a phony username:password pair making up the UserInfo component of the URL.

Because users aren’t accustomed to encountering URLs with UserInfo, they often will assume that tapping this URL will load, which it certainly does not.
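To see how a URL parser splits a link like this, here is a minimal Python sketch. The URL in it is made up for illustration (it is not the one from the lure), and `describe_authority` is a hypothetical helper name; only `urllib.parse.urlsplit` is real standard-library API.

```python
from urllib.parse import urlsplit

def describe_authority(url: str) -> dict:
    """Split a URL's authority into its UserInfo and host parts."""
    parts = urlsplit(url)
    return {
        "username": parts.username,  # text before ':' in the UserInfo
        "password": parts.password,  # text after ':' in the UserInfo
        "hostname": parts.hostname,  # the site that is actually contacted
    }

# Hypothetical lure: everything before '@' is UserInfo, not the site.
lure = "https://accounts.example.com:signin@phish.example/login"
info = describe_authority(lure)
print(info)
```

The familiar-looking name at the front of the string ends up as the username, while the hostname is whatever follows the `@`.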

The Guidelines for Secure URL Display call for hiding the UserInfo data from UI surfaces where the user is expected to make a security decision (for example, the browser’s address bar/omnibox), and you’ll notice if you load this URL, the omnibox doesn’t show the spoofy portion. However, by the time that the user taps, the phisher likely has already successfully primed the user into expecting that the link is legitimate.

Test Links

Test Link:
Test Link:

If the page shows “Your browser made it!” without popping an authentication dialog, your browser automatically sent the credentials in response to the server’s HTTP/401.

Note that the UserInfo component of the URLs is visible in both NetLogs and browser extension events.

Browser Behavior

Nineteen years ago (April 2004), Internet Explorer 6 stopped supporting URLs containing userinfo, with the justification that this URI component wasn’t actually formally a part of the specification for HTTP/HTTPS URLs and it was primarily used for phishing. Last summer, RFC9110 made it official, suggesting:

Before making use of an "http" or "https" URI reference received from an untrusted source, a recipient SHOULD parse for userinfo and treat its presence as an error; it is likely being used to obscure the authority for the sake of phishing attacks.

The guidance goes on to note the risk of legitimately relying upon this URL syntax (it’s easy for the credentials to leak out due to bugs or careless handling).
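An application that receives links from untrusted sources could follow that SHOULD with a check along these lines. This is only a sketch in Python: `reject_userinfo` is a hypothetical name, and treating any `@` in the authority as an error is one possible reading of the guidance, not how any browser implements it.

```python
from urllib.parse import urlsplit

def reject_userinfo(url: str) -> str:
    """Return url unchanged, or raise ValueError if an http(s) URL has userinfo."""
    parts = urlsplit(url)
    # netloc is the authority component: [userinfo@]host[:port]
    if parts.scheme in ("http", "https") and "@" in parts.netloc:
        raise ValueError("http(s) URL contains userinfo; refusing to use it")
    return url

# An '@' in the path is fine; only the authority is checked.
checked = reject_userinfo("https://example.com/users/@me")
```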

In contrast to IE’s choice, Firefox went a different way, showing the user a modal prompt:

… which seems like a solid mitigation. However, the attacker can make the warning less scary by returning an HTTP/401 challenge, causing the text of the dialog to change to:

Chrome’s Security team reluctantly deems the acceptance of UserInfo as “Working as Intended.” While allowed for top-level navigations, Chromium disallows UserInfo in many niches, including subresource fetches (which helps protect against a different class of attack). The crbug issue tracking that restriction includes some interesting conversation from folks encountering scenarios broken by the prohibition.

While it’s tempting to just disallow UserInfo everywhere (and I’d argue that all vendors probably should get RFC9110-compliant ASAP), it’s difficult to know how many real-world sites would break. Some browser vendors are probably reluctant to “go first” because in doing so, they might lose any inconvenienced users to a competitor that still allows the syntax. Just today, one security expert noted:

Ugh. Stay safe out there!


Going Electric – Solar

For years now, I’ve wanted to get solar panels for my house in Austin, both because it feels morally responsible and because I’m a geek and powering my house with carbon-free fusion seems neat.

Economically, I assume I’ll eventually break even with solar power, but probably not for a long time– my house isn’t large by Texas standards, and I use energy pretty efficiently. In August 2022, my monthly usage peaked at 1347 kilowatt hours:

I held off on installing solar for a long time because I was afraid that it was going to end up like LED lighting– I buy in, and then the tech improves rapidly, with costs dropping like a rock and efficiency improving every year. But I’ve gotten tired of waiting, and tired of being grumpy about every sunny day in the blistering Austin summer.

In selecting a solar provider, I ended up doing less due-diligence than I’d planned, but got a few recommendations from folks on Twitter and in my neighborhood, ultimately settling on a local company, Native Solar. I suspect they are far from the cheapest provider (e.g. Tesla Solar quoted panels for thousands less), but in reading the reviews of solar companies, many have horrible reviews for both installation and ongoing support, and I don’t need more hassle in my life.

I selected an 8 kW array, consisting of twenty panels and inverters:

The array is expected to generate 141% of my current power use, although I expect my power use will be higher in the future, thanks to my electric car and possible eventual switch to an induction stovetop and, possibly in a few years, a heat pump.

In Austin, solar power is sold to the grid at 9.5 cents per kWh, which is somewhat more than I pay for it, even at the “Tier 3” pricing you can see in my statement above.

The array was expensive, with total payments of $24900:

… but that doesn’t include a Federal tax credit of $7470 and a rebate of $2500 from my local power company (Austin Energy), for a net system cost of ~$15000.

Notably, I decided not to install a battery system. A 12 kWh battery would have added delays and around $10K (after rebates) to the cost of the system, and with a service lifetime of just 10 years (the panels are expected to perform well for 25), that works out to be $1000 a year, every year, to handle any power outages. In my decade in Austin, significant power outages have been rare– in 2021’s big ice storm, I lost power for around twelve hours. In 2023’s enormous ice storm, I lost power for a very annoying fifty-six hours and started to wonder if I’d made a mistake. (I’m hoping that one day, bidirectional power from cars will become more practical: my Nissan Leaf’s “puny” battery is 40 kWh, but most of today’s electric cars don’t support acting as the house’s battery).
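For anyone who wants to check the math, the cost figures quoted above can be worked through in a few lines of Python. The variable names are mine; the dollar amounts are the ones in the post.

```python
# Figures quoted in the post (US dollars).
total_payments = 24_900
federal_tax_credit = 7_470
austin_energy_rebate = 2_500

net_cost = total_payments - federal_tax_credit - austin_energy_rebate

# Battery option that was declined: ~$10K after rebates, ~10-year service life.
battery_cost = 10_000
battery_life_years = 10
battery_cost_per_year = battery_cost / battery_life_years

print(net_cost, battery_cost_per_year)
```

The net comes to $14,930, which is where the rounded ~$15000 figure comes from.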

When I signed the contract for the solar install, I knew it was going to be a long process. My first payment was on September 1st of 2022, and the design wasn’t drawn until November. I quickly got it approved by the neighborhood home owners’ association, and Native Solar went through the process of getting the necessary approvals and permits from the power company and city.

At long last, on Wednesday (March 15th, 2023), the installers arrived to install the electrical boxes and the rails on my SW-facing roof:

After a rare rain interlude on Thursday, the installers returned on Friday to install the panels themselves. On the roof, the panels don’t look so big, but standing on the ground you can see how enormous they are:

After a few more hours’ work, the panels were all installed and hooked up:

Alas, the electrical panels are on the northeast side of the house, so there’s now a conduit that runs over the center of my roof:

Three new boxes were added left of the main panel and meter:

Alas, the big switch in the middle remains in the OFF position, as I’m not allowed to turn on the system until the City performs their final inspection of the installed system.

Hopefully they’ll get to it soon– I’m excited to see how much power I’m capturing!


Improving Native Message Host Reliability on Windows

Last Update: Mar 29, 2023

Previously, I’ve written about Chromium’s Native Messaging functionality that allows a browser extension to talk to a process running outside of the browser’s sandbox, and I shared a Native Messaging Debugger I wrote that allows you to monitor (and even tamper with) the communication channels between the browser extension and the Host App.

Obscure Problems on Windows

Native Messaging is a powerful capability, and a common choice for building extensions that need to interact with the rest of the system. However, over the years, users have reported a trail of bugs related to how the feature is implemented on Windows. While these bugs are typically only seen in uncommon configurations, they could break Native Messaging entirely for some users.

Some examples include:

  • crbug/335558 – Ampersand in Host’s path prevents launching
  • crbug/387228 – Broken if %comspec% not pointed at cmd.exe
  • crbug/387233 – Broken when cmd.exe is disabled or set to RUNASADMIN

While the details of each of these issues differ, they all have the same root cause: On Windows, Chromium did not launch Native Message Hosts directly, instead launching cmd.exe (Windows’ console command prompt) and directing it to launch the target Host:

This approach provided two benefits: it enabled developers to implement Hosts using languages like Python, whose scripts are not directly executable in Windows, and it enabled support for Windows XP, where the APIs did not allow Chromium to easily set up the communication channel between the browser and the Native Host.

Unfortunately, the cmd-in-the-middle design meant that anything that prevented cmd.exe from running (387233, 387228) or that prevented it from starting the Host (335558) would cause the flow to fail. While these configurations tend to be uncommon (which is why the problems have existed for ten years), they also tend to be very hard to recognize/diagnose, and the impacted customers often have little recourse short of abandoning the extension platform.

The Fix

So, over a few nights and weekends, I landed a changelist in Chromium to improve this scenario for Chromium 113.0.5656 and later. This change means that Chrome, Edge (version 113.0.1769+), and other Chromium-derived browsers will now directly invoke any Native Host that is a Windows Executable (.exe) rather than going through cmd.exe:

This change will reach the Stable Channel of Chrome and Edge in the last week of April 2023.

Native Hosts that are not implemented by executables (e.g. Python scripts or the like) will continue to use the old codepath.
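The split between the two codepaths can be pictured with a small Python sketch. It assumes the decision hinges only on the `.exe` extension of the Host's path, as described above; `launch_mode` and the example paths are hypothetical, and Chromium's real check lives in its C++ launcher and may differ in detail.

```python
from pathlib import PureWindowsPath

def launch_mode(host_path: str) -> str:
    """Return 'direct' for .exe hosts and 'cmd' for everything else."""
    suffix = PureWindowsPath(host_path).suffix.lower()
    return "direct" if suffix == ".exe" else "cmd"

# Hypothetical host locations, including one with an ampersand in the path.
exe_mode = launch_mode(r"C:\Hosts\R&D\host.exe")
script_mode = launch_mode(r"C:\Hosts\host.bat")
print(exe_mode, script_mode)
```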

I’ve got my fingers crossed that effectively no one even notices this change, with the exception of those unfortunate users who were encountering the old bugs. However, this change also fixes two other bugs that were caused by the cmd-in-the-middle flow and those changes could cause problems if your Windows executable was not aware of the expected behavior for Native Hosts.


When Chromium launches a native host, it sets a start_hidden flag to prevent any UI from popping up from the host. That flag prevents the proxy cmd.exe’s UI window (conhost.exe) from appearing on the screen. This start_hidden flag means that console-based (subsystem:console) Windows applications remain invisible during native-messaging communications. However, the start_hidden flag didn’t flow through to non-console applications (e.g. subsystem:Windows), like my Native Messaging Debugger application, which is built atop C#’s WinForms and meant to be seen by the user.

The new Direct Launch for Executables flow changes this– now Windows .exe files are started hidden, meaning that they’re not visible to the user by default. Surprisingly, this might not be obvious to the application’s code; for example, checking frmMain.Visible in my WinForms startup code still returned true even though the window was not visible.

Fixing this in my Host was simple— I just explicitly call ShowWindow in the form’s Load event handler:

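private const int SW_SHOW = 5;

// Requires: using System.Runtime.InteropServices;
[DllImport("user32.dll")]
private static extern bool ShowWindow(IntPtr hWnd, int nCmdShow);

// In the form's Load event handler:
ShowWindow(this.Handle, SW_SHOW);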

The Terminator

When a Native Host is no longer needed, because the Extension’s sendNativeMessage() got a reply from the Host, or the disconnect() method was called (either explicitly or during garbage collection) on the port returned from connectNative(), Chromium shuts down the Native Host connection. First, it closes the stdin and stdout pipes that it established to communicate with the new process. Then, it checks whether the new process has exited itself (typical), and if not, sets up a timer to call Windows’ TerminateProcess() two seconds later if the Host is still running.
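A Host that wants to exit on its own (the "typical" case) can simply treat a closed stdin as its signal to stop. Here is a minimal Python sketch of that loop. It relies on Native Messaging's documented wire format (a 32-bit length in native byte order followed by UTF-8 JSON); the function names and the echo behavior are my own illustration, and the demo uses in-memory streams rather than real pipes.

```python
import io
import json
import struct

def read_message(stream):
    """Read one length-prefixed JSON message; return None once the pipe is closed."""
    header = stream.read(4)
    if len(header) < 4:
        return None  # EOF: the browser closed our stdin
    (length,) = struct.unpack("=I", header)  # 32-bit length, native byte order
    body = stream.read(length)
    if len(body) < length:
        return None  # pipe closed mid-message
    return json.loads(body.decode("utf-8"))

def write_message(stream, message):
    """Write one length-prefixed JSON message."""
    body = json.dumps(message).encode("utf-8")
    stream.write(struct.pack("=I", len(body)) + body)
    stream.flush()

def run_host(stdin, stdout):
    """Echo messages until stdin closes, then return so the process can exit."""
    handled = 0
    while (message := read_message(stdin)) is not None:
        write_message(stdout, {"echo": message})
        handled += 1
    return handled

# Demo with in-memory streams standing in for the browser's pipes.
payload = json.dumps({"ping": 1}).encode("utf-8")
fake_stdin = io.BytesIO(struct.pack("=I", len(payload)) + payload)
fake_stdout = io.BytesIO()
handled = run_host(fake_stdin, fake_stdout)
```

A real Host would pass `sys.stdin.buffer` and `sys.stdout.buffer` and let the process end when `run_host` returns, well inside the two-second window.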

In the cmd.exe flow, this process termination was effectively a no-op, and a Host that did not self-terminate was always left running. You can see this with my Native Messaging Debugger app — the pipes close, but the UI remains alive.

In the new direct launch flow, the Host is reliably terminated two seconds after the pipes disconnect, if-and-only if Chromium is still running. (If the Host was disconnected because Chromium itself is exiting entirely, the Host’s pipes are detached but the process is not terminated… likely an existing bug where Chromium’s two-second callback is aborted during shutdown.)

While this is the intended design (preventing process leaks), unfortunately I’m not aware of an easy way for a Host that doesn’t want to exit to keep itself alive. Unlike many shutdown-type events, Windows does not allow a process to “decline” termination… it’s just there one moment and gone the next¹. A Windows process can use a DACL trick to deny Process Terminate rights to non-elevated applications that get a handle to the process, but unfortunately this isn’t sufficient here, because Chromium gets a handle without this restriction as it launches the Host, before the Host process has a chance to protect itself. If your App truly needs to outlive the browser itself, you could either launch it via a .bat file, or you could have your Native Host itself be a stub that acts as an IPC proxy to the rest of your App.

If you encounter a scenario that visibly changes with this new flow in Chromium browsers v113.0.5656 and later, please let me know ASAP!

Update: Mar 22, 2023: The developer of a popular extension found another behavior change in the new codepath that caused their extension to unexpectedly stop working. It’s a subtle issue, and hopefully theirs is the only one that will hit it.

What happened?

With the new launch flow, if your Host has an outstanding Read() on the Standard Input (stdin) handle and you attempt to close that handle:

// Don't do this!
CloseHandle(GetStdHandle(STD_INPUT_HANDLE));

…that function will now block unless/until the Read() operation completes. If you were issuing this CloseHandle() call on the UI thread, your Host will hang until Chromium gets around to terminating your Host process, which could cause problems for your Host if it expected to perform any other cleanup after disconnecting.

The best fix for this issue is to simply not call CloseHandle(), because you don’t need to! All three STDIO handles will be correctly closed when your process exits in a few seconds anyway, so there’s no need to manually close the handle yourself.

If you really want to manually close the handle, you can first call the function CancelIoEx(GetStdHandle(STD_INPUT_HANDLE), NULL); before calling CloseHandle(), but to reiterate, there’s really no good reason to bother closing the handle yourself.


1 It took me some time to actually figure out what was happening here. My Native Messaging Debugger app started disappearing after the pipes closed, and I didn’t know why. I assumed that an unhandled exception must be silently crashing my app. I finally figured out what was happening using the awesome Silent Process Exit debugger option inside gflags:

Attack Techniques: Open Redirectors, CAPTCHAs, Site Proxies, and IPFS, oh my

The average phishing site doesn’t live very long– think hours rather than days or weeks. Attackers use a variety of techniques to try to keep ahead of the Defenders who work tirelessly to break their attack chains and protect the public.

Defenders have several opportunities to interfere with attackers:

  • Email scanners can detect Lure emails and either block them entirely, or warn the user (e.g. Microsoft SafeLinks) if they click on a link in an email that leads to a malicious site. These email scanners might check embedded URLs by directly checking URL Reputation Services, or they might use Detonators, automated bots which try to navigate a virtual machine to the URLs contained within a Lure email to determine whether the user will end up on a malicious site.
  • Browsers themselves use URL Reputation Services (Microsoft SmartScreen, Google SafeBrowsing) to block navigations to URLs that have been reported as maliciously Requesting the victim’s credentials and/or Recording those stolen credentials.
  • Browser extensions (e.g. NetCraft, Suspicious Site Reporter) can warn the user if the site they’re visiting is suspicious in some way (newly registered, bad reputation, hosted in a “dodgy neighborhood”, etc).
  • Defenders can work with Certificate Authorities to revoke the HTTPS certificates of malicious sites (alas, this no longer works very well)
  • Defenders and Authorities work with web infrastructure providers (hosting companies, CDNs, domain registration authorities, etc) to take down malicious sites.

Each of these represents a weak link for attackers, and they can improve their odds by avoiding them as much as possible. For example, phishers can try to avoid URL Reputation services’ blocking entirely by sending Lures that trick users into completing their victimization over the phone. Or, they can try to limit their exposure to URL Reputation services by using the Lure to serve the credential Request from the victim’s own computer, so that only the URL that Records the stolen credentials is a candidate for blocking.

To make their Lure emails’ URLs less suspicious to mail scanners, some phishers will not include a URL that points directly at the credential Request page, instead pointing at a Redirect URL. In some cases, that redirector is provided by a legitimate service, like Google or LinkedIn:

That first Redirect URL might itself link to another Redirect service; in some cases, a Cloaking Redirector might be used which tries to determine whether the visitor is a real person (potential victim) or a security scanning bot (Defender). If the Cloaking Redirector believes they’ve got a real bite, they’ll send them to the Credential Request page, but if not, they’ll instead send the bot to some innocuous other page (Google and Microsoft homepages are common choices).

Redirectors can also complicate the phish-reporting process: a user reporting a phishing site might not report the original URL, so when the credential Request page starts getting blocked, the attacker can just update the Redirect URL used in their lure to point to a new Request page.

Before showing the user the credential Request, an attacker might ask the user to complete a CAPTCHA. Now, you might naturally wonder “Why would an attacker ever put a hurdle in the way of the victim on their merry way to give up their secrets?” And the answer is simple: While CAPTCHAs make things slightly harder for human victims, they make things significantly harder for the Defender’s Detonators — if an automated security scanner can’t get to the final URL, it cannot evaluate its phishyness.

After the user has been successfully lured to a credential collection page, the attacker bears some risk: the would-be victim might report the phish to URL reputation services. To mitigate that risk, the attacker might rely on cloaking techniques, so that graders cannot “see” the phishing attack when they check the false negative report.

Similarly, the would-be victim might themselves report the URL directly to the phisher’s web host, who often has no idea that they’re facilitating a criminal enterprise.

To avoid getting their sites taken offline by hosting providers, attackers may split their attack across multiple servers, with the credential Request happening at one URL, and the user’s stolen data sent to be Recorded on another domain entirely. That way, if only the Request URL is taken down, the attacker can still collect their plunder from the other domain.

An attack I saw today utilized several of these techniques all at once. The attacker sent a lure with a URL pointing to a Google-owned domain. That URL was itself just acting as a proxy for a Cloudflare IPFS gateway. IPFS is a new-ish technology that’s not supported by most browsers yet, but it has a huge benefit to attackers in that Authorities have no good way to “take down” content served via IPFS, although there’s a bad bits list.

To enable the attack page to be reachable by normal users’ browsers (which don’t natively support IPFS), the attackers supply a URL to a Cloudflare IPFS gateway, a special webservice that allows browsers to retrieve IPFS content using plain-old HTTPS. In this case, neither Google nor Cloudflare recognizes that they’re facilitating the attack, as neither of them is really acting as a “Web server” in any traditional sense.

Even if Google Translate and Cloudflare eventually do block the malicious URLs, the attacker can easily pick a different proxy service and a different IPFS gateway, without even having to republish their attack elsewhere on IPFS. The design of IPFS makes it harder to ever discover who’s behind the malicious page.

Now, storing data back to IPFS is a somewhat harder challenge for attackers, so this phishing site uses a different server for that purpose. The “KikiCard” URL used by the attackers receives POST requests with victims’ credentials, stores those credentials into a database for the attacker, and then redirects the user to some generic error page. In most cases, victims will never even see the “KikiCard” URL anywhere, making it much less likely to be reported.

Google SafeBrowsing is now blocking the KikiCard host as malicious, but it’s still online with a valid certificate.

Without more research, I usually couldn’t tell you whether this domain has always been owned by attackers, or whether an attacker simply hacked into an innocent web server and started using it for nefarious purposes. In this case, however, a quick search shows that it was found as a Recorder of stolen credentials going back to July 2022, not long after it got its first HTTPS certificate.


Slow Seaside Half

After my first real-world half marathon in January, I ended up signing up for the 2024 race, but I also quickly decided that I didn’t want to wait a full year to give it another shot. A day or so later, I signed up for the Galveston Island Half Marathon at the end of February, with the hope that a similarly flat course would give me a shot at beating my Austin finishing time.

Alas, it wasn’t to be, although I’m still glad I ran it.

The weather forecast bounced around a bit in the final weeks leading up to the race, with rain predicted for a while, but race morning ultimately proved to be free of precipitation but extremely humid.

I woke up for half an hour at 3:15am, which wasn’t ideal, but I didn’t feel very tired. This time, I had a productive trip to the bathroom before leaving the house, and managed to squeeze in a final coffee disposal in the porta-potties just before the start.

In pre-race prep, I’d added more “peppy” music to my playlist, and configured my watch for easier visibility, although infuriatingly, I couldn’t coax it to tell me the time of day or total elapsed time: for my next run, I’m going to wear two watches.

The course started on Stewart Beach…

…heading north before looping back and passing by the starting area around 9.5 miles later:

Unfortunately, this run was hard. I never found my rhythm and ended up in my Peak heart rate zone almost immediately; after mile three, I was regularly dropping down to walks.

I ended up not needing my sunglasses (or sunscreen), and it was kinda nice to run alongside the foggy beach and surf. That said, I needed water or Gatorade at almost every aid stop and I think I pumped out more sweat than on any other run.

My pace for the first six miles was considerably slower than my expected pace (8:34), and only fell from there:

The middle miles of the race were hard. While nothing hurt for more than a second or two (a budding blister made its presence known, but it wasn’t either a surprise or bothersome), nothing felt very good either. I again found myself lost in unhappy thoughts and worries (mostly loneliness) and never managed to “zone out” and just run like I do on the treadmill.

When the finish line was finally in sight, I started sprinting; my knees instantly warned me that this wasn’t going to last, but otherwise it felt great to finally be moving.

I crossed the line fourteen minutes slower than my Austin Half, happy to be done:

After a shower back at the AirBnB, friends and I went to the Galveston Island Brewing taproom and sampled their beers. After a few hours, I walked over to the beach to enjoy the sun and warm weather (the fog had dissipated).

“Math Is Hard” Double IPA. (Or was it a quad, since I had two? :)

By the end of the day, I’d walked almost 6 additional miles, crossing over 35000 steps for the day.

The long-sleeve race shirt was pretty nice, and the logo was the same one used for the finisher’s medal.

Unfortunately, landscapers with a mower destroyed the back window of my car while it was parked at the AirBnB, but I managed to get it back to Austin without the shattered glass completely falling out.

I’m looking forward to some recovery treadmill runs for the next two months before the Capital 10K in April. I had a relaxed 8-mile run this morning and it felt great.


Q: “Remember this Device, Doesn’t?!?”

Q: Many websites offer a checkbox to “Remember this device” or “Remember me” but it often doesn’t seem to work. For example, this option on AT&T’s website shown when prompting for a 2FA code:

…doesn’t seem to work. What’s up with that?

A: Unfortunately, there’s no easy answer here. There is no browser standard for how to implement a feature like this, so different websites implement it differently.

Virtually all of these systems are dependent upon storing some sort of long-lived token within one of the browser’s storage areas (cookies, DOM storage, IndexedDB, etc). Anything which interferes with your browser’s storage areas can interfere with the long-lived token:

  • Depending upon how the site is coded, privacy features like Edge’s Tracking Prevention might interfere with storage of the token to begin with.
  • There are many different features and operations that can cause one or more storage items to subsequently become inaccessible. For example, privacy controls, 3rd party utilities, user-actions, use of multiple browser channels, and so on. (Please see the blog post for a more comprehensive list).

Even if the token is successfully stored by the website and is available on later site loads, the server might choose to ignore it.

  • Some sites will ignore a cached token if the visitor appears to be coming from a significantly different geographic location, e.g. because you’ve either moved your laptop or enabled a VPN.
  • Some sites will ignore a cached token if some element of the user’s environment changes: for instance, if the browser’s configured languages are different than when the token was stored.
  • We encountered one site whose auth flow broke if the browser’s User-Agent string changed– this site broke when we tried to fix a compatibility issue by automatically overriding the User-Agent value.
  • Some sites will expire a cached token after a certain (often undocumented) timeframe.
  • Some sites will expire a cached token if some other security setting in the account is changed, or if there are signs that the account’s login is under brute-force attack.
  • Some sites simply change how they work over time. For example, Fidelity recently sent an email to customers with 2FA announcing that they’ll no longer respect a “remember this device” option:
  • Some sites will expire a cached token if some other risk heuristic triggers (e.g. a user begins logging in at an unusual time of day, etc).
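To make the server side of this concrete, here is a rough Python sketch of the kind of policy a site might apply before honoring a stored token. Everything in it is hypothetical (the `DeviceToken` fields, the `should_honor` function, the specific checks, and the example values); real sites use their own undocumented rules.

```python
from dataclasses import dataclass

@dataclass
class DeviceToken:
    issued_at: float  # seconds since epoch when the token was stored
    user_agent: str   # User-Agent seen when the token was issued
    country: str      # coarse location seen when the token was issued

def should_honor(token, *, now, user_agent, country, max_age_seconds):
    """Return (honored, reason) under a hypothetical server-side policy."""
    if now - token.issued_at > max_age_seconds:
        return False, "expired"
    if user_agent != token.user_agent:
        return False, "user-agent changed"
    if country != token.country:
        return False, "location changed"
    return True, "ok"

token = DeviceToken(issued_at=0.0, user_agent="UA/1", country="US")
same_env = should_honor(token, now=60.0, user_agent="UA/1", country="US", max_age_seconds=3600.0)
new_agent = should_honor(token, now=60.0, user_agent="UA/2", country="US", max_age_seconds=3600.0)
print(same_env, new_agent)
```

From the user's side, every rejected case looks the same: the token is still sitting in the browser's storage, yet the 2FA prompt appears again.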


Debugging problems like this is often non-trivial, but you might try things like:

  • Watch the F12 Developer Tools’ console to look for any notes about storage being blocked by a browser privacy feature, or a JavaScript exception.
  • See if the “Remember me” behavior works once from the same browser instance.
  • See if the “Remember me” behavior works after restarting the browser.
  • See if the “Remember me” behavior works properly in a different browser or channel.
  • Poke through the F12 Developer Tools’ Application tab to see what sorts of Storage the site’s login flow is writing.

Attack Techniques: Blended Attacks via Phone

Last month, we looked at a technique where a phisher serves his attack from the user’s own computer so that anti-phishing code like SmartScreen and SafeBrowsing does not have a meaningful URL to block.

Another approach for conducting an attack like this is to send a lure which demands that the victim complete the attack out-of-band using a telephone. Because the data theft is not conducted over the web, URL reputation systems don’t have anything to block.

Here’s an example of such a scam, which falsely claims that the user was charged $400 for one of the free programs already on their PC:

The attacker hopes that the user, upon seeing this charge, will call the phone number within the email and get tricked into supplying sensitive information. This particular scam’s phone number is routed to a call center purporting to be “Microsoft Support.”

Evidence suggests that some email services have gotten wise to this scam: because the phone number needs only be read by a human, attackers may try to evade detection and blocking by encoding their phone numbers using non-digit characters or irregular formatting, as in this lure:

…or by embedding the phone number inside an image, like this lure:

Unfortunately, relatively few phones offer any mechanism for warning the user when they’re calling a known-scam number — Google’s “Scam Likely” warnings only seem to show on the Pixel for inbound calls. As with traditional phishing attacks, bad actors can usually switch their infrastructure easily after they are blocked.
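On the defender's side, a mail scanner could try to undo the character-substitution trick described above before comparing a number against a blocklist. The following Python sketch shows the idea. The look-alike table, the function name, and the sample strings are all made up for illustration, and the function is meant to be run on a candidate phone-number span, not on a whole message (it would turn ordinary letters into digits).

```python
import re
import unicodedata

# Hypothetical look-alike table; a real filter would need a far larger one.
LOOKALIKES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})

def normalize_phone(text: str) -> str:
    """Fold look-alike characters to digits and drop everything else."""
    folded = unicodedata.normalize("NFKC", text)  # e.g. full-width digits -> ASCII
    folded = folded.translate(LOOKALIKES)
    return re.sub(r"[^0-9]", "", folded)

# Made-up obfuscated number using letters in place of digits.
digits = normalize_phone("+1 (8O8) 555-O1l2")
print(digits)
```

This does nothing for numbers embedded in images, which would need OCR first.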

Stay safe out there!


PS: Sometimes this attack technique is lumped in with vishing, but I tend to think of vishing as an attack in which the initial lure arrives via a phone call or voicemail.

A New Era: PM -> SWE

tl;dr: As of last week, I am now a Software Engineer at Microsoft.

My path to becoming a Program Manager at Microsoft was both unforeseen (by me) and entirely conventional. Until my early teens, my plan was to be this guy:

I went to Space Camp and Space Academy, and spent years devouring endless books about NASA history, space flight, and jet planes. I spent hours “playing” on a realistic (not graphically, but in terms of slow pacing and technical accuracy) Space Shuttle simulator, until I could land the shuttle on instruments alone.

Over time, however, three factors conspired to change my course.

  • First was my realization that my few peers interested in space were all interested in space — stars and planets and the science, while I really only cared about the technology of getting there and surviving.
  • Second was the discovery of a Catch-22: While astronaut pilots didn’t have to have perfect vision, they were required to have thousands of hours of experience flying jets, which practically required being a military jet pilot, which did require perfect uncorrected vision. My distance vision was ~20/40.
  • Finally, I’d started getting more and more interested in playing around with computers. I began writing “choose-your-own adventure” games in GW-BASIC starting around age 8 or so, and continued coding in school on Apple II (AppleBasic) and PCs (Logo, Pascal).

Shortly after my 15th birthday, I spent a full summer job’s earnings (~$3000 at $4.75/hr) on my first personal PC (Comtrade Pentium 90 PC with a whopping 8 megs of RAM, 730mb HDD, 4X CDROM, 15.7″ monitor, bought over the telephone from an ad in Computer Shopper magazine) and I started writing apps in Turbo Pascal, VB3 (bought for $50 on 5.25″ floppies at the annual “Computer show” at the Frederick Fairgrounds), and eventually Delphi 1 ($100 at Babbages in the mall). By my late teens, I was spending ten or more (sometimes much more) hours a week writing code, and after my senior year, I got my first programming job building custom Windows apps in Delphi for a small development shop at almost 4x minimum wage.

After high school, I majored in Computer Science at the University of Maryland, and while I largely didn’t like it (too much theory, too little practice), I had already seen that software development was a pretty solid career choice. In my sophomore year, on a whim (with the promise of free pizza) I went to a Microsoft recruiting talk on campus delivered by Philip Su, a recent University of Maryland graduate who had joined Microsoft as a developer. Philip was a school legend, having written UMD’s web-based course planning system (a CGI written in C++ talking to the mainframe and spitting out HTML) that allowed you to specify constraints like “I need this many credits, these specific classes, and otherwise do not want to attend class before 11am on any day.” After Philip’s awesome talk, I went from being mildly interested in Microsoft to very excited at the prospect of getting an internship. I dropped off my resume, chatted briefly with Philip, and crossed my fingers.

I got a callback for a short interview at the campus career center a short time later. I didn’t really know what to expect, but figured my best bet was to show off the code I’d built so far. I put together a small binder of screenshots and explanations of tools I’d built in Delphi, including SlickRun, DigitalMC, and Logbook, a journaling program. Each of these was a “scratch my own itch” type of app where my goal was to use technology to solve a problem. In each app, I tried to build cool features, not implement fancy algorithms from scratch. DigitalMC used several different libraries (text-to-speech, MP3 playback) and Logbook used an existing database engine.

My campus interviewer was a Microsoft developer in his early thirties (in hindsight, he may well have been younger) who looked a bit weary after a morning full of 15-minute interviews. After quick introductions, he asked which of the engineering roles I’d be most interested in applying for.

I told him that I thought I’d be a fine fit for any of the roles, although I was most interested in the SDE (Software Development Engineer) and PM (Program Manager) roles, and was interested in what he thought. I handed over the binder and walked him through the projects I’d built— as I explained SlickRun, his eyes lit up and he was clearly excited about it. “Have you ever shown this to Microsoft?” he asked excitedly. “I guess I just did?” I replied, wondering what exactly he meant— it wasn’t as if Microsoft toured the country looking for interesting bits of code. I asked him for advice on whether I should go for the PM or SDE role and he noted that Microsoft was looking for SDE interns with experience building 5000 line C and C++ programs. At that point, I’d built several large applications, but all were in Delphi’s Object Pascal. The only C and C++ I’d written was for class projects, and none of those had yet cracked a thousand lines. This made the decision easy— I’d submit my resume as a PM-candidate, a decision with far-ranging and long-lasting consequences. Not long after, I flew to Redmond for a day of on-site interviews with two teams in Office and got offers from both.

During my first Office summer internship in 1999, I ramped up on a new technology (devouring the first books on XML), wrote up competitive reports on the first web-based collaboration software, and played with the nascent API for our team’s “Office Web Server (OWS)” product (eventually renamed SharePoint Team Services). I attended a bunch of training classes, read a bunch of product specs, read a pile of usability books, and generally immersed myself in learning what it meant to be a Program Manager at Microsoft. At the time, the role was hand-wavingly defined as “The person who does everything but code and test.” Qualifications were similarly open, with recruiters told to look for candidates with “A passion for using technology to solve problems.”

I returned to the same team the following summer– by this point, the product was in much more defined form, and I was paired with an Intern Developer and Intern Tester (a “feature trio”) to build a feature. Over the course of the summer, I learned that the primary tasks for most PMs were writing feature design specifications, shepherding them through implementation, triaging bugs found in the implementation, and getting ready for release.

SharePoint was a product based on the idea of Lists (lists of documents, lists of links, lists of contacts, etc) and my intern trio was tasked with adding a feature whereby a SharePoint user could create a list based on pre-built templates with appropriate fields (e.g. the Contact list would have fields for email address, phone number, office address, etc, etc). I wrote the spec for how the feature should look, and for the packaging format that would define each template. I also wrote (in Delphi) a generator/packager app to allow a content team (initially me) to build template files in the correct format. Our dev intern (Brandon?) wrote the C++ code that would run inside SharePoint to ingest the package and call the appropriate APIs to create the new list. Our tester (Matt?) made sure it all worked. We finished our feature before the 12-week internship was up, and I considered it an unqualified success.

Offered a full-time job after the internship, I went back to Redmond for a perfunctory day of interviews with the team and was greatly annoyed to learn that our internship’s Template feature was unceremoniously cut from the release. That outcome, as well as the lack of challenging interview questions from the team, led to me surprising everyone (including myself) by deciding to switch teams. I chose to join the Office Update team, then responsible for all of the Office web sites.

During my senior year back at UMD, I had a work/study internship as a web developer at The Motley Fool, and wrote a primitive OS in C++ for CS412. After finally crossing that “5000 lines of C++” threshold that Microsoft was looking for, I still didn’t seriously consider moving over to SDE. I was already “in” as a PM, and from my internship, it felt like there was a greater opportunity for impact as a PM vs. SDE — most of the SDE interns only owned a tiny piece of a product even if it took a ton of work (ensuring accessibility, globalization, localization, performance, security, etc, etc) to deliver that tiny piece. As a PM, I’d be able to direct the work of several developers and focus on maximizing the value of their work for our users. To be honest, being a 21 year-old PM felt a bit like using a “cheat code”– when I’d interviewed at IBM they were super-confused at my resume because at Big Blue, a PM was a grizzled developer who’d “moved up” after a decade of coding. But at Microsoft, I’d get to start there.

The Office Update team had reorganized, so in June 2001, I started on the Office Assistance and Worldwide Services team, as the PM owner of the clipart website and as the team’s Security PM. I spent the next three years on Office writing feature specs, triaging bugs, and generally doing “everything but writing code.”

Except… well, I wrote a lot of code. I wrote “Rip Art Gallery,” a tool for abusing the Office website’s API to download clipart without requiring an Office app, and wrote a proof-of-concept ActiveX control for a new feature. I wrote the Clip of the Day tool, to allow the Content team to generate the XML manifests of which clip to feature in which locales, on each day for the upcoming months. I wrote webserver log analysis tools. I wrote TamperIE, a tool designed to exploit websites that failed to validate request data, and accidentally leaked it to the world.

Outside of work, I wrote a popular popup blocker (and a less popular one), continued to update SlickRun, maintained DigitalMC and Logbook, created MezerTools, wrote some simple IE Extensions, wrote some simple Delphi libraries (including two for CD-R burning), started building the Fiddler Web Debugger and Meddler, and otherwise acted like a developer. Nearly all of my code was written in Delphi, C#, or JavaScript, with my only C++ development being tiny tweaks to the Internet JunkBuster Proxy to convert it into a bare-bones HTTP traffic logger.

Every few months, my manager would ask “Are you sure you’re not a developer?” and I would demur and explain that I simply loved being a PM. Privately, I also worried that I might lose interest in my many side projects if I started writing code for work.

By the fall of 2004, I decided to move on from Office and join the Internet Explorer team. The newly reconstituted browser team was rapidly growing, and they were hungrier for SDEs than PMs, so the devs on my interview loop were eager to get me to jump disciplines. Unwilling to change both teams and roles at the same time, I remained a PM. Internet Explorer offered more opportunity to become a technical PM though, and I rapidly leaned into it, owning both the new consolidated URL (CURL) class as well as much of the networking and network security areas.

I also immediately embarked upon my barely secret mission — to figure out what bugs in Internet Explorer were responsible for the problem where the Office Clip-of-the-Day wasn’t reliably changing every day. (My futile queries to the skeleton IE team were how I encountered the “Want to change the world? Join the new IE team today” recruiting pitch). With my newly granted source code access permissions, I printed out the code for the WinINET network stack and read it at night with a red pen in hand. While I was not a C++ developer, I was reasonably competent as a C++ reader, and I flagged nearly a hundred bugs, including six different issues that would’ve caused the Clip-of-the-Day to fail to change.

When I’d first joined the IE team, my manager suggested that I find someone else to take over development of Fiddler, because I’d “be too busy.” “We’ll see” I replied, thinking “Your entire test team are all going to be running Fiddler pretty soon.” I continued to spend tens of hours a week writing Fiddler code, late into the night and on weekends, and its audience grew and grew. In 2007, it won the Engineering Excellence award and I got a handshake from Bill Gates and $5000 to spend on a morale event. While Fiddler dominated my coding time, I still maintained SlickRun and built a few one-off utilities, including an ActiveX control that earned me a $500 steak dinner with friends at Daniel’s Broiler, and an IE extension that won me $3000 in furniture from Pottery Barn and Crate&Barrel. Perhaps my most lucrative win came when a new hire was assigned to “officially productize” a simple web app I’d written to generate IE Search Providers; we started dating and were married three years later.

After several years languishing in the PM2 level band, I finally broke into the Senior PM band on the recognition of my technical contributions. I could go toe-to-toe with the developers in triage conversations, often knowing the code as well as they did, and I built many reduced reproductions for bugs, sometimes explaining exactly what lines of code were at fault.

Toward the end of IE9, I was deeply interested in improving network performance, but I lamented that the dev team couldn’t muster the resources to fix a dozen performance bugs in the network cache code. As I explained the changes needed and how impactful they could be, one of our developers (Ed Praitis) listened thoughtfully and then quietly noted: “It seems like you understand this stuff pretty well. Why don’t you just fix it yourself?”

I chuckled until I saw he was serious. “But I’m a PM!” I protested, “we don’t check-in code. At least, nothing like this.”

“I’ll review it for you if you want,” he offered. And this was just the push I needed. Within a few weeks, I checked in my fixes, and it was the work I was most proud of in over a decade at the company… helping save hundreds of millions of users untold billions of seconds in downloading pages. Around that time, I also offered up a small change to the WinINET code to make it work better with Fiddler, and to my surprise (and amusement) that team accepted it.

After a decade, I’d started to get a bit burned out on the PM role, and fresh off the excitement of landing actual shipping product code, I pondered whether I could take the pay hit of down-leveling to become a junior SDE. Instead, team turnover intervened, and I became a PM Lead, with my four reports owning IE’s Security, Privacy, Reliability, Telemetry, Extensibility, and Process Model features. Despite my rather untraditional PM background, I was, apparently, going to continue my career in a PM Leadership role.

And then, I got an email. A developer tools company was interested in acquiring Fiddler, and I, looking at a full plate with “a real job,” a new wife, and plans for a baby within a few years, decided that the booming Fiddler project deserved a full-time team. I got deep into negotiations to sell Fiddler outright when a phone call from a second interested party threw everything aside. Telerik not only wanted to buy Fiddler, they also wanted me to come work on Fiddler for them, from Austin, Texas. The financial terms were more generous, and the lower cost-of-living in Texas meant that we’d only need one income. After a blissful March visit and negotiations over the summer, I signed the papers and we both gave notice at Microsoft.

At Telerik, my job title in the address book fluctuated around as the company grew and evolved and I never paid it much attention– whether it was “Principal Software Engineer” or “Product Manager” or something else, I considered myself “Fiddler Product Owner” and I did all the jobs, from coding to user research to support to design to testing. Once in a while, I’d consult on Telerik’s other products, but I never wrote any meaningful code for them.

Alas, after two years and a big pre-IPO layoff of nearly everyone else in the building, I was no longer feeling stable at Telerik and I applied for a Developer Advocate role on the Chrome Security team in 2015. Google is amazeballs at many things, but hiring is not one of them. I completed the Developer Advocate interview loop but their hiring committee came back and suggested that I should be a Technical Program Manager. I did a TPM interview loop, but their hiring committee came back and suggested I should be a Developer Advocate. The lead of Chrome Security decided to resolve the deadlock by hiring me as a Senior SWE (Software Engineer), for which she had sole authority. Since I’d be reporting directly to her, she assured me, my actual duties would be unchanged and my address book title would make no difference. With significant trepidation (I always worried about anything “off book”) I agreed.

I had a very strange ramp-up at Google, with paternity leave after my second son was born in week 2, and a subsequent long bout with pneumonia. Within a few months of starting, a reorganization meant that I’d now start reporting to a new manager. “My new boss knows that I’m not really a SWE and I’m really this special unicorn, right?!?” I asked my director, and was assured the answer was “Yes.” I then went to confirm with my new boss: “You know I’m not a SWE, right? I’ve only written like two files of C++ in the last fifteen years. I’m really this special unicorn DevAdvocate.” She responded “Well, um, I don’t actually have any special unicorn jobs on my team. I do have a SWE job, however, and you do have a Senior SWE title, so we should see if it’s a good fit, right?”

As a father of now two and provider for a single-income family, I didn’t see a lot of options. I looked into down-leveling so my skills matched my role, but Google HR indicated that wasn’t an option, both because they didn’t allow down-leveling and because they didn’t allow remote employees below the Senior level. I spent a total of two and a half years barely keeping my head above water, landing 94 changelists in Chromium and learning a ton. I joked without joking that I was the worst developer in Chrome. While there was much to admire about how Google builds products, I lamented the lack of Microsoft-style PMs and always wondered how much more efficient the team would’ve been with a proper complement of Program Managers.

In 2018, when I saw that one of my former reports was now a Group Program Manager at Microsoft, I asked for a job and was delighted to learn that remote work was now possible at the “new Microsoft.” I came back as a Principal Program Manager, and twice ended up acting as an interim lead for a few months as the team turned over. As a PM on the “Web Platform” and as one of the only Edge employees with any experience in Chromium, I got to remain hyper-technical, spending the majority of my time reading specs, guiding designs, explaining engineering systems, reading code, reducing repros, and root-causing problems.

As the team ramped up on Chromium, Microsoft as a whole began a journey to redefine the Program Management role, eventually splitting the role into Product Management (PdM) and Technical Program Management (TPM) to match Google. It was not a graceful process, and many of us felt a great deal of angst at the change. The 2012 book How Google Tests Software had presaged Microsoft’s earlier messy implosion of its Software Test Engineer role, and now it seemed that Microsoft was looking to continue its Googlification and eventually phase out the PM role entirely.

Throughout 2021, I found myself hunting for useful work to do. I spent almost a year as an “enterprise fixer”, landing 168 changelists in Chromium — most of them quite small, and targeted at unblocking enterprises from deploying the new Edge. I again pondered down-leveling to switch disciplines, with perhaps even higher stakes, having ceded half my net worth in a divorce and with the stock market suffering wild gyrations daily.

Finally, in 2022, I took the leap, leaving the Edge team to rejoin old friends and colleagues on the Microsoft Security team responsible for SmartScreen and other security features across products. I spent a few months ramping up into the new technologies, looking at active attacks, and reviewing the code the team had built so far. I kept the “Principal Product Manager” title as a placeholder, with the promise of a reclassification to “Architect” at some point in the future, a spiffy-sounding title that feels like a good fit to encompass the sorts of contributions I like to make.

In conversations with my lead last week, we agreed that “PM” was no longer a good fit for the work I’ll be doing in the coming years, so as of Friday, I’m now a “Principal Software Engineer.” While I don’t think any title has ever been a particularly good fit for the breadth of work I do, I’m excited to try this one on.


Appendix: So, What Did PMs Do, Anyway?

When I first published this post, I felt unsatisfied because I think most folks who weren’t at Microsoft in the late 1990s and early 2000s probably don’t have a clear idea of what the Microsoft PMs of my era actually did. That’s partly because PM was a fairly broad title covering a lot of different activities, and partly because not every PM performed every type of task.

Generally, however, a model PM would do many of the following things:

  • Research and deeply understand customer problems.
  • Analyze and deeply understand current competitive solutions.
  • Brainstorm approaches to fix those problems and validate the proposals. Doing this effectively requires a comprehensive understanding of the capabilities of available technology (both hardware and software).
  • Design great experiences to delight customers. In high-visibility flows, PMs will often have the help of dedicated writers, graphic designers, and usability researchers. However, those resources are often very limited, so a PM should be prepared to put together a shippable design without subject-matter-expert help, and obtain feedback to improve the design before the product ships.
  • Make good tradeoffs and build consensus: whether it’s prioritizing feature investments, triaging bugs, or figuring out what dinner to order for folks staying late at the office.
  • Communicate effectively, both narrowly (1:1 emails, small group meetings, etc) and broadly (blog posts, standards bodies, conference talks). This often involves translating between the varying jargon and interests of different audiences.
  • Reduce Ambiguity. Even when a decision hasn’t yet been made or there’s not enough data, PMs work to ensure that everyone (dev, test, support, leadership, partner teams, etc) is on the same page about both the plan and the known unknowns.
  • Be the Scribe. Any decision that has been made should be recorded (along with supporting data). Outstanding action items should be recorded and driven to closure.

None of these tasks are forbidden to Software Engineers, of course, but SWEs are expected to be world-class experts in writing code, a huge domain and a full-time job all its own.