Google Chrome–One Year In

Four weeks ago, an emailed notice of a free massage credit revealed that I’ve been at Google for a year. Time flies when you’re drinking from a firehose.

When I mentioned my anniversary, friends and colleagues from other companies asked what I’ve learned while working on Chrome over the last year. This rambling post is an attempt to answer that question.

Non-maskable Interrupts

While I started at Google just over a year ago, I haven’t actually worked there for a full year yet. My second son (Nate) was born a few weeks early, arriving ten workdays after my first day of work.

I took full advantage of Google’s very generous twelve weeks of paternity leave, taking a few weeks after we brought Nate home, and the balance as spring turned to summer. In a year, we went from having an enormous infant to an enormous toddler who’s taking his first steps and trying to emulate everything his 3-year-old brother (Noah) does.

Baby at the hospital | First birthday cake

I mention this because it’s had a huge impact on my work over the last year—much more than I’d naively expected.

When Noah was born, I’d been at Telerik for almost a year, and I’d been hacking on Fiddler alone for nearly a decade. I took a short paternity leave, and my coding hours shifted somewhat (I started writing code late at night between bottle feeds), but otherwise my work wasn’t significantly impacted.

As I pondered joining Google Chrome’s security team, I expected pretty much the same—a bit less sleep, a bit of scheduling awkwardness, but I figured things would fall into a good routine in a few months.

Things turned out somewhat differently.

Perhaps sensing that my life had become too easy, fate decided that 2016 was the year I’d get sick. Constantly. (Our theory is that Noah was bringing home germs from pre-school; he got sick a bunch too, but recovered quickly each time.) I was sick more days in 2016 than I was in the prior decade, including a month-long illness in the spring that culminated in a bout of pneumonia and a doctor-mandated seven days away from the office. As I coughed my brains out on the sofa at home, I derived some consolation in thinking about Google’s generous life insurance package. But for the most part, my illnesses were minor—enough to keep me awake at night and coughing all day, but otherwise able to work.

Mathematically, you might expect two kids to be twice as much work as one, but in our experience, it hasn’t worked out that way. Instead, it varies between 80% (when the kids happily play together) and 400% (when they’re colliding like atoms in a runaway nuclear reactor). Thanks to my wife’s heroic efforts, we found a workable daytime routine. The nights, however, have been unexpectedly difficult. Big brother Noah is at an age where he usually sleeps through the night, but he’s sure to wake me up every morning at 6:30am sharp. Fortunately, Nate has been a pretty good sleeper, but even now, at just over a year old, he usually still wakes up and requires attention twice a night or so.

I can’t remember the last time I had eight hours of sleep in a row. And that’s been extremely challenging… because I can’t remember much else either. Learning new things when you don’t remember them the next day is a brutal, frustrating process.

When Noah was a baby, I could simply sleep in after a long night. Even if I didn’t get enough sleep, it wouldn’t really matter—I’d been coding in C# on Fiddler for a decade, and deadlines were few and far between. If all else failed, I’d just avoid working on any especially gnarly code and spend the day handling support requests, updating graphics, or doing other simple and straightforward grunt work from my backlog.

Things are much different on Chrome.

Roles

When I first started talking to the Chrome Security team about coming aboard, it was for a role on the Developer Advocacy team. I’d be driving HTTPS adoption across the web and working with big sites to unblock their migrations in any way I could. I’d already been doing the first half of that for fun (delivering talks at conferences like Codemash and Velocity), and I’d previously spent eight years as a Security Program Manager for the Internet Explorer team. I had tons of relevant experience. Easy peasy.

I interviewed for the Developer Advocate role. The hiring committee kicked back my packet and said I should interview as a Technical Program Manager instead.

I interviewed as a Technical Program Manager. The hiring committee kicked back my packet and said I should interview as a Developer Advocate instead.

The Chrome team resolved the deadlock by hiring me as a Senior Software Engineer (SWE).

I was initially very nervous about this, having not written any significant C++ code in over a decade—except for one in-place replacement of IE9’s caching logic which I’d coded as a PM because I couldn’t find a developer to do the work. But eventually I started believing in my own pep talk: “I mean, how hard could it be, right? I’ve been troubleshooting code in web browsers for almost two decades now. I’m not a complete dummy. I’ll ramp up. It’ll be rough, but it’ll work out. Hell, I started writing Fiddler knowing neither C# nor HTTP, and that turned out pretty well. I’ll buy some books and get caught up. There’s no way that Google would have just hired me as a C++ developer without asking me any C++ coding questions if it wasn’t going to all be okay. Right? Right?!?”

The Firehose

I knew I had a lot to learn, and fast, but it took me a while to realize just how much else I didn’t know.

Google’s primary development platform is Linux, an OS that I would install every few years, play with for a day, then forget about. My new laptop was a Mac, a platform I’d used a bit more, but still one for which I was about a twentieth as proficient as I was on Windows. The Chrome Windows team made a half-hearted attempt to get me to join their merry band, but warned me honestly that some of the tooling wasn’t quite as good as it was on Linux and it’d probably be harder for me to get help. So I tried to avoid Windows for the first few months, ordering a puny Windows machine that took around four times longer to build Chrome than my obscenely powerful Linux box (with its 48 logical cores). After a few months, I gave up on trying to avoid Windows and started using it as my primary platform. I was more productive, but incredibly slow builds remained a problem for a few months. Everyone told me to just order another obscenely powerful box to put next to my Linux one, but it felt wrong to have hardware at my desk that collectively cost more than my first car—especially when, at Microsoft, I bought all my own hardware. I eventually mentioned my cost/productivity dilemma to a manager, who noted I was getting paid a Google engineer’s salary and then politely asked me if I was just really terrible at math. I ordered a beastly Windows machine and now my builds scream. (To the extent that any C++ builds can scream, of course. At Telerik, I was horrified when a full build of Fiddler slowed to a full 5 seconds on my puny Windows machine; my typical Chrome build today still takes about 15 minutes.)

Beyond learning different operating systems, I’d never used Google’s apps before (Docs/Sheets/Slides); luckily, I found these easy to pick up, although I still haven’t fully figured out how Google Drive file organization works. Google Docs, in particular, is so good that I’ve pretty much given up on Microsoft Word (which headed downhill after the 2010 version). Google Keep is a low-powered alternative to OneNote (which is, as far as I can tell, banned because it syncs to Microsoft servers) and I haven’t managed to get it to work well for my needs. Google Plus still hasn’t figured out how to support pasting of images via CTRL+V, a baffling limitation for something meant to compete in the space… hell, even Microsoft Yammer supports that, for God’s sake. The only real downside to the web apps is that tab/window management on modern browsers is still very much an unsolved problem (but more on that in a bit).

But these speedbumps all pale in comparison to Gmail. Oh, Gmail. As a program manager at Microsoft, pretty much your entire life is in your inbox. After twelve years with Outlook and Exchange, switching to Gmail was a train wreck. “What do you mean, there aren’t folders? How do I mark this message as low priority? Where’s the button to format text with strikethrough? What do you mean, I can’t drag an email to my calendar? What the hell does this Archive thing do? Where’s that message I was just looking at? Hell, where did my Gmail tab even go—it got lost in a pile of sixty other tabs across four top-level Chrome windows. WTH??? How does anyone get anything done?”

Communication and Remote Work

While Telerik had an office in Austin, I didn’t interact with other employees very often, and when I did they were usually in other offices. I thought I had a handle on remote work, but I really didn’t. Working with a remote team on a daily basis is just different.

With communication happening over mail, IRC, Hangouts, bugs, document markup comments, GVC (video conferencing), G+, and discussion lists, it was often hard to figure out which mechanisms to use, let alone which recipients to target. Undocumented pitfalls abounded (many discussion groups were essentially abandoned while others were unexpectedly broad; turning on chat history was deemed a “no-no” for document retention reasons).

It often took a bit of research to even understand who the various participants in a conversation were and how they related to the projects at hand.

After years of email culture at Microsoft, I grew accustomed to a particular style of email, and Google’s is just different. Mail threads were long, with frequent additions of new recipients and many terse remarks. Many times, I’d reply privately to someone on a side thread with a clarifying question or a counterpoint to something they’d said. The response was often “Hey, this just went to me. Mind adding on the main thread?”

I’m working remotely, with peers around the world, so real-time communication with my team is essential. Some Chrome subteams use Hangouts, but the Security team largely uses IRC.

XKCD comic on IRC

Now, I’ve been chatting with people online since BBSes were a thing (I’ve got a five-digit ICQ number somewhere), but my knowledge of IRC was limited to the fact that it was a common way of taking over suckers’ machines with buffer overflows in the ’90s. My new teammates tried to explain how to IRC repeatedly: “Oh, it’s easy, you just get this console IRC client. No, no, you don’t run it on your own workstation, that’d be crazy. You wouldn’t have history! You provision a persistent remote VM on a machine in Google’s cloud, then SSH to that, then you run screen, and then you run your IRC client in that. Easy peasy.”

Getting onto IRC remained on my “TODO” list for five months before I finally said “F- it”, installed HexChat on my Windows box, disabled automatic sleep, and called it done. It’s worked fairly well.

Google Developer Tooling

When an engineer first joins Google, they start with a week or two of technical training on the Google infrastructure. I’ve worked in software development for nearly two decades, and I’ve never even dreamed of the development environment Google engineers get to use. I felt like Charlie Bucket on his tour of Willy Wonka’s Chocolate Factory—astonished by the amazing and unbelievable goodies available at any turn. The computing infrastructure was something out of Star Trek, the development tools were slick and amazing, the process was jaw-dropping.

While I was doing a “hello world” coding exercise in Google’s environment, a former colleague from the IE team pinged me on Hangouts chat, probably because he’d seen my tweets about feeling like an imposter as a SWE.  He sent me a link to click, which I did. Code from Google’s core advertising engine appeared in my browser. Google’s engineers have access to nearly all of the code across the whole company. This alone was astonishing—in contrast, I’d initially joined the IE team so I could get access to the networking code to figure out why the Office Online team’s website wasn’t working. “Neat, I can see everything!” I typed back. “Push the Analyze button” he instructed. I did, and some sort of automated analyzer emitted a report identifying a few dozen performance bugs in the code. “Wow, that’s amazing!” I gushed. “Now, push the Fix button” he instructed. “Uh, this isn’t some sort of security red team exercise, right?” I asked. He assured me that it wasn’t. I pushed the button. The code changed to fix some unnecessary object copies. “Amazing!” I effused. “Click Submit” he instructed. I did, and watched as the system compiled the code in the cloud, determined which tests to run, and ran them. Later that afternoon, an owner of the code in the affected folder typed LGTM (Googlers approve changes by typing the acronym for Looks Good To Me) on the change list I had submitted, and my change was live in production later that day. I was, in a word, gobsmacked. That night, I searched the entire codebase for misuse of an IE cache control token and proposed fixes for the instances I found. I also narcissistically searched for my own name and found a bunch of references to blog posts I’d written about assorted web development topics.

Unfortunately for Chrome engineers, the introduction to Google’s infrastructure is followed by a major letdown—because Chromium is open-source, the Chrome team itself doesn’t get to take advantage of most of Google’s internal goodies. Development of Chrome instead resembles C++ development at most major companies, albeit with an automatically deployed toolchain and enhancements like a web-based code review tool and some super-useful scripts. The most amazing of these is called bisect-builds, and it allows a developer to very quickly discover which build of Chrome introduced a particular bug. You just give it a “known good” build number and a “known bad” build number, and it automatically downloads and runs the minimal number of builds to perform a binary search for the build that introduced the bug:

Console showing bisect builds running
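For the curious, the heart of the tool is an ordinary binary search over build numbers. Here is a minimal sketch of that idea in Python, with a hypothetical is_good() callback standing in for “download this build, run it, and check whether the bug reproduces”:

```python
def find_first_bad_build(good, bad, is_good):
    """Binary search for the first build in which a bug appears.

    good:    a build number known to work
    bad:     a later build number known to be broken
    is_good: hypothetical callback that fetches and runs a build,
             returning True if the bug does NOT reproduce
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_good(mid):
            good = mid   # bug not present yet; the culprit is later
        else:
            bad = mid    # bug already present; the culprit is here or earlier
    return bad           # the first known-bad build


# Example: roughly 17 probes pinpoint one culprit among 100,000 candidates.
print(find_first_bad_build(300000, 400000, lambda n: n < 351234))
```

The real script layers downloading, caching, and prompting on top of this loop, which is why it needs only a handful of runs to pinpoint a culprit among tens of thousands of builds.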

Firefox has a similar system, but I’d’ve killed for something like this back when I was reproducing and reducing bugs in IE. While it’s easy to understand how the system functions, it works so well that it feels like magic. Other useful scripts include the presubmit checks that run on each change list before you submit it for code review—they find and flag various style violations and other problems.

Compilation itself typically uses a local compiler; on Windows, we use the MSVC command line compiler from Visual Studio 2015 Update 3, although work is underway to switch over to Clang. Compilation and linking all of Chrome takes quite some time, although on my new beastly dev boxes it’s not too bad. Googlers do have one special perk—we can use Goma (a distributed compiler system that runs on Google’s amazing internal cloud) but I haven’t taken advantage of that so far.

For bug tracking, Chrome recently moved to Monorail, a straightforward web-based bug tracking system. It works fairly well, although it is somewhat more cumbersome than it needs to be and would be much improved with a few tweaks. Monorail is open-source, but I haven’t committed to it myself yet.

For code review, Chrome presently uses Rietveld, a web-based system, but this is slated to change in the near(ish) future. Like Monorail, it’s pretty straightforward although it would benefit from some minor usability tweaks; I committed one trivial change myself, but the pending migration to a different system means that it isn’t likely to see further improvements.

As an open-source project, Chromium has quite a bit of public documentation for developers, including Design Documents. Unfortunately, Chrome moves so fast that many of the design documents are out-of-date, and it’s not always obvious what’s current and what was replaced long ago. The team does value engineers’ investment in the documents, however, and various efforts are underway to update the documents and reduce Chrome’s overall architectural complexity. I expect these will be ongoing battles forever, just like in any significant active project.

What I’ve Done

“That’s all well and good,” my reader asks, “but what have you done in the last year?”

I Wrote Some Code

My first check-in to Chrome landed in February; it was a simple adjustment to limit Public-Key-Pins to 60 days. Assorted other check-ins trickled in through the spring before I went on paternity leave. The most fun fix I made cleaned up a tiny UX glitch that had sat unnoticed in Chrome for almost a decade; it was mostly interesting because it was a minor thing that I’d tripped over for years, including back in IE. (The root cause was arguably that MSDN documentation about DWM lied; I fixed the bug in Chrome, sent the fix to IE, and asked MSDN to fix their docs.)
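For context on that first check-in: Public-Key-Pins headers carry a max-age directive, and the change amounted to capping how long Chrome will persist a pin. The sketch below is purely illustrative (the real logic is C++ inside Chrome’s transport-security state code):

```python
SECONDS_PER_DAY = 86400
MAX_PIN_AGE_SECONDS = 60 * SECONDS_PER_DAY  # the 60-day cap

def effective_pin_expiry(now, requested_max_age_seconds):
    """Honor the site's Public-Key-Pins max-age, but never persist
    a pin for longer than the cap (illustrative only)."""
    return now + min(requested_max_age_seconds, MAX_PIN_AGE_SECONDS)
```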

I fixed a number of minor security bugs, and lately I’ve been working on UX issues related to Chrome’s HTTPS user-experience. Back in 2005, I wrote a blog post complaining about websites using HTTPS incorrectly, and now, just over a decade later, Chrome and Firefox are launching UI changes to warn users when a site is collecting sensitive information on pages which are Not Secure; I’m delighted to have a small part in those changes.

Having written a handful of Internet Explorer Extensions in the past, I was excited to discover the joy of writing Chrome extensions. Chrome extensions are fun, simple, and powerful, and there’s none of the complexity and crashes of COM.

My 3 Chrome Extensions

My first and most significant extension is the moarTLS Analyzer; it’s related to my HTTPS work at Google, and it’s proven very useful in discovering sites that could improve their security. I blogged about it and the process of developing it last year.

Because I run several different Chrome instances on my PC (and they update daily or weekly), I found myself constantly needing to look up the Chrome version number for bug reports and the like. I wrote a tiny extension that shows the version number in a button on the toolbar (so it’s captured in screenshots too!):

Show Chrome Version screenshot

More than once, I spent an hour or so trying to reproduce and reduce a bug that had been filed against Chrome. When I figured out the cause, I’d jubilantly add my notes to the issue in the Monorail bug tracker, click “Save changes,” and discover that someone more familiar with the space had beaten me to the punch and figured it out while I’d had the bug open on my screen. Adding an “Issue has been updated” alert to the bug tracker itself seemed like the right way to go, but it would require some changes that I wasn’t able to commit on my own. So instead, I built an extension that provides such alerts within the page until the feature can be added to the tracker itself.

Each of these extensions was a joy to write.

I Filed Some Bugs

I’m a diligent self-hoster, and I run Chrome Canary builds on all of my devices. I submit crash reports and file bugs with as much information as I can. My proudest moment was in helping narrow down a bizarre and intermittent problem users had with Chrome on Windows 10, where Chrome tabs would crash on every startup until you rebooted the OS. My blog post explains the full story, and encourages others to file bugs as they encounter them.

I Triaged More Bugs

I’ve been developing software for Windows for just over two decades, and inevitably I’ve learned quite a bit about it, including the undocumented bits. That’s given me a leg up in understanding bugs in the Windows code. Some of the most fun involve issues in Drag and Drop, like this gem of a bug that means you can’t drop files from Chrome into most applications in Windows. More meaningful bugs relate to problems with Windows’ Mark-of-the-Web security feature (about which I’ve blogged several times).

I Took Sheriff Rotations

Google teams have the notion of sheriffs—a rotating assignment that ensures that important tasks (like triaging incoming security bugs) always have a defined owner, without overwhelming any single person. Each sheriff serves a term of about a week, during which they take on additional duties beyond their day-to-day coding, designing, testing, etc.

The Sheriff system has some real benefits—perhaps the most important of which is creating a broad swath of people experienced and qualified in making triage decisions around security vulnerabilities. The alternative is to leave such tasks to a single owner, shrinking the project’s bus factor and thus increasing its risk. (I know this from first-hand experience. After IE8 shipped, I was on my way out the door to join another team. Then IE’s Security PM left, leaving a gaping hole that I felt obliged to stay around to fill. It worked out okay for me and the team, but it was tense all around.)

I’m on two sheriff rotations: Enamel (my subteam) and the broader Chrome Security Sheriff.

The Enamel rotation’s tasks are akin to what I used to do as a Program Manager at Microsoft—triage incoming bugs, respond to questions in the Help Forums, and generally act as a point of contact for my immediate team.

In contrast, the Security Sheriff rotation is more work, and somewhat more exciting. The Security Sheriff’s duties include triaging all bugs of type “Security”: assigning a priority and severity to each and finding it an owner. Most security bugs are automatically reported by our fuzzers (a tireless robot army!), but we also get reports from the public, from Chrome team members, and from Project Zero.

At Microsoft, incoming security bug reports were first received and evaluated by the Microsoft Security Response Center (MSRC); valid reports were passed along to the IE team after some level of analysis and reproduction was undertaken. In general, all communication was done through MSRC, and the turnaround cycle on bugs was typically on the order of weeks to months.

In contrast, anyone can file a security bug against Chrome, and every week lots of people do. One reason for that is that Chrome has a Vulnerability Rewards program which pays out up to $100K for reports of vulnerabilities in Chrome and Chrome OS. Chrome paid out just under $1M USD in bounties last year. This is an awesome incentive for researchers to responsibly disclose bugs directly to us, and the bounties are much higher than those of nearly any other project.

In his “Hacker Quantified Security” talk at the O’Reilly Security conference, HackerOne CTO and Cofounder Alex Rice showed the following chart of bounty payout size for vulnerabilities when explaining why he was using a Chromebook. Apologies for the blurry photo, but the line at the top shows Chrome OS, with the 90th percentile line miles below as severity rises to Critical:

Vulnerability rewards by percentile. Chrome is WAY off the chart.

With a top bounty of $100,000 for an exploit or exploit chain that fully compromises a Chromebook, researchers are much more likely to send their bugs to us than to try to find a buyer on the black market.

Bug bounties are great, except when they’re not. Unfortunately, many filers don’t bother to read the Chrome Security FAQ which explains what constitutes a security vulnerability and the great many things that do not. Nearly every week, we have at least one person (and often more) file a bug noting “I can use the Developer Tools to read my own password out of a webpage. Can I have a bounty?” or “If I install malware on my PC, I can see what happens inside Chrome” or variations of these.

Because we take security bug reports very seriously, we often spend a lot of time on what seem like garbage filings to verify that there’s not just some sort of communication problem. This exposes one downside of the sheriff process—the lack of continuity from week to week.

In the fall, we had one bug reporter file a new issue every week that was just a collection of security-related terms (XSS! CSRF! UAF! EoP! Dangling Pointer! Script Injection!) lightly wrapped in prose, including screenshots, snippets from websites, console output from developer tools, and the like. Each week, the sheriff would investigate, ask for more information, and engage in a fruitless back and forth with the filer trying to figure out what claim was being made. Eventually I caught on to what was happening and started monitoring the sheriff’s queue, triaging the new findings directly and sparing that week’s sheriff the trouble. But even today we still catch folks who look up old bug reports (usually issues closed as WontFix), copy/paste the content into new bugs, and file them into the queue. It’s frustrating, but coming from a closed bug database, I’d choose the openness of the Chrome bug database every time.

Getting ready for my first Sheriff rotation, I started watching the incoming queue a few months in advance and felt ready when my first rotation arrived in September. Day One was quiet, with a few small issues found by fuzzers and one or two junk reports from the public, which I triaged away with pointers to the “why this isn’t a security vulnerability” entries in the Security FAQ. I spent the rest of the day writing a fix for a lower-priority security bug that had been filed a month before. A pretty successful day, I thought.

Day Two was more interesting. Scanning the queue, I saw a few more fuzzer issues and one external report whose text started with “Here is a Chrome OS exploit chain.” The report was about two pages long, and had a forty-two page PDF attachment explaining the four exploits the finder had used to take over a fully-patched Chromebook.

Star Wars trench run photo

Watching Luke’s X-wing take out the Death Star in Star Wars was no more exciting than reading the PDF’s tale of how a single byte memory overwrite in the DNS resolver code could weave its way through the many-layered security features of the Chromebook and achieve a full compromise. It was like the most amazing magic trick you’ve ever seen.

I hopped over to IRC. “So, do we see full compromises of Chrome OS every week?” I asked innocently.

“No. Why?” came the reply from several corners. I pasted in the bug link, and a few moments later the replies started flowing in: “OMG. Amazing!” Even guys from Project Zero were impressed, and they’re magicians who build exploits like this (usually for other products) all the time. The researcher had found one small bug and a variety of neglected components that were thought to be unreachable, and put them together into a deadly chain.

The first patches were out for code review that evening, and by the next day, we’d reached out to the open-source owner of the DNS component with the 1-byte overwrite bug so he could release patches for the other projects using his code. Within a few days, fixes to other components landed and had been ported to all of the supported versions of Chrome OS. Two weeks later, the Chrome Vulnerability rewards team added the reward-100000 tag, the only bug so far to be so marked. Four weeks after that, I had to hold my tongue when Alex mentioned that “no one’s ever claimed that $100,000 bounty” during his “Hacker Quantified Security” talk. Just under 90 days from filing, the bug was unrestricted and made available for public viewing.

The remainder of my first Sheriff rotation was considerably less exciting, although still interesting. I spent some time looking through the components the researcher had abused in his exploit chain and filed a few bugs. Ultimately, the most risky component he used was removed entirely.

Outreach and Blogging

Beyond working on the Enamel team (focused on Chrome’s security UI surface), I also work on the “MoarTLS” project, designed to help encourage and assist the web as a whole in moving to HTTPS. This takes a number of forms—I help maintain the HTTPS on Top Sites Report Card, and I do consultations and HTTPS Audits with major sites as they enable HTTPS. I discover, reduce, and file bugs on Chrome’s and other browsers’ support of features like Upgrade-Insecure-Requests. I publish a running list of articles on why and how sites should enable TLS. I hassle teams all over Google (and the web in general) to enable HTTPS on every single hyperlink they emit. I responsibly disclosed security bugs in a number of products and sites, including a vulnerability in Hillary Clinton’s fundraising emails. I worked to send a notification to the many, many thousands of sites collecting user information non-securely, warning them of the UI changes in Chrome 56.

When I applied to Google for the Developer Advocate role, I expected I’d be delivering public talks constantly, but as a SWE I’ve only given a few talks, including my  Migrating to HTTPS talk at the first O’Reilly Security Conference. I had a lot of fun at that conference, catching up with old friends from the security community (mostly ex-Microsofties). I also went to my first Chrome Dev Summit, where I didn’t have a public talk (my colleagues did) but I did get to talk to some major companies about deploying HTTPS.

I also blogged quite a bit. At Microsoft, I started blogging because I got tired of repeating myself, and because our Exchange server and document retention policies had started making it hard or impossible to find old responses—I figured “Well, if I publish everything on the web, Google will find it, and Internet Archive will back it up.”

I’ve kept blogging since leaving Microsoft, and I’m happy that I have, even though my readership numbers are much lower than they were at Microsoft. I’ve managed to mostly avoid trouble, although my posts are not entirely uncontroversial. At Microsoft, they wouldn’t let me publish this post (because it was too frank); in my first month at Google, I got a phone call at home (during the first portion of my paternity leave) from a Google Director complaining that I’d written something that was too harsh about a change Microsoft had made. But for the most part, my blogging seems not to ruffle too many feathers.

Tidbits

  • Food at Google is generally really good; I’m at a satellite office in Austin, so the selection is much smaller than on the main campuses, but the rotating menu is fairly broad and always has at least three major options. And the breakfasts! I gained about 15 pounds in my first few months, but my pneumonia took it off and I’ve restrained my intake since I came back.
  • At Microsoft, I always sneered at companies offering free food (“I’m an adult professional. I can pay for my lunch.”), but it’s definitely convenient to not have to hassle with payments. And until the government closes the loophole, it’s a way to increase employees’ compensation without getting taxed.
  • For the first three months, I was impressed and slightly annoyed that all of the snack options in Google’s micro-kitchens are healthy (e.g. fruit)—probably a good thing, since I sit about twenty feet from one. Then I saw someone open a drawer and pull out some M&Ms, and I learned the secret—all of the junk food is in drawers. The selection is impressive and ranges from the popular to the high end.
  • Google makes heavy use of the “open-office concept.” I think this makes sense for some teams, but it’s not at all awesome for me. I’d gladly take a 10% salary cut for a private office. I doubt I’m alone.
  • Coworkers at Google range from very smart to insanely off-the-scales-smart. Yet, almost all of them are humble, approachable, and kind.
  • Google, like Microsoft, offers gift matching for charities. This is an awesome perk, and one I aim to max out every year. I’m awed by people who go far beyond that.
  • Window Management – I mentioned earlier that one downside of web-based tools is that it’s hard to even find the right tab when I’ve got dozens of open tabs that I’m flipping between. The Quick Tabs extension is one great mitigation; it shows your tabs in a searchable, most-recently-used list in a convenient dropdown:

QuickTabs Extension

Another trick that I learned just this month is that you can instruct Chrome to open a site in “App” mode, where it runs in its own top-level window (with no other tabs), showing the site’s icon as the icon in the Windows taskbar. It’s easy:

On Windows, run chrome.exe --app=https://mail.google.com

While on OS X, run open -n -b com.google.Chrome --args --app='https://news.google.com'

Tip: The easy way to create a shortcut to the current page in app mode is to click the Chrome Menu > More Tools > Add to {shelf/desktop} and tick the Open as Window checkbox.

I now have SlickRun MagicWords set up for mail, calendar, and my other critical applications.


So, that’s it for year one @Google. I’m feeling lucky as I head into year two!

-Eric


Certified Malice

One unfortunate (albeit entirely predictable) consequence of making HTTPS certificates “fast, open, automated, and free” is that both good guys and bad guys alike will take advantage of the offer and obtain HTTPS certificates for their websites.

Today’s bad guys can easily turn a run-of-the-mill phishing spoof:

HTTP Phish screenshot

…into a somewhat more convincing version, by obtaining a free “domain validated” certificate and lighting up the green lock icon in the browser’s address bar:

HTTPS Phish screenshotPhishing site on Android

The resulting phishing site looks almost identical to the real site:

Real and fake side-by-side

By December 8, 2016, LetsEncrypt had issued 409 certificates containing “Paypal” in the hostname; that number is up to 709 as of this morning. Other targets include BankOfAmerica (14 certificates), Apple, Amazon, American Express, Chase Bank, Microsoft, Google, and many other major brands. LetsEncrypt validates only that (at one point in time) the certificate applicant can publish on the target domain. The CA also grudgingly checks with the SafeBrowsing service to see if the target domain has already been blocked as malicious, although they “disagree” that this should be their responsibility. LetsEncrypt’s short position paper is worth a read; many reasonable people agree with it.
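Numbers like these are easy to spot-check yourself, because certificates from these CAs land in public Certificate Transparency logs. Here is a rough sketch using Python and the third-party requests library against crt.sh’s unofficial JSON endpoint; the query syntax and response shape are assumptions based on how crt.sh behaves today, not a stable API, and CT entries include duplicates, so the raw count won’t match a CA’s issuance figure exactly:

```python
import requests

def count_ct_entries(term):
    """Count crt.sh results whose identity contains `term`.

    Assumes crt.sh's unofficial JSON output: %25 acts as a SQL-style
    wildcard, and the response is a JSON array of log entries.
    """
    url = "https://crt.sh/?q=%25{}%25&output=json".format(term)
    response = requests.get(url, timeout=60)
    response.raise_for_status()
    return len(response.json())

if __name__ == "__main__":
    print("crt.sh entries matching 'paypal':", count_ct_entries("paypal"))
```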

The “race to the bottom” in validation performed by CAs before issuing certificates is what led the IE team to spearhead the development of Extended Validation certificates over a decade ago. The hope was that, by putting the CA’s name “on the line” (literally, the address line), CAs would be incentivized to do a thorough job vetting the identity of a site owner. Alas, my proposal that we prominently display the CA’s name for all types (EV, OV, DV) of certificates wasn’t implemented, so domain-validated certificates are largely anonymous commodities unless a user goes through the cumbersome process of manually inspecting a site’s certificates. For a number of reasons (to be explored in a future post), EV certificates never really took off.

Of course, certificate abuse isn’t limited to LetsEncrypt—other CAs have also issued domain-validated certificates to phishing sites as well:

Comodo cert for a Paypal Phish

Who’s Responsible?

Unfortunately, ownership of this mess is diffuse, and I’ve yet to encounter any sign like this:

image

Blame the Browser

The core proposition held by some (but not all) CAs is that combatting malicious sites is the responsibility of the user-agent (browser), not the certificate authority. It’s an argument with merit, especially in a world where we truly want encryption for all sites, not just the top sites.

That position is bolstered by the fact that some browsers don’t actively check for certificate revocation, so even if LetsEncrypt were to revoke a certificate, the browser wouldn’t even notice.

Another argument is that browsers overpromise the safety of sites by using terms like Secure in the UI—while the browser can know whether a given HTTPS connection is present and free of errors, it has no knowledge of the security of the destination site or CDN, nor its business practices. Internet Explorer’s HTTPS UX used to have a helpful “Should I trust this site?” link, but that content went away at some point. Security wording is a complicated topic because what the user really wants to know (“Is this safe?”) isn’t something a browser can ever really answer in the affirmative. Users tend to be annoyed when you tell them only the truth: “This download was not reported as not safe.”

The obvious way to address malicious sites is via phishing and malware blocklists, and indeed, you can help keep other users safe by reporting any unblocked phish you find to the Safe Browsing service; this service protects Chrome, Firefox, and Safari users. You can also forward phishing messages to scam@netcraft.com and/or PhishTank. Users of Microsoft browsers can report unblocked phish to SmartScreen (in IE, click Tools > SmartScreen > Report Unsafe Website). Known-malicious sites will get the UI treatment they deserve:

image

Unfortunately, there’s always latency in block lists, and a phisher can probably turn a profit with a site that’s live for less than an hour. Phishers also have a whole bag of tricks to delay blocks, including cloaking, whereby they return an innocuous “Site not found” message when they detect that they’re being loaded from security researchers’ IP addresses, browser types, OS languages, etc.

Blame the Websites

Some argue that websites are at fault, for:

  1. Relying upon passwords and failing to adopt unspoofable two-factor authentication schemes which have existed for decades
  2. Failing to adopt HTTPS or deploy it properly until browsers started bringing out the UI sledgehammers
  3. Constantly changing domain names and login UIs
  4. Emailing users non-secure links to redirector sites
  5. Providing bad security advice to users

Blame the Humans

Finally, many lay blame with the user, arguing user education is the only path forward. I’ve long given up much hope on that front—the best we can hope for is raising enough awareness that some users will contribute feedback into more effective systems like automated phishing block lists.

We’ve had literally decades of sites and “experts” telling users to “Look for the lock!” when deciding whether a site is to be trusted. Even today we have bad advice being advanced by security experts who should know better, like this message from the Twitter security team which suggests that https://twitter.com.access.info is a legitimate site.

Email from Twitter Security Team

Where Do We Go From Here?

Unfortunately, I don’t think there are any silver bullets, but I also think that unsolvable problems are the most interesting ones. I’d argue that everyone who uses or builds the web bears some responsibility for making it safer for all, and each should try to find ways to do that within their own sphere of control.

Following is an unordered set of ideas that I think are worthwhile.

Signals

Historically, we in the world of computers have been most comfortable with binary answers: is this site malicious, or is it not? Unfortunately, this isn’t how the real world usually works, and fortunately many systems are moving toward sets of signals instead. When evaluating the trustworthiness of a site, there are dozens of available signals, including:

  • HTTPS Certificates
  • Age of the site
  • Has the user visited this site before
  • Has the user stored a password on this site before
  • Has the user used this password before, anywhere
  • Number of visitors observed to the site
  • Hosting location of the site
  • Presence of sensitive terms in the content or URL
  • Presence of login forms or executable downloads

None of these signals is individually sufficient to determine whether a site is good or evil, but combined together they can be used to feed systems that make good sites look safer and evil sites more suspect. Suspicious sites can be escalated for human analysis, either by security experts or even by crowd-sourced directed questioning of ordinary users. For instance, see the heuristic-triggered Is This Phish? UI from Internet Explorer’s SmartScreen system:

image

The user’s response to the prompt is itself a signal that feeds into the system, and allows speedier resolution of the site as good or bad and subject to blocking.
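To make the “combine many weak signals” idea concrete, here is a deliberately simplistic sketch. Real systems are trained on far more data and far more features, and the weights below are invented purely for illustration, but the shape is the same: no single check decides, yet together they separate the mundane from the suspicious.

```python
def suspicion_score(site):
    """Toy scoring function; `site` is a dict of observations and the
    weights are made up for illustration only."""
    score = 0.0
    if not site.get("https"):
        score += 1.0
    if site.get("age_days", 9999) < 30:
        score += 2.0   # brand-new domains are disproportionately risky
    if not site.get("user_visited_before"):
        score += 1.0
    if site.get("login_form") and site.get("sensitive_terms_in_url"):
        score += 2.5   # e.g. "paypal" or "verify" in the URL plus a password box
    if site.get("saved_password_reused_here"):
        score += 3.0   # a stored password typed into an unfamiliar site
    return score

# A high score might trigger warning UI, or escalate the site for human
# or crowd-sourced review, rather than an outright block.
print(suspicion_score({"https": True, "age_days": 3, "login_form": True,
                       "sensitive_terms_in_url": True}))
```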

One challenge with systems based on signals is that, while they grow much more powerful as more endpoints report signals, those signal reports may have privacy implications for individual users.

Reputation

In the real world, many things are based on reputation—without a good reputation, it’s hard to get a job, stay in business, or even find a partner.

In browsers, we have a well-established concept of bad reputation (your site or download appears on a block list), but we’ve largely pushed back against the notion of good reputation. In my mind, this is a critical failing—while much of the objection is well-meaning (“We want a level playing field for everyone”), it’s extremely frustrating that we punish users in support of abstract ideals. Extended Validation certificates were one attempt at creating a notion of good reputation, but they suffered from the whims of CAs’ business plans and from legal naming requirements that drove absurdities like:

Security chip showing full legal name of Washington Post's holding company

While there are merits to the fact that, on the web, no one can tell whether your site is a one-woman startup or a Fortune 100 behemoth, there are definite downsides as well.

Owen Campbell-Moore wrote a wonderful paper, Rethinking URL bars as primary UI, in which he proposes a hypothetical origin-to-local-brand mapping, which would allow browsers to help users more easily understand where they really are when interacting with top sites. I think it’s a long-overdue idea, brilliant in its obviousness. When it eventually gets built (by Apple, Microsoft, Google, or some upstart?), we’ll wonder what took us so long. A key tradeoff in making such a system practical is that designers must necessarily focus on the “head” of the web (say, the 1,000 most popular websites globally); trying to generalize the system to a perfectly level playing field isn’t feasible, just like in the real world.

Assorted

  • Browsers could again make a run at supporting unspoofable authentication methods natively.
  • Browsers could do more to take authentication out from the Zone of Death, limiting spoofing attacks.
  • New features like Must-Staple will allow browsers to respond to certificate revocation information without a performance penalty.
  • Automated CAs could deploy heuristics that require additional validation for certificates containing often-spoofed domains. For instance, if I want a certificate for paypal-payments.com, they could either demand additional information about me personally (allowing a visit from law enforcement if I turn out to be a phisher) or they could even just issue a certificate whose validity period begins a week in the future. The certificate would be recorded in Certificate Transparency logs immediately, allowing brand-monitoring firms to take note and respond if phishing is detected when the certificate becomes valid. Such systems will be imperfect (you need to handle BankOfTheVVest.com as a potential spoof of BankOfTheWest.com, for instance), but even targeting a few high-value domains could make a major dent; a rough sketch of such a check appears below.
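Here is what such a pre-issuance heuristic might look like; the brand list, the look-alike substitutions, and the policy response are all invented for illustration and are nothing like a real CA’s pipeline:

```python
# Illustrative only: a tiny "does this hostname smell like a brand?" check.
HIGH_VALUE_BRANDS = ["paypal", "bankofthewest", "google", "apple"]

# Fold a few common look-alike sequences before comparing.
CONFUSABLE_SUBSTITUTIONS = [("vv", "w"), ("rn", "m"), ("0", "o"), ("1", "l"), ("-", "")]

def normalize(label):
    label = label.lower()
    for lookalike, canonical in CONFUSABLE_SUBSTITUTIONS:
        label = label.replace(lookalike, canonical)
    return label

def needs_extra_validation(hostname):
    """Flag requests like paypal-payments.com or bankofthevvest.com for
    additional vetting, or for a delayed notBefore date."""
    first_label = normalize(hostname.split(".")[0])
    return any(brand in first_label for brand in HIGH_VALUE_BRANDS)

print(needs_extra_validation("paypal-payments.com"))  # True
print(needs_extra_validation("bankofthevvest.com"))   # True
print(needs_extra_validation("example.com"))          # False
```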

 

Fight the phish!

 

-Eric

Update: Chris wrote a great post on HTTPS UX in Chrome.


The Line of Death

When building applications that display untrusted content, security designers have a major problem: if an attacker has full control of a block of pixels, he can make those pixels look like anything he wants, including the UI of the application itself. He can then induce the user to undertake an unsafe action, and the user will be none the wiser.

In web browsers, the browser itself usually fully controls the top of the window, while the pixels below that area are under the control of the site. I’ve recently heard this called the line of death:

Line of death below omnibox

If a user trusts pixels above the line of death, the thinking goes, they’ll be safe, but if they can be convinced to trust the pixels below the line, they’re gonna die.

Unfortunately, this crucial demarcation isn’t explicitly pointed out to the user, and even more unfortunately, it’s not an absolute.

For instance, because the area above the LoD is so small, sometimes more space is needed to display trusted UI. Chrome attempts to resolve this by showing a little chevron that crosses the LoD:

Chrome chevrons

…because untrusted markup cannot cross the LoD. Unfortunately, as you can see in the screenshot, the treatment is inconsistent; in the PageInfo flyout, the chevron points to the bottom of the lock and the PageInfo box overlaps the LoD, while in the Permission flyout the chevron points to the bottom of the omnibox and the Permission box only abuts the LoD. Sometimes, the chevron is omitted, as in the case of Authentication dialogs.

Alas, the chevron is subtle, and I expect most users will fall for a faked chevron, like some sites have started to use1:

Fake chevron in HTML

The bigger problem is that some attacker data is allowed above the LoD; while trusting the content below the LoD will kill your security, there are also areas of death above the line. A more accurate Zones of Death map might look like this:

Zones of Death

In Zone 1, the attacker’s chosen icon and page title are shown. This information is controlled fully by the attacker and thus may consist entirely of deceptive content and lies.

In Zone 2, the attacker’s domain name is shown. Some information security pros will argue that this is the only “trustworthy” component of the URL, insofar as if the URL is HTTPS then the domain correctly identifies the site to which you’re connected. Unfortunately, your idea of trustworthy might be different than the experts’; https://paypal-account.com/ may really be the domain you loaded, but it has no relationship with the legitimate payment service found at https://paypal.com.

The path component of the URL in Zone 3 is fully untrustworthy; the URL http://account-update.com/paypal.com/ has nothing to do with Paypal either, and while spoofing here is less convincing, it also may be harder for the good guys to block because the spoofing content is not found in DNS nor does it create any records in Certificate Transparency logs.

Zone 4 is the web content area. Nothing in this area is to be believed. Unfortunately, on windowed operating systems, this is worse than it sounds, because it creates the possibility of picture-in-picture attacks, where an entire browser window, including its trusted pixels, can be faked:

Paypal window is fake content from evil.example.com

When hearing of picture-in-picture attacks, many people immediately brainstorm defenses; many related to personalization. For instance, if you run your OS or browser with a custom theme, the thinking goes, you won’t be fooled. Unfortunately, there’s evidence that that just isn’t the case.

Story time

Back in 2007, as the IE team was launching Extended Validation (EV) certificates, Microsoft Research was publishing a paper calling their effectiveness into question. A Fortune 500 financial company came to visit the IE team as they evaluated whether they wanted to go into the EV Certificate Authority business. They were excited about the prospect (as were we, since they were a well-known name with natural synergies), but they noted that they thought the picture-in-picture problem was a fatal flaw.

I was defensive. “It’s interesting,” I conceded, “but I don’t think it’s a very plausible attack.”

They retorted, “Well, we passed this screenshot around our entire information security department, and nobody could tell it’s a picture-in-picture attack. Can you?” They slid an 8.5×11 color print across the table.

“Of course!” I said, immediately relieved. I quickly grew gravely depressed as I realized the implications of the fact that they couldn’t tell the difference.

“How?” they demanded.

“It’s a picture of an IE7 browser running on Windows Vista in the transparent Aero Glass theme with a page containing a JPEG of an IE7 browser running on Windows XP in the Luna aka Fisher Price theme?” I pointed out.

“Oh. Huh.” they noted.

My thoughts of using browser personalization as an effective mitigation died that day.

Other mitigations were proposed; one CA built an extension where hovering over the EV Lock Icon (“Trust Badge”) would dim the entire screen except for the badge. One team proposed using image analysis to scan the current webpage for anything that looked like a fake EV badge.

Personally, my favorite approach was Tyler Close’s idea that the browser should use PetNames for site identity: think of them as a Gravatar icon for salted certificate hashes. Not only would they make every HTTPS site’s identity look unique to each user, but they could also be used as a means of detecting fraudulent or misissued certificates (in a world before we had Certificate Transparency).

The Future is Here … and It’s Worse

HTML5 adds a Fullscreen API, which means the Zone of Death looks like this:

Zone of Death in fullscreen mode

The Metro/Immersive/Modern mode of Internet Explorer in Windows 8 suffered from the same problem; because it was designed with a philosophy of “content over chrome”, there were no reliable trustworthy pixels. I begged for a persistent trustbadge to adorn the bottom-right of the screen (showing a security origin and a lock) but was overruled. One enterprising security tester in Windows made a visually-perfect spoofing site of Paypal, where even the user gestures that displayed the ephemeral browser UI were intercepted and fake indicators were shown. It was terrifying stuff, mitigated only by the hope that no one would use the new mode.

Virtually all mobile operating systems suffer from the same issue– due to UI space constraints, there are no trustworthy pixels, allowing any application to spoof another application or the operating system itself. Historically, some operating systems have attempted to mitigate the problem by introducing a secure user gesture (on Windows, it’s Ctrl+Alt+Delete) that always shows trusted UI, but such measures tend to confuse users (limiting their effectiveness) and often get “optimized away” when the UX team’s designers get ahold of the product.

It will be interesting to see how WebVR tries to address this problem on an even grander scale.

Beyond Browsers

Of course, other applications have the concept of a LoD as well, including web applications. Unfortunately, they often get this wrong. Consider Outlook.com’s rendering of an email:

image

When Outlook has received an email from a trusted sender, it notifies the user via a “This message is from a trusted sender.” notice, which appears directly inside a Zone of Death:

image

Security UI is hard.

-Eric

1 “Why would they fake a permission prompt? What would they gain?” you ask. Because for a real permission prompt, if you click Block, they can never ask you again, while with a fake prompt, they can spam you as much as they like. On the other hand, if you click Allow, they immediately present the real prompt.


Client Certificates on Android

Recently, this interesting tidbit crossed my Twitter feed:

Tweet: "Your site asks for a client certificate?"

Sure enough, if you visited the site in Chrome, you’d get a baffling prompt.

My hometown newspaper shows the same thing:

No Certificates Found prompt on Android

Weird, huh?

Client certificates are a way for a browser to supply a certificate to the server to verify the client’s identity (in the typical case, an HTTPS server sends only its own certificate, so that the client can validate that the server is what it says it is).

In the bad old days of IE6 and IE7, the default behavior was to show a similar prompt, but what’s going on with the latest Chrome on modern Android?

It turns out that this is a limitation of the Android security model, to which Chrome on Android is subject. In order for Chrome to interact with the system’s certificate store, the operating system itself shows a security prompt.

If your server has been configured to accept client certificates (in either require or want mode), you should be sure to test it on Android devices to verify that it behaves as you expect for your visitors (most of whom likely will not have any client certificates to supply).
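If you want a local test bed for that, one quick option is a throwaway HTTPS server that requests (but doesn’t require) client certificates. Here is a sketch using Python’s standard library; it assumes you’ve already generated a server.pem (certificate plus private key) and a clientca.pem listing the CAs whose client certificates you accept:

```python
import http.server
import ssl

# Throwaway test server that *requests* a client certificate ("want" mode).
httpd = http.server.HTTPServer(("0.0.0.0", 8443),
                               http.server.SimpleHTTPRequestHandler)

context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain("server.pem")          # server certificate + private key
context.load_verify_locations("clientca.pem")  # CAs acceptable for client certs
context.verify_mode = ssl.CERT_OPTIONAL        # "want"; use CERT_REQUIRED for "require"

httpd.socket = context.wrap_socket(httpd.socket, server_side=True)
print("Serving on https://0.0.0.0:8443/ - try it from an Android device on your LAN")
httpd.serve_forever()
```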

-Eric


HTTPS Only Works If You Use It – Tipster Edition

Convoy with three armored tanks and one pickup truck

It’s recently become fashionable for news organizations to build “anonymous tip” sites that permit members of the public to confidentially submit tips about stories of public interest.

Unfortunately, would-be tipsters need to take great care when exploring such options, because many organizations aren’t using HTTPS properly to ensure that the user’s traffic to the news site is protected from snoopers on the network.

If the organization uses any non-secure redirections in loading its “Tips” page, or the page pulls any unique images or other content over a non-secure connection, the fact that you’ve visited the “Tips” page will be plainly visible to your ISP, employer, fellow coffee shop patron, home-router-pwning group, etc.

NYTimes call for Tips, showing non-secure redirects

The New Yorker Magazine call for Tips, showing non-secure redirects
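It only takes a minute to check a tips page’s redirect chain yourself. Here is a rough sketch using Python and the third-party requests library; it inspects only redirects, not the page’s sub-resources, so treat it as a starting point rather than a complete audit:

```python
import requests

def audit_redirect_chain(url):
    """Print every hop involved in loading `url` and flag non-secure ones."""
    response = requests.get(url, allow_redirects=True, timeout=30)
    hops = [r.url for r in response.history] + [response.url]
    for hop in hops:
        marker = "ok      " if hop.startswith("https://") else "INSECURE"
        print(marker, hop)

# Hypothetical example URL; substitute the tips page you care about.
audit_redirect_chain("http://www.example.com/tips")
```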

Here are a few best practices for organizations that either a) solicit anonymous tips online or b) use webpages to tell would-be leakers how to send anonymous tips via Tor or non-electronic means:

For the organization:

  • Serve the “Tips” page, and every redirect and sub-resource involved in loading it, exclusively over HTTPS.
  • Don’t pull unique images or other identifiable content into that page from non-secure origins.

For end users:

  • Consider using Tor or other privacy-aiding software.
  • Don’t use a work PC or any PC that may have spyware or non-public certificate roots installed.

Stay private out there!

-Eric


Security UI in Chrome

The combined address box and search bar at the top of the Chrome window is called the omnibox. The icon and optional verbose state text adjacent to that icon are collectively known as the Security Chip:

image

The security chip can render in a number of states, depending on the status of the page:

image Secure – Shown for HTTPS pages that were securely delivered using a valid certificate and not compromised by mixed content or other problems.
image Secure, Extended Validation – Shown for Secure pages whose certificate indicates that Extended Validation was performed.
image Neutral – Shown for HTTP pages; for Chrome’s built-in pages, like chrome://extensions; for pages containing passive mixed content; and for pages delivered using a policy-allowed SHA-1 certificate.
image Not Secure – Shown for HTTP pages that contain a password or credit card input field. Learn more.
image Not Secure (Red) – What Chrome will eventually show for all HTTP pages. You can configure a flag (chrome://flags/#mark-non-secure-as) to Always mark HTTP as actively dangerous today to get this experience early.
image Not Secure, Certificate Error – Shown when a site has a major problem with its certificate (e.g. it’s expired).
image Dangerous – Shown when Google Safe Browsing has identified this page as a source of malware or phishing attacks.

The flyout that appears when you click the security chip is called PageInfo or Website Settings; it shows the security status of the page and the permissions assigned to the origin website:

image

The text atop the pageinfo flyout explains the security state of the page:

Screenshots: PageInfo summaries for various states, including mixed content and an expired certificate

Clicking the Learn More link on the flyout for valid HTTPS sites once opened the Chrome Developer Tools’ Security Panel, but now it goes to a Help article. To learn more about the HTTPS state of the page, instead press F12 and select the Security Panel:

image

The View certificate button opens the Windows certificate viewer and displays the current origin’s certificate. Reload the page with the Developer Tools open to see all of the secure origins in the Secure Origins List; selecting any origin allows you to view information about the connection and examine its certificate.

image

The floating grey box at the bottom of the Chrome window that appears when you hover over a link is called the status bubble. It is not a security surface and, like IE’s status bar, it is easily spoofed.

image

Navigation to sites with severe security problems is blocked by an interstitial page.

image

(A list of interstitial pages in Chrome can be found at chrome://interstitials/.)

Clicking on the error code (e.g. ERR_CERT_AUTHORITY_INVALID in the screenshot below) will show more information about the certificate of the blocked site:

image

Clicking the Advanced link shows more information, and in some cases will show an override link that allows you to disregard the security protection and proceed to the site anyway.

image

If a site uses HTTP Strict Transport Security, the Proceed link is hidden and the user has no visible option to proceed.

image

In current versions of Chrome, the user can type a fixed string (sometimes referred to as a Konami code) to bypass HSTS, but this option is deliberately undocumented and slated for removal from Chrome.

If a HTTPS problem is sufficiently bad, the network stack will not connect to the site and will show a network error page instead.

image

-Eric

PS: There exists a developer reference to Chrome Security UI across platforms, but it’s somewhat outdated.


2016 Brotli Update

Windows 10 Build 14986 adds support for Brotli compression to the Edge browser (but, somewhat surprisingly, not IE11). So at the end of 2016, we now have support for this improved compression algorithm in Chrome, Firefox, Edge, Opera, Brave, Vivaldi, and the long tail of browsers based on Chromium. Of modern browsers, only Apple is a holdout, with a “Radar” feature request logged against Safari but no public announcements.

Unfortunately, behavior across browsers varies at the edges:

  • Edge advertises support for and decodes Brotli compression on both HTTP and HTTPS requests.
  • Chrome advertises Brotli for HTTPS connections but will decode Brotli for both HTTPS and HTTP responses.
  • Firefox advertises Brotli for HTTPS connections and will not decode Brotli on HTTP responses.

There’s nothing horribly broken here: sites can safely serve Brotli content to clients that ask for it, and those clients will probably decode it. The exception is when the request goes over HTTP. The reason Firefox and Chrome limit their requests for Brotli to HTTPS is that, historically, middleboxes (like proxies and gateway filters) have been known to corrupt compression schemes other than gzip and deflate. This proved to be such a big problem in the rollout of SDCH (a now-defunct compression algorithm Chrome supported) that the Brotli implementers decided to try to avoid the issue by requiring a secure transport.
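On the server side, serving Brotli safely mostly comes down to honoring Accept-Encoding. Here is a rough sketch of that decision in Python, using the third-party brotli package; the helper names and the HTTPS check are illustrative rather than taken from any particular framework:

```python
import gzip

import brotli  # third-party bindings: pip install brotli

def negotiate_encoding(body, accept_encoding, is_https):
    """Pick a Content-Encoding based on what the client advertised.

    Clients generally only advertise 'br' over HTTPS (to sidestep
    middlebox corruption), but checking is_https keeps us honest.
    """
    offered = [token.strip().split(";")[0]
               for token in accept_encoding.split(",")]
    if "br" in offered and is_https:
        return "br", brotli.compress(body)
    if "gzip" in offered:
        return "gzip", gzip.compress(body)
    return "identity", body

encoding, payload = negotiate_encoding(b"<html>hello</html>",
                                       "gzip, deflate, br", is_https=True)
print(encoding, len(payload))
```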

-Eric

PS: Major sites, including Facebook and Google, have started deploying Brotli in production; if your site pulls fonts from Google Fonts, you’re already using Brotli today! In unrelated news, the 2016 Performance Calendar includes a post on serving Brotli from CDNs that don’t explicitly support it yet. Another recent post shows how to pair maximal compression for static files with fast compression for dynamically generated responses.
