storytelling, tech

For a Lark

“Happy Holidays,” David said as he poked his head into my office, handing me an unwrapped holiday card featuring a kitten in a Santa hat. As I took it, I nearly dropped a small white envelope that fell out from inside. The inscription in the card read simply “Best wishes, David – 2010.”

“Uh, thanks, you too!” I replied, both surprised and a bit uncomfortable that my colleague had gotten me a card. We were friendly but not friends. We’d only worked together a few times over the prior year, and it would’ve never occurred to me to get him anything for Christmas. He seemed like an archetypal geek, so we shared an interest in technology and science fiction, but we didn’t hang out or anything like that. Fortunately, he had a stack of cards in his hands, so it wasn’t like I’d been singled out or anything. Hoping he hadn’t gotten me anything fancy, I asked, “What’s this?” as I flipped over the rigid envelope. In small print near the flap, it read “Do not open until Christmas 2017.”

His eyes twinkled and he grinned mischievously, an expression I’d never seen from him before. “Just a tiny gift. Well, maybe sort of a test. It’s very important that I give it to you now. But if you open it before 2017, it’ll be the crummiest gift you ever got. If you can wait, maybe it’ll be pretty nice.”

I furrowed my brow. “So, like some sort of savings bond thing?” I asked, thinking back to the bonds I’d gotten from far-off relatives as a little kid… I’d recently stopped dragging them around from apartment to apartment and taken them to the bank to collect the princely sum of $261, two decades after I’d ungratefully wished they were some action figures or a book instead.

David smiled. “Sure, sorta. Don’t lose it. Don’t get it wet. And don’t open it early!”

“Uh, I won’t… Thanks?” I promised, and with that, David disappeared into Rob’s office next door to start his spiel all over.

Weird. Well, I’d already bought my direct reports gift cards for the iPic movie theatre and I had one extra left over… I’ll put that in a New Year’s card for David and leave it in his office sometime next week, I resolved. I tossed the card and envelope into the stack of RFC printouts on my bookshelf and went back to the email I was writing, hoping to get everything squared away before leaving for a short Christmas vacation.

The envelope sat on my bookshelf undisturbed, buried in an ever-growing pile of paper. I might’ve remembered it the following year, but David had left the company that summer, off to do a volunteer tour with the Peace Corps before joining some startup out in San Francisco. Still buried in a pile of paper when I left the company two years later, the card made its way into an unsorted moving box labeled “office stuff” as we moved across the country to Texas. It then sat quietly in the box in my garage for five more years.

In the summer of 2017, I finally got around to digging through the garage, trashing what I could in an effort to make way for the growing proliferation of tricycles, big wheels, wagons, bikes, and pool toys that our two Texas-born children had collected. I spent a quiet Saturday afternoon in July mired in nostalgia, poring through boxes full of old books and papers and remembering a life before kids and so many responsibilities.

When I eventually uncovered the kitten card in the pile, I snorted and stretched to toss it in the “Recycle” pile before I remembered the weird little envelope. Sure enough, it was still inside, forgotten and untouched for the better part of a decade. I ignored the admonition of the faded green warning and tore it open, long-forgotten curiosity mounting.

The envelope contained a pile of papers folded in thirds. The outermost of these was a cleanly cut sheet of wax paper of the sort that was used for sandwiches back before ziplock bags took over the world. Odd, I mused as I discarded it. The next was a folded sheet of thick white cardstock, taped closed, which bore a short paragraph printed in Christmas-colored ink. It read simply:

Is it Christmas 2017 yet?

If so, happy holidays! Enjoy your present.
If not, please google ‘Marshmallow experiment’ and wait patiently.
You’ve been warned.

I paused as the marshmallow reference tickled something in my memory … some university lecture I’d forgotten long ago? Mildly annoyed, I dragged out my phone and searched as instructed. Oh yeah, that Stanford study about delayed gratification. It’s not like anyone will ever know… I mused as I put down my phone and started to peel back the tape holding the cardstock shut.

The door to the garage opened. “Nap’s over, Nate’s up,” my wife called, and I tossed the letter back in the box, eagerly grabbing the large pile of recycling I’d generated to show her my progress. Dancing around, building Lego cars, and wrestling with my two kids, I completely forgot about David’s weird little present for another few months.

In December, off work for a two-week winter holiday vacation, I resolved to finish cleaning out the garage. Thirty minutes into the job, I came across the card. I tore open the cardstock. Inside lay a single laser-printed page with a letter printed on one side. The opening paragraph read:

Happy holidays, friend!

I hope you’ve waited patiently for this small gift and aren’t too annoyed at the oddity of its presentation, but it’s all for a purpose.

My accountant has instructed me to make it very clear that this is a GIFT, granted freely without any restrictions, from me to you on December 17th, 2010. It has an approximate market value of $250. Relax– it only cost me about 8 bucks, and it may well be worthless by the time you open it. (Market value is only possibly important for tax purposes.)

Beneath the card was a black and white picture, one inch square, full of smaller squares, the sort of bar code you’ll find on the back of shampoo bottles.

code

The letter continued and I read on, confused but intrigued.

This is a QR code containing the private key for a digital wallet containing bitcoin. Bitcoins are a virtual currency that I’ve been playing with this year, and I thought it would be a lark to give some out as a present. (I’m not clever at picking presents– growing up, I got a new leather wallet every year for Christmas.)

Perhaps in 2017 bitcoins have become worthless (that seems like the most likely outcome), but I have a hunch that perhaps they’ll continue to appreciate over the years. If you’ve been patient, maybe it’s enough to buy you a nicer present now.

If so, happy holidays! If not, I hope my folly brings you at least a chuckle. :)

My heart started to pound in my chest. The page concluded with David’s signature, preceded by 7 hand-scrawled characters.

1000 BTC

Hands shaking with the instant weight of the letter, I dropped it and the universe entered slow motion as the paper fluttered to the ground.

browsers, life, tech

Google Chrome–One Year In

Four weeks ago, an emailed notice of a free massage credit revealed that I’ve been at Google for a year. Time flies when you’re drinking from a firehose.

When I mentioned my anniversary, friends and colleagues from other companies asked what I’ve learned while working on Chrome over the last year. This rambling post is an attempt to answer that question.

Non-maskable Interrupts

While I started at Google just over a year ago, I haven’t actually worked there for a full year yet. My second son (Nate) was born a few weeks early, arriving ten workdays after my first day of work.

I took full advantage of Google’s very generous twelve weeks of paternity leave, taking a few weeks after we brought Nate home, and the balance as spring turned to summer. In a year, we went from having an enormous infant to an enormous toddler who’s taking his first steps and trying to emulate everything his 3-year-old brother (Noah) does.

Baby at the hospital
First birthday cake

I mention this because it’s had a huge impact on my work over the last year—much more than I’d naively expected.

When Noah was born, I’d been at Telerik for almost a year, and I’d been hacking on Fiddler alone for nearly a decade. I took a short paternity leave, and my coding hours shifted somewhat (I started writing code late at night between bottle feeds), but otherwise my work wasn’t significantly impacted.

As I pondered joining Google Chrome’s security team, I expected pretty much the same—a bit less sleep, a bit of scheduling awkwardness, but I figured things would fall into a good routine in a few months.

Things turned out somewhat differently.

Perhaps sensing that my life had become too easy, fate decided that 2016 was the year I’d get sick. Constantly. (Our theory is that Noah was bringing home germs from pre-school; he got sick a bunch too, but recovered quickly each time.) I was sick more days in 2016 than I was in the prior decade, including a month-long illness in the spring. That ended with a bout of pneumonia that concluded with a doctor-mandated seven days away from the office. As I coughed my brains out on the sofa at home, I derived some consolation in thinking about Google’s generous life insurance package. But for the most part, my illnesses were minor—enough to keep me awake at night and coughing all day, but otherwise able to work.

Mathematically, you might expect two kids to be twice as much work as one, but in our experience, it hasn’t worked out that way. Instead, it varies between 80% (when the kids happily play together) and 400% (when they’re colliding like atoms in a runaway nuclear reactor). Thanks to my wife’s heroic efforts, we found a workable daytime routine. The nights, however, have been unexpectedly difficult. Big brother Noah is at an age where he usually sleeps through the night, but he’s sure to wake me up every morning at 6:30am sharp. Fortunately, Nate has been a pretty good sleeper, but even now, at just over a year old, he usually still wakes up and requires attention twice a night or so.

I can’t remember the last time I had eight hours of sleep in a row. And that’s been extremely challenging… because I can’t remember much else either. Learning new things when you don’t remember them the next day is a brutal, frustrating process.

When Noah was a baby, I could simply sleep in after a long night. Even if I didn’t get enough sleep, it wouldn’t really matter—I’d been coding in C# on Fiddler for a decade, and deadlines were few and far between. If all else failed, I’d just avoid working on any especially gnarly code and spend the day handling support requests, updating graphics, or doing other simple and straightforward grunt work from my backlog.

Things are much different on Chrome.

Roles

When I first started talking to the Chrome Security team about coming aboard, it was for a role on the Developer Advocacy team. I’d be driving HTTPS adoption across the web and working with big sites to unblock their migrations in any way I could. I’d already been doing the first half of that for fun (delivering talks at conferences like Codemash and Velocity), and I’d previously spent eight years as a Security Program Manager for the Internet Explorer team. I had tons of relevant experience. Easy peasy.

I interviewed for the Developer Advocate role. The hiring committee kicked back my packet and said I should interview as a Technical Program Manager instead.

I interviewed as a Technical Program Manager. The hiring committee kicked back my packet and said I should interview as a Developer Advocate instead.

The Chrome team resolved the deadlock by hiring me as a Senior Software Engineer (SWE).

I was initially very nervous about this, having not written any significant C++ code in over a decade—except for one in-place replacement of IE9’s caching logic which I’d coded as a PM because I couldn’t find a developer to do the work. But eventually I started believing in my own pep talk: “I mean, how hard could it be, right? I’ve been troubleshooting code in web browsers for almost two decades now. I’m not a complete dummy. I’ll ramp up. It’ll be rough, but it’ll work out. Hell, I started writing Fiddler knowing neither C# nor HTTP, and that turned out pretty good. I’ll buy some books and get caught up. There’s no way that Google would have just hired me as a C++ developer without asking me any C++ coding questions if it wasn’t going to all be okay. Right? Right?!?”

The Firehose

I knew I had a lot to learn, and fast, but it took me a while to realize just how much else I didn’t know.

Google’s primary development platform is Linux, an OS that I would install every few years, play with for a day, then forget about. My new laptop was a Mac, a platform I’d used a bit more, but still one for which I was about a twentieth as proficient as I was on Windows. The Chrome Windows team made a half-hearted attempt to get me to join their merry band, but warned me honestly that some of the tooling wasn’t quite as good as it was on Linux and it’d probably be harder for me to get help. So I tried to avoid Windows for the first few months, ordering a puny Windows machine that took around four times longer to build Chrome than my obscenely powerful Linux box (with its 48 logical cores). After a few months, I gave up on trying to avoid Windows and started using it as my primary platform. I was more productive, but incredibly slow builds remained a problem for a few months. Everyone told me to just order another obscenely powerful box to put next to my Linux one, but it felt wrong to have hardware at my desk that collectively cost more than my first car—especially when, at Microsoft, I bought all my own hardware. I eventually mentioned my cost/productivity dilemma to a manager, who noted I was getting paid a Google engineer’s salary and then politely asked me if I was just really terrible at math. I ordered a beastly Windows machine and now my builds scream. (To the extent that any C++ builds can scream, of course. At Telerik, I was horrified when a full build of Fiddler slowed to a full 5 seconds on my puny Windows machine; my typical Chrome build today still takes about 15 minutes.)

Beyond learning different operating systems, I’d never used Google’s apps before (Docs/Sheets/Slides); luckily, I found these easy to pick up, although I still haven’t fully figured out how Google Drive file organization works. Google Docs, in particular, is so good that I’ve pretty much given up on Microsoft Word (which headed downhill after the 2010 version). Google Keep is a low-powered alternative to OneNote (which is, as far as I can tell, banned because it syncs to Microsoft servers) and I haven’t managed to get it to work well for my needs. Google Plus still hasn’t figured out how to support pasting of images via CTRL+V, a baffling limitation for something meant to compete in the space… hell, even Microsoft Yammer supports that, for god’s sake. The only real downside to the web apps is that tab/window management on modern browsers is still a very much unsolved problem (but more on that in a bit).

But these speedbumps all pale in comparison to Gmail. Oh, Gmail. As a program manager at Microsoft, pretty much your entire life is in your inbox. After twelve years with Outlook and Exchange, switching to Gmail was a train wreck. “What do you mean, there aren’t folders? How do I mark this message as low priority? Where’s the button to format text with strikethrough? What do you mean, I can’t drag an email to my calendar? What the hell does this Archive thing do? Where’s that message I was just looking at? Hell, where did my Gmail tab even go—it got lost in a pile of sixty other tabs across four top-level Chrome windows. WTH??? How does anyone get anything done?”

Communication and Remote Work

While Telerik had an office in Austin, I didn’t interact with other employees very often, and when I did they were usually in other offices. I thought I had a handle on remote work, but I really didn’t. Working with a remote team on a daily basis is just different.

With communication happening over mail, IRC, Hangouts, bugs, document markup comments, GVC (video conferencing), G+, and discussion lists, it was often hard to figure out which mechanisms to use, let alone which recipients to target. Undocumented pitfalls abounded (many discussion groups were essentially abandoned while others were unexpectedly broad; turning on chat history was deemed a “no-no” for document retention reasons).

It often took a bit of research to even understand who various communication participants were and how they related to the projects at hand.

After years of email culture at Microsoft, I grew accustomed to a particular style of email, and Google’s is just different. Mail threads were long, with frequent additions of new recipients and many terse remarks. Many times, I’d reply privately to someone on a side thread, with a clarifying question, or suggesting a counterpoint to something they said. The response was often “Hey, this just went to me. Mind adding on the main thread?”

I’m working remotely, with peers around the world, so real-time communication with my team is essential. Some Chrome subteams use Hangouts, but the Security team largely uses IRC.

XKCD comic on IRC

Now, I’ve been chatting with people online since BBSes were a thing (I’ve got a five digit ICQ number somewhere), but my knowledge of IRC was limited to the fact that it was a common way of taking over suckers’ machines with buffer overflows in the ‘90s. My new teammates tried to explain how to IRC repeatedly: “Oh, it’s easy, you just get this console IRC client. No, no, you don’t run it on your own workstation, that’d be crazy. You wouldn’t have history! You provision a persistent remote VM on a machine in Google’s cloud, then SSH to that, then you run screens and then you run your IRC client in that. Easy peasy.”

Getting onto IRC remained on my “TODO” list for five months before I finally said “F- it”, installed HexChat on my Windows box, disabled automatic sleep, and called it done. It’s worked fairly well.

Google Developer Tooling

When an engineer first joins Google, they start with a week or two of technical training on the Google infrastructure. I’ve worked in software development for nearly two decades, and I’ve never even dreamed of the development environment Google engineers get to use. I felt like Charlie Bucket on his tour of Willy Wonka’s Chocolate Factory—astonished by the amazing and unbelievable goodies available at any turn. The computing infrastructure was something out of Star Trek, the development tools were slick and amazing, the process was jaw-dropping.

While I was doing a “hello world” coding exercise in Google’s environment, a former colleague from the IE team pinged me on Hangouts chat, probably because he’d seen my tweets about feeling like an imposter as a SWE.  He sent me a link to click, which I did. Code from Google’s core advertising engine appeared in my browser. Google’s engineers have access to nearly all of the code across the whole company. This alone was astonishing—in contrast, I’d initially joined the IE team so I could get access to the networking code to figure out why the Office Online team’s website wasn’t working. “Neat, I can see everything!” I typed back. “Push the Analyze button” he instructed. I did, and some sort of automated analyzer emitted a report identifying a few dozen performance bugs in the code. “Wow, that’s amazing!” I gushed. “Now, push the Fix button” he instructed. “Uh, this isn’t some sort of security red team exercise, right?” I asked. He assured me that it wasn’t. I pushed the button. The code changed to fix some unnecessary object copies. “Amazing!” I effused. “Click Submit” he instructed. I did, and watched as the system compiled the code in the cloud, determined which tests to run, and ran them. Later that afternoon, an owner of the code in the affected folder typed LGTM (Googlers approve changes by typing the acronym for Looks Good To Me) on the change list I had submitted, and my change was live in production later that day. I was, in a word, gobsmacked. That night, I searched the entire codebase for misuse of an IE cache control token and proposed fixes for the instances I found. I also narcissistically searched for my own name and found a bunch of references to blog posts I’d written about assorted web development topics.

Unfortunately for Chrome Engineers, the introduction to Google’s infrastructure is followed by a major letdown—because Chromium is open-source, the Chrome team itself doesn’t get to take advantage of most of Google’s internal goodies. Development of Chrome instead resembles C++ development at most major companies, albeit with an automatically deployed toolchain and enhancements like a web-based code review tool and some super-useful scripts. The most amazing of these is called bisect-builds, and it allows a developer to very quickly discover which build of Chrome introduced a particular bug. You just give it a “known good” build number and a “known bad” build number and it automatically downloads and runs the minimal number of builds to perform a binary search for the build that introduced a given bug:

Console showing bisect builds running

Firefox has a similar system, but I’d’ve killed for something like this back when I was reproducing and reducing bugs in IE. While it’s easy to understand how the system functions, it works so well that it feels like magic. Other useful scripts include the presubmit checks that run on each change list before you submit it for code review—they find and flag various style violations and other problems.
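For the curious, the binary search at the heart of a tool like this fits in a few lines. What follows is only a minimal sketch of the idea, not the actual bisect-builds script: the function name, the `is_bad` callback (standing in for "download this build, run it, and check whether the bug reproduces"), and the build numbers are all made up for illustration. It also assumes that once the bug appears, it stays present in every later build.

```python
from typing import Callable, Tuple


def bisect_builds(good: int, bad: int,
                  is_bad: Callable[[int], bool]) -> Tuple[int, int]:
    """Return (first_bad_build, builds_tested).

    Hypothetical helper: `good` must be a build known to work and `bad`
    a later build known to be broken.
    """
    if good >= bad:
        raise ValueError("expected good < bad")
    tested = 0
    # Invariant: build `good` works and build `bad` is broken.
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1
        if is_bad(mid):
            bad = mid    # bug already present; look earlier
        else:
            good = mid   # bug not yet present; look later
    return bad, tested


if __name__ == "__main__":
    # Pretend the bug landed in (made-up) build 4242.
    first_bad, tested = bisect_builds(4000, 5000, lambda build: build >= 4242)
    print(first_bad, tested)
```

Because the search range halves on every step, a span of a thousand builds needs only about ten downloads, which is why the real tool feels so fast.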

Compilation itself typically uses a local compiler; on Windows, we use the MSVC command line compiler from Visual Studio 2015 Update 3, although work is underway to switch over to Clang. Compiling and linking all of Chrome takes quite some time, although on my new beastly dev boxes it’s not too bad. Googlers do have one special perk—we can use Goma (a distributed compiler system that runs on Google’s amazing internal cloud) but I haven’t taken advantage of that so far.

For bug tracking, Chrome recently moved to Monorail, a straightforward web-based bug tracking system. It works fairly well, although it is somewhat more cumbersome than it needs to be and would be much improved with a few tweaks. Monorail is open-source, but I haven’t committed to it myself yet.

For code review, Chrome presently uses Rietveld, a web-based system, but this is slated to change in the near(ish) future. Like Monorail, it’s pretty straightforward although it would benefit from some minor usability tweaks; I committed one trivial change myself, but the pending migration to a different system means that it isn’t likely to see further improvements.

As an open-source project, Chromium has quite a bit of public documentation for developers, including Design Documents. Unfortunately, Chrome moves so fast that many of the design documents are out-of-date, and it’s not always obvious what’s current and what was replaced long ago. The team does value engineers’ investment in the documents, however, and various efforts are underway to update the documents and reduce Chrome’s overall architectural complexity. I expect these will be ongoing battles forever, just like in any significant active project.

What I’ve Done

“That’s all well and good,” my reader asks, “but what have you done in the last year?”

I Wrote Some Code

My first check in to Chrome landed in February; it was a simple adjustment to limit Public-Key-Pins to 60 days. Assorted other checkins trickled in through the spring before I went on paternity leave. The most fun fix I did cleaned up a tiny UX glitch that sat unnoticed in Chrome for almost a decade; it was mostly interesting because it was a minor thing that I’d tripped over for years, including back in IE. (The root cause was arguably that MSDN documentation about DWM lied; I fixed the bug in Chrome, sent the fix to IE, and asked MSDN to fix their docs).
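To give a flavor of what a 60-day limit on Public-Key-Pins means in practice, here is a rough sketch. It is not Chrome’s code (that lives in C++); the function name and the simplistic header parsing are mine, and I’m assuming the limit is applied by clamping an oversized max-age down to 60 days rather than rejecting the header.

```python
from typing import Optional

# 60 days, expressed in seconds.
MAX_PIN_AGE_SECONDS = 60 * 24 * 60 * 60


def effective_pin_max_age(header_value: str) -> Optional[int]:
    """Return the max-age (in seconds) to honor, or None if absent/invalid.

    Hypothetical helper with deliberately naive parsing: directives are
    split on ';' and the first max-age directive wins.
    """
    for directive in header_value.split(";"):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() == "max-age":
            value = value.strip().strip('"')
            if not (value.isascii() and value.isdigit()):
                return None
            # Assumption: oversized values are clamped, not rejected.
            return min(int(value), MAX_PIN_AGE_SECONDS)
    return None


if __name__ == "__main__":
    header = 'pin-sha256="abc="; max-age=31536000; includeSubDomains'
    print(effective_pin_max_age(header))
```

The point of a cap like this is to bound how long a site can lock itself out if it pins the wrong keys: a year-long max-age is treated the same as a 60-day one.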

I fixed a number of minor security bugs, and lately I’ve been working on UX issues related to Chrome’s HTTPS user-experience. Back in 2005, I wrote a blog post complaining about websites using HTTPS incorrectly, and now, just over a decade later, Chrome and Firefox are launching UI changes to warn users when a site is collecting sensitive information on pages which are Not Secure; I’m delighted to have a small part in those changes.

Having written a handful of Internet Explorer Extensions in the past, I was excited to discover the joy of writing Chrome extensions. Chrome extensions are fun, simple, and powerful, and there’s none of the complexity and crashes of COM.

My 3 Chrome Extensions

My first and most significant extension is the moarTLS Analyzer– it’s related to my HTTPS work at Google and it’s proven very useful in discovering sites that could improve their security. I blogged about it and the process of developing it last year.

Because I run several different Chrome instances on my PC (and they update daily or weekly), I found myself constantly needing to look up the Chrome version number for bug reports and the like. I wrote a tiny extension that shows the version number in a button on the toolbar (so it’s captured in screenshots too!):

Show Chrome Version screenshot

More than once, I spent an hour or so trying to reproduce and reduce a bug that had been filed against Chrome. When I found out the cause, I’d jubilantly add my notes to the issue in the Monorail bug tracker, click “Save changes” and discover that someone more familiar with the space had beaten me to the punch and figured it out while I’d had the bug open on my screen. Adding an “Issue has been updated” alert to the bug tracker itself seemed like the right way to go, but it would require some changes that I wasn’t able to commit on my own. So, instead I built an extension that provides such alerts within the page until the feature can be added to the tracker itself.

Each of these extensions was a joy to write.

I Filed Some Bugs

I’m a diligent self-hoster, and I run Chrome Canary builds on all of my devices. I submit crash reports and file bugs with as much information as I can. My proudest moment was in helping narrow down a bizarre and intermittent problem users had with Chrome on Windows 10, where Chrome tabs would crash on every startup until you rebooted the OS. My blog post explains the full story, and encourages others to file bugs as they encounter them.

I Triaged More Bugs

I’ve been developing software for Windows for just over two decades, and inevitably I’ve learned quite a bit about it, including the undocumented bits. That’s given me a leg up in understanding bugs in the Windows code. Some of the most fun include issues in Drag and Drop, like this gem of a bug that means that you can’t drop files from Chrome to most applications in Windows. More meaningful bugs relate to problems with Windows’ Mark-of-the-Web security feature (about which I’ve blogged several times).

I Took Sheriff Rotations

Google teams have the notion of sheriffs—a rotating assignment that ensures that important tasks (like triaging incoming security bugs) always have a defined owner, without overwhelming any single person. Each Sheriff has a term of ~1 week where they take on additional duties beyond their day-to-day coding, designing, testing, etc.

The Sheriff system has some real benefits—perhaps the most important of which is creating a broad swath of people experienced and qualified in making triage decisions around security vulnerabilities. The alternative is to leave such tasks to a single owner, reducing the bus factor to one and thus increasing the risk to the project. (I know this from first-hand experience. After IE8 shipped, I was on my way out the door to join another team. Then IE’s Security PM left, leaving a gaping hole that I felt obliged to stay around to fill. It worked out okay for me and the team, but it was tense all around.)

I’m on two sheriff rotations: Enamel (my subteam) and the broader Chrome Security Sheriff.

The Enamel rotation’s tasks are akin to what I used to do as a Program Manager at Microsoft—triage incoming bugs, respond to questions in the Help Forums, and generally act as a point of contact for my immediate team.

In contrast, the Security Sheriff rotation is more work, and somewhat more exciting. The Security Sheriff’s duties include triaging all bugs of type “Security”, assigning priority and severity, and finding an owner for each. Most security bugs are automatically reported by our fuzzers (a tireless robot army!), but we also get reports from the public, from Chrome team members, and from Project Zero.

At Microsoft, incoming security bug reports were first received and evaluated by the Microsoft Security Response Center (MSRC); valid reports were passed along to the IE team after some level of analysis and reproduction was undertaken. In general, all communication was done through MSRC, and the turnaround cycle on bugs was typically on the order of weeks to months.

In contrast, anyone can file a security bug against Chrome, and every week lots of people do. One reason for that is that Chrome has a Vulnerability Rewards program which pays out up to $100,000 for reports of vulnerabilities in Chrome and Chrome OS. Chrome paid out just under $1M USD in bounties last year. This is an awesome incentive for researchers to responsibly disclose bugs directly to us, and the bounties are much higher than those of nearly any other project.

In his “Hacker Quantified Security” talk at the O’Reilly Security conference, HackerOne CTO and Cofounder Alex Rice showed the following chart of bounty payout size for vulnerabilities when explaining why he was using a Chromebook. Apologies for the blurry photo, but the line at the top shows Chrome OS, with the 90th percentile line miles below as severity rises to Critical:

Vulnerability rewards by percentile. Chrome is WAY off the chart.

With a top bounty of $100,000 for an exploit or exploit chain that fully compromises a Chromebook, researchers are much more likely to send their bugs to us than to try to find a buyer on the black market.

Bug bounties are great, except when they’re not. Unfortunately, many filers don’t bother to read the Chrome Security FAQ which explains what constitutes a security vulnerability and the great many things that do not. Nearly every week, we have at least one person (and often more) file a bug noting “I can use the Developer Tools to read my own password out of a webpage. Can I have a bounty?” or “If I install malware on my PC, I can see what happens inside Chrome” or variations of these.

Because we take security bug reports very seriously, we often spend a lot of time on what seem like garbage filings to verify that there’s not just some sort of communication problem. This exposes one downside of the sheriff process—the lack of continuity from week to week.

In the fall, we had one bug reporter file a new issue every week that was just a collection of security related terms (XSS! CSRF! UAF! EoP! Dangling Pointer! Script Injection!) lightly wrapped in prose, including screenshots, snippets from websites, console output from developer tools, and the like. Each week, the sheriff would investigate, ask for more information, and engage in a fruitless back and forth with the filer trying to figure out what claim was being made. Eventually I caught on to what was happening and started monitoring the sheriff’s queue, triaging the new findings directly and sparing the sheriff of the week. But even today we still catch folks who look up old bug reports (usually Won’t Fixed issues), copy/paste the content into new bugs, and file them into the queue. It’s frustrating, but coming from a closed bug database, I’d choose the openness of the Chrome bug database every time.

Getting ready for my first Sheriff rotation, I started watching the incoming queue a few months earlier and felt ready for my first rotation in September. Day One was quiet, with a few small issues found by fuzzers and one or two junk reports from the public which I triaged away with pointers to the “Why isn’t a vulnerability” entries in the Security FAQ. I spent the rest of the day writing a fix for a lower-priority security bug that had been filed a month before. A pretty successful day, I thought.

Day Two was more interesting. Scanning the queue, I saw a few more fuzzer issues and one external report whose text started with “Here is a Chrome OS exploit chain.” The report was about two pages long, and had a forty-two page PDF attachment explaining the four exploits the finder had used to take over a fully-patched Chromebook.

Star Wars trench run photo

Watching Luke’s X-wing take out the Death Star in Star Wars was no more exciting than reading the PDF’s tale of how a single byte memory overwrite in the DNS resolver code could weave its way through the many-layered security features of the Chromebook and achieve a full compromise. It was like the most amazing magic trick you’ve ever seen.

I hopped over to IRC. “So, do we see full compromises of Chrome OS every week?” I asked innocently.

“No. Why?” came the reply from several corners. I pasted in the bug link and a few moments later the replies started flowing in: “OMG. Amazing!” Even guys from Project Zero were impressed, and they’re magicians who build exploits like this (usually for other products) all the time. The researcher had found one small bug and a variety of neglected components that were thought to be unreachable and put together a deadly chain.

The first patches were out for code review that evening, and by the next day, we’d reached out to the open-source owner of the DNS component with the 1-byte overwrite bug so he could release patches for the other projects using his code. Within a few days, fixes to other components landed and had been ported to all of the supported versions of Chrome OS. Two weeks later, the Chrome Vulnerability rewards team added the reward-100000 tag, the only bug so far to be so marked. Four weeks after that, I had to hold my tongue when Alex mentioned that “no one’s ever claimed that $100000 bounty” during his “Hacker Quantified Security” talk. Just under 90 days from filing, the bug was unrestricted and made available for public viewing.

The remainder of my first Sheriff rotation was considerably less exciting, although still interesting. I spent some time looking through the components the researcher had abused in his exploit chain and filed a few bugs. Ultimately, the most risky component he used was removed entirely.

Outreach and Blogging

Beyond working on the Enamel team (focused on Chrome’s security UI surface), I also work on the “MoarTLS” project, designed to help encourage and assist the web as a whole in moving to HTTPS. This takes a number of forms: I help maintain the HTTPS on Top Sites Report Card, and I do consultations and HTTPS Audits with major sites as they enable HTTPS. I discover, reduce, and file bugs on Chrome’s and other browsers’ support of features like Upgrade-Insecure-Requests. I publish a running list of articles on why and how sites should enable TLS. I hassle teams all over Google (and the web in general) to enable HTTPS on every single hyperlink they emit. I responsibly disclosed security bugs in a number of products and sites, including a vulnerability in Hillary Clinton’s fundraising emails. I worked to send a notification to many many many thousands of sites collecting user information non-securely, warning them of the UI changes in Chrome 56.

When I applied to Google for the Developer Advocate role, I expected I’d be delivering public talks constantly, but as a SWE I’ve only given a few talks, including my Migrating to HTTPS talk at the first O’Reilly Security Conference. I had a lot of fun at that conference, catching up with old friends from the security community (mostly ex-Microsofties). I also went to my first Chrome Dev Summit, where I didn’t have a public talk (my colleagues did) but I did get to talk to some major companies about deploying HTTPS.

I also blogged quite a bit. At Microsoft, I started blogging because I got tired of repeating myself, and because our Exchange server and document retention policies had started making it hard or impossible to find old responses—I figured “Well, if I publish everything on the web, Google will find it, and Internet Archive will back it up.”

I’ve kept blogging since leaving Microsoft, and I’m happy that I have, even though my reader counts are much lower than they were at Microsoft. I’ve managed to mostly avoid trouble, although my posts are not entirely uncontroversial. At Microsoft, they wouldn’t let me publish this post (because it was too frank); in my first month at Google, I got a phone call at home (during the first portion of my paternity leave) from a Google Director complaining that I’d written something that was too harsh about a change Microsoft had made. But for the most part, my blogging seems not to ruffle too many feathers.

Tidbits

  • Food at Google is generally really good; I’m at a satellite office in Austin, so the selection is much smaller than on the main campuses, but the rotating menu is fairly broad and always has at least three major options. And the breakfasts! I gained about 15 pounds in my first few months, but my pneumonia took it off and I’ve restrained my intake since I came back.
  • At Microsoft, I always sneered at companies offering free food (“I’m an adult professional. I can pay for my lunch.”), but it’s definitely convenient to not have to hassle with payments. And until the government closes the loophole, it’s a way to increase employees’ compensation without getting taxed.
  • For the first three months, I was impressed and slightly annoyed that all of the snack options in Google’s micro-kitchens are healthy (e.g. fruit)—probably a good thing, since I sit about twenty feet from one. Then I saw someone open a drawer and pull out some M&Ms, and I learned the secret—all of the junk food is in drawers. The selection is impressive and ranges from the popular to the high end.
  • Google makes heavy use of the “open-office concept.” I think this makes sense for some teams, but it’s not at all awesome for me. I’d gladly take a 10% salary cut for a private office. I doubt I’m alone.
  • Coworkers at Google range from very smart to insanely off-the-scales-smart. Yet, almost all of them are humble, approachable, and kind.
  • Google, like Microsoft, offers gift matching for charities. This is an awesome perk, and one I aim to max out every year. I’m awed by people who go far beyond that.
  • Window Management – I mentioned earlier that one downside of web-based tools is that it’s hard to even find the right tab when I’ve got dozens of open tabs that I’m flipping between. The Quick Tabs extension is one great mitigation; it shows your tabs in a searchable, most-recently-used list in a convenient dropdown:

QuickTabs Extension

Another trick that I learned just this month is that you can instruct Chrome to open a site in “App” mode, where it runs in its own top-level window (with no other tabs), showing the site’s icon as the icon in the Windows taskbar. It’s easy:

On Windows, run chrome.exe --app=https://mail.google.com

While on OS X, run open -n -b com.google.Chrome --args --app='https://news.google.com'

Tip: The easy way to create a shortcut to the current page in app mode is to click the Chrome Menu > More Tools > Add to {shelf/desktop} and tick the Open as Window checkbox.

I now have SlickRun MagicWords set up for mail, calendar, and my other critical applications.
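If you don’t use a launcher like SlickRun, the same commands can be wrapped in a small script. This is only a sketch: the function names are made up for illustration, it assumes chrome.exe can be found on the PATH on Windows, and the google-chrome binary name for Linux is an assumption that varies by distribution.

```python
import subprocess
import sys


def app_mode_command(url, platform=sys.platform):
    """Build the argument list that opens `url` in Chrome's app mode."""
    flag = f"--app={url}"
    if platform == "darwin":
        # Mirrors the OS X command above.
        return ["open", "-n", "-b", "com.google.Chrome", "--args", flag]
    if platform.startswith("win"):
        # Assumes chrome.exe can be found on the PATH.
        return ["chrome.exe", flag]
    # Assumption: a Linux install that exposes the google-chrome launcher.
    return ["google-chrome", flag]


def open_in_app_mode(url):
    """Launch the site in its own top-level window."""
    subprocess.Popen(app_mode_command(url))
```

Calling open_in_app_mode("https://mail.google.com") would then behave like the matching command line for the current platform.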


So, that’s it for year one @Google. I’m feeling lucky as I head into year two!

-Eric

privacy, security, tech

HTTPS Only Works If You Use It – Tipster Edition

Convoy with three armored tanks and one pickup truck

It’s recently become fashionable for news organizations to build “anonymous tip” sites that permit members of the public to confidentially submit tips about stories of public interest.

Unfortunately, would-be tipsters need to take great care when exploring such options, because many organizations aren’t using HTTPS properly to ensure that the user’s traffic to the news site is protected from snoopers on the network.

If the organization uses any non-secure redirections in loading its “Tips” page, or the page pulls any unique images or other content over a non-secure connection, the fact that you’ve visited the “Tips” page will be plainly visible to your ISP, employer, fellow coffee shop patron, home-router-pwning group, etc.

NYTimes call for Tips, showing non-secure redirects

The New Yorker Magazine call for Tips, showing non-secure redirects
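Checking a redirect chain for this problem is mechanical once you have the list of URLs (from the browser’s developer tools or a proxy like Fiddler). The snippet below is a minimal sketch; the function name is hypothetical and example.com stands in for a real news site.

```python
from urllib.parse import urlsplit


def insecure_hops(redirect_chain):
    """Return every URL in the chain that was requested without HTTPS."""
    return [url for url in redirect_chain
            if urlsplit(url).scheme.lower() != "https"]


# Hypothetical chain for a tips page; example.com stands in for a real site.
chain = [
    "http://example.com/tips",
    "http://www.example.com/tips",
    "https://www.example.com/tips",
]

for url in insecure_hops(chain):
    print("visible to the network:", url)
```

Any URL this reports was fetched in the clear, so anyone on the network path could see that the tips page was requested.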

Here are a few best practices for organizations that either a) accept anonymous tips online or b) use webpages to tell would-be leakers how to send anonymous tips via Tor or non-electronic means:

For end users:

  • Consider using Tor or other privacy-aiding software.
  • Don’t use a work PC or any PC that may have spyware or non-public certificate roots installed.

Stay private out there!

-Eric

design, privacy, tech

Do Not Lie to Users

Multiple people working on Outlook.com thought this was a reasonable design.

After a user deletes an email, then manually goes into the Deleted Items folder, then clicks Delete again, then acknowledges that they wish to Permanently Delete the deleted item:

Delete

… the item is still not deleted. You can “Recover deleted items” from your Deleted items folder:

Recover

… and voila, they’re all hiding out there:

Purge

Further, if you click the Purge button, you’ll find that it doesn’t actually do anything.

The poor user is expected to:

  1. Be aware of this insane behavior
  2. Individually check a box next to each unwanted message, then click Purge.

Microsoft’s design is offensively anti-privacy.

-Eric

PS: This sums it up pretty well.

image

tech

Troubleshooting Windows 10 Bluescreens

I recently bought a Dell XPS 8900 desktop system with Windows 10. It ran okay for a while, but after enabling Hyper-V, every few minutes the system would freeze for a few seconds and then reboot with no explanation. Looking at the Event Viewer’s Windows Logs > System revealed that the system had bugchecked (blue screened):

Event Viewer - Bugcheck

Bugcheck 0x1a indicates a problem with “Memory Management”.

Run WinDbg as Administrator. File > Open Crash Dump:

WinDBG open crash dump 

Open C:\Windows\memory.dmp. Wait for symbols to download:

Debuggee not connected; symbols downloading

If symbols aren’t downloaded automatically, try typing .symfix and then .reload in the command prompt at the bottom.

Use !analyze -v says WinDBG

Then, follow the tool’s advice and run !analyze -v to have the debugger analyze the crash. WinDBG presents a surprisingly readable explanation:

WinDBG notes driver memory corruption

So a driver’s at fault, but which one?

Stack trace points at WiFi

It looks like bcmwl63a, for which symbols aren’t loaded, one clue that this isn’t Microsoft’s code. Let’s find out more about it using lm vm bcmwl63a:

 

Debugger points at Wifi driver

Pop over to the listed path to examine the file’s properties, and see that it’s the WiFi driver:

Driver details

The Dell 1560 802.11ac card is the same type as found in my Dell XPS 13” notebook PC, where it was responsible for a flurry of bluescreens last year. The driver appears to have improved (the XPS 13 doesn’t crash anymore), but it looks like some corner cases got missed, likely related to the Hyper-V virtual networking code. Rather than waiting for an updated driver, the experts on Twitter suggested I simply upgrade to the Intel 7265 and install the latest Intel PROSet wireless driver. At $20 on Amazon, this seemed like a fine approach.

The upgrade was straightforward and would’ve taken less than 5 minutes, except that one of the nearly microscopic sockets broke off as I removed the Dell card’s antenna cables:

BrokenSocket

I used a needle to remove the broken pieces from the antenna’s connector before it would fit on the new card’s socket. After connecting the antenna, the new card easily slid into the slot and Windows recognized it on next boot. I used Device Manager to ensure the drivers loaded for the new card’s Bluetooth support, and installed the latest PROSet driver. Everything’s been working great since.

While WinDBG is one of the more inscrutable tools I use, it worked great in this situation and would point even a novice in the right direction.

 

-Eric

browsers, tech

File the Bug

Two experiences this week reminded me of a very important principle for improving the quality of software… if you see something, say something. And the best way to do that is to file a bug.

Something Weird? File a bug!

The first case was last Thursday, when a user filed a bug in Chrome’s tracker noting that Chrome’s window border icons often got “stuck” in a hover state after being moused over. It was a clear, simple bug report and it was easily reproduced. I’ve probably hit this a hundred times over the years and didn’t think much of it… “probably some weird thing in my graphics card or some utility I’m running.” It never occurred to me that everybody else might be seeing this, or that it was exhibited only by Chrome.

Fortunately, the bug report showed that this issue was something others were hitting too, so I took a look. The problem proved to be almost unique to Chrome (not occurring in other Windows applications), and has existed for at least seven years, reproducing on every version from Windows Vista to Windows 10.

A scan of the bug tracker suggests that Thursday’s report was the first time in those seven years that this bug was filed; less than a week later, the simple fix is checked in and on the way to Chrome 54. Obviously, this is only a minor cosmetic issue, but we want our browser looking good at all times!

Animation of the fixed bug

Another cool aspect of this fix is that it will fix other applications too… the Opera and Vivaldi browsers are based on Chromium open-source roots and inherited this problem; they’ll probably pick up this fix shortly too.

th;df – Say Something Anyway

Even if you don’t file a bug, you should still say something. Recently, Ana Tudor noted on Twitter that her system was in a state after restart where neither Chrome nor Brave could render web content; both browsers showed the “crashed tab” experience, even after restarting and reinstalling the browsers. Running with the no-sandbox flag worked, and rebooting the system fully solved the problem. Her report sounded suspiciously similar to a problem I’d encountered back in April; fortunately, I’d filed a bug.

At the time, that bug was deemed unreproducible and I’d dismissed it as some wonkiness on my specific system, but Ana’s complaint brought this back to my attention. She’d also added another piece of data I didn’t have in my original report—the problem also occurred in Brave, but not Firefox or IE.

Even more fortunately, I hit this problem again after a system reboot yesterday, and because of Ana’s report, I was no longer convinced that this bug was some weird quirk on just my system. Playing with the repro, I found that neither Opera nor Vivaldi reproduced the problem; both of those browsers are architecturally similar to Brave, but importantly, both are 32-bit. So this was a great clue that the problem was specific to 64-bit. And I confirmed this, finding that the bug repro’d only in 64-bit Chrome Canary but not in 32-bit Canary. Now we’re cooking with gas!

I built Chromium and ran it through WinDBG, seeing that when the sandboxed content renderer process was starting up, it was hitting three debug breakpoints before dying. The breakpoints were in sandbox::InterceptionAgent::OnDllLoad, a function Chrome uses to thunk certain Windows APIs to inject security filters. At this point, and with a reliable repro in hand, my smarter colleague took over and quickly found that the code to allocate memory for the thunk was failing, due to some logic bugs. Thunks must be located at a particular place in memory – within 2 GB of the thunked function – and the code to place our thunks was failing when ASLR randomly loaded the kernel32, gdi32, and user32 DLLs at the very top of the address space, leaving no room for our thunks. When the allocation failed, Chrome refused to allow the DLL to be loaded into the sandbox, and the renderer necessarily died. After the user reboots the system again, ASLR moves the DLLs to some other location, and (usually) that location gives us room to place our thunks. With 20/20 hindsight, the root cause of this bug (and the upcoming fix) are obvious.

But we only knew to look for the problem because Ana took the time to say something.
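To make the 2 GB constraint concrete, here is a small arithmetic sketch. It is not Chromium’s sandbox code: the function names, the load addresses, and the 47-bit user-mode address space are all assumptions chosen for illustration. Whatever the exact logic bugs were, the numbers show why DLLs loaded near the top of the address space are the awkward case.

```python
TWO_GB = 2 * 1024 ** 3

# Assumption: a 47-bit user-mode address space, used only for illustration.
USER_SPACE_TOP = 0x7FFF_FFFF_FFFF


def thunk_window(function_addr, space_top=USER_SPACE_TOP):
    """Return the (low, high) addresses within 2 GB of the thunked function."""
    low = max(0, function_addr - TWO_GB)
    high = min(space_top, function_addr + TWO_GB)
    return low, high


def room_above(function_addr, space_top=USER_SPACE_TOP):
    """Bytes between the function and the upper end of its thunk window."""
    _, high = thunk_window(function_addr, space_top)
    return high - function_addr


mid_space = 0x1000_0000_0000   # hypothetical load address in the middle
near_top = 0x7FFF_F000_0000    # hypothetical load address near the top

print(hex(room_above(mid_space)))  # the full 2 GB is available above
print(hex(room_above(near_top)))   # the window is clipped by the top of the space
```

A placement routine that only searches above the DLL would have the full 2 GB to work with in the first case and only a sliver in the second, even though plenty of space remains below.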

Final Thoughts

  • Browser telemetry is great; we catch crashes and all sorts of problems with it. But debugging via telemetry can be really challenging, more akin to solving a mystery than following a checklist. For instance, in the case of the sandbox bug, the fact that the problem reproduced in Brave was a huge clue, and not something we’d ever know from telemetry.
  • Well-run projects love bug reports. Back when I was building Fiddler, a lot of users I talked to said things like: “Well, it’s free and pretty good so I didn’t want to bother you with a complaint about some bug.” This is exactly backwards. For most of Fiddler’s lifetime, bug reports from the community were the only compensation I received from making the tool available to everyone for free. Getting bug reports meant I could improve the product without having to pay for test machines and devices, hire a test organization, etc, etc. When I eventually sold Fiddler to Telerik, a large part of the value they were buying was the knowledge that the tool had been battle-tested by millions of users over 9 years and that I’d fixed thousands of bugs from that community.
  • Filing bugs is generally easy, and it’s especially easy for Chrome.
    • First, simply search for an existing bug on crbug.com
    • If you find it’s a known issue, star it so you get updates
    • If it’s not a known issue, click the New Issue button at the top-left
    • Tell us as much as you can about the problem. Try to put yourself in the reader’s shoes—we don’t know much about your system or scenario, so the more details you can provide, the better.
  • Screenshots and URLs that reproduce problems are invaluable.
  • Find a bug in another browser? Report it!

 

Thanks for your help in improving our digital world!

 

-Eric

tech

Repairing Corrupt ZIP Files

Fiddler’s default file format is the SAZ Format, which is just a ZIP file with a particular structure. Unfortunately, sometimes users’ SAZ files get corrupted due to failing disks or incomplete downloads, and when this happens, Fiddler can no longer open them.

Corrupt Archive dialog

Because Fiddler uses a standard ZIP file, surely a good ZIP reader will be able to read some data, right?

Windows Explorer’s primitive ZIP implementation can’t do anything useful:

Windows Cannot Open dialog

Alas, not even 7-zip offers any help.

Cannot Open dialog

Okay, well, surely you can just use any of the many ZIP Repair tools to extract the data that isn’t corrupt from the file, right?

Alas, a few hours’ worth of research suggests that almost all of the public ZIP repair tools are terrible, unable to handle most forms of corruption. Some claim to work, but the resulting “repaired” archive remains unreadable:

Error 0x80004005 Unspecified Error when extracting

Those tools that seem promising aren’t free, and require spending $30 or so before you can even determine whether they’ll get your data back.

What to do?

Write my own, of course. Most SAZ files are internally quite simple, and it shouldn’t be too hard to recover most data from archives that aren’t encrypted.
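To give a flavor of why this is tractable, here is a minimal sketch of one common salvage approach. It is not Fiddler’s actual recovery code: it ignores the (possibly missing or damaged) central directory, scans the raw bytes for local file headers, and keeps every unencrypted entry that still decompresses cleanly. The function names are hypothetical, and ZIP64 and compression methods other than stored and deflate are out of scope.

```python
import struct
import zlib

LOCAL_SIG = b"PK\x03\x04"
# Fixed 30-byte portion of a ZIP local file header.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def recover_entries(blob):
    """Salvage readable entries by scanning for local file headers."""
    recovered = {}
    pos = blob.find(LOCAL_SIG)
    while pos != -1:
        if pos + LOCAL_HEADER.size <= len(blob):
            (_sig, _ver, flags, method, _time, _date, crc, csize,
             _usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(blob, pos)
            name_start = pos + LOCAL_HEADER.size
            data_start = name_start + name_len + extra_len
            raw_name = blob[name_start:name_start + name_len]
            name = raw_name.decode("utf-8", "replace")
            data = _read_entry(blob, data_start, flags, method, csize)
            # With flag bit 3 set the CRC lives in a trailing descriptor, so
            # the header value is only checked when that bit is clear.
            if data is not None and (flags & 0x8 or zlib.crc32(data) == crc):
                recovered[name] = data
        pos = blob.find(LOCAL_SIG, pos + 4)
    return recovered


def _read_entry(blob, start, flags, method, csize):
    """Return the entry's bytes, or None if it is unreadable."""
    if flags & 0x1:      # encrypted entries are out of scope here
        return None
    if method == 0:      # stored (no compression)
        data = blob[start:start + csize]
        return data if len(data) == csize else None
    if method == 8:      # raw deflate stream
        inflater = zlib.decompressobj(-zlib.MAX_WBITS)
        try:
            data = inflater.decompress(blob[start:])
        except zlib.error:
            return None
        # A stream that never reaches its end marker was truncated.
        return data if inflater.eof else None
    return None
```

The recovered dictionary can then be written into a fresh archive with Python’s zipfile module (ZipFile.writestr), which produces a file that ordinary ZIP readers can open.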

Fiddler 4.6.2 will offer a Repair Corrupt option on the dropdown in the Load dialog box:

Repair Corrupt option

When you choose this option, Fiddler will enter its archive recovery mode:

Explanation of recovery process

Notably, the recovery mode doesn’t especially care whether the recovered ZIP file is a SAZ file. If not, Fiddler will alert you that the file couldn’t be interpreted as a SAZ:

Fiddler Alert - Not A SAZ

… but the repaired file on your desktop:

image

… should now be openable by your ZIP reader of choice:

Windows Explorer View

I hope you find this new capability useful, both for Fiddler-generated files as well as any other corrupt ZIP or ZIP-based (e.g. docx, pptx) files you may encounter.

-Eric Lawrence
