“Magic” is great… except when it isn’t.
Software design is largely about tradeoffs, and one of the more interesting tradeoffs is between user experience and predictability. This tradeoff has come up repeatedly throughout my career, and it appeared in two independent contexts yesterday that I’ll describe in this post.
I’m working on a tiny UX change to Google Chrome to deemphasize the data component of data: URIs.
Chrome is a multi-platform browser that runs on Windows, Mac, Linux, ChromeOS, Android, and iOS, which means that I need to make the same change in a number of places. Four, to be precise: Views (our cross-platform UI that runs on Windows, Linux and ChromeOS), Cocoa (Mac), Bling (iOS) and Clank (Android). The change for Views was straightforward and I did it first; porting the change to Mac wasn’t too hard. With the Mac change in hand, I figured that the iOS change would be simple, as both are written in Objective-C++. I don’t have a local Mac development box, so I have to upload my iOS changes to the Chromium build bots to see if they work. Unfortunately, my iOS build failed, with a complaint from the linker:
Undefined symbols for architecture arm7:
"gfx::Range::ToNSRange() const", referenced from:
OmniboxViewIOS::SetEmphasis(bool, gfx::Range) in omnibox_view_ios.o
OmniboxViewIOS::UpdateSchemeEmphasis(gfx::Range) in omnibox_view_ios.o
ld: symbol(s) not found for architecture arm7
Hrm… that’s weird; the Mac build worked and the iOS build used the same APIs. Let’s go have a look at the definition of ToNSRange():
Oh, weird. It’s in an OS_MACOSX block, so I guess it’s not there for iOS. But how did it compile and only fail at linking?
Turns out that the first bit of “magic” is that when OS_IOS is defined, OS_MACOSX is always also defined:
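To make that relationship concrete, here is a minimal sketch of the pattern. It is not the actual Chromium header: `SKETCH_BUILDING_FOR_IOS` and the helper function names are made up for illustration, and only `OS_MACOSX` and `OS_IOS` come from the real code.

```cpp
// Illustrative sketch only; this is not the real Chromium header.
#include <cassert>

// Assumption: a made-up stand-in for "the toolchain says this is an iOS build".
#define SKETCH_BUILDING_FOR_IOS 1

// Every Apple target defines OS_MACOSX...
#define OS_MACOSX 1
// ...and iOS targets define OS_IOS on top of it, so OS_IOS implies OS_MACOSX.
#if SKETCH_BUILDING_FOR_IOS
#define OS_IOS 1
#endif

// A guard like the one around ToNSRange(): active on Mac *and* on iOS.
constexpr bool GuardedByMacosx() {
#if defined(OS_MACOSX)
  return true;
#else
  return false;
#endif
}

// What a "desktop Mac only" guard has to look like given the overlap.
constexpr bool GuardedByDesktopMacOnly() {
#if defined(OS_MACOSX) && !defined(OS_IOS)
  return true;
#else
  return false;
#endif
}

// In this simulated iOS build, the OS_MACOSX guard still lets code through.
void CheckIosBuildSketch() {
  assert(GuardedByMacosx());
  assert(!GuardedByDesktopMacOnly());
}
```

With a layout like this, a declaration inside an `OS_MACOSX` block is visible to iOS code, which is why my call compiled without complaint.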
I was relieved to learn that I’m not the only person who didn’t know this, both by asking around and by finding code blocks like this:
Okay, so that’s why it compiled, but why didn’t it link? Let’s look at the build configuration file:
Hmmm… That’s a bit suspicious. There’s range_mac.mm and range_win.cc both listed within a single target. But it seems unlikely that the Mac build includes the Windows code, or that the Windows build includes the Mac code. Which suggests that maybe there’s some magic not shown in the build configuration that determines what actually gets built. And indeed, it turns out that such magic does exist.
The overall build configuration introduces its own incompatible magic, whereby filenames suffixed with _mac only compile on Mac… and that is limited to actual Mac, not including iOS:
This meant that the iOS compilation had a header file with no matching implementation, and I was the first lucky guy to stumble upon this by calling the missing code.
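The effect of that filename rule can be sketched in a few lines. This is a simulation under stated assumptions, not the real build system: the `Platform` enum, `StemEndsWith`, `FilterSources`, and the `range.cc` entry are all hypothetical, and the real filter patterns are likely broader than the two suffixes shown here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical platform list for this sketch.
enum class Platform { kMac, kIos, kWin };

// True if the filename, with its extension removed, ends in `suffix`.
bool StemEndsWith(const std::string& file, const std::string& suffix) {
  const std::string stem = file.substr(0, file.rfind('.'));
  return stem.size() >= suffix.size() &&
         stem.compare(stem.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Keeps a file unless its suffix names a platform other than the target.
// Note that iOS does not count as "Mac" here, unlike in the preprocessor.
std::vector<std::string> FilterSources(const std::vector<std::string>& sources,
                                       Platform target) {
  std::vector<std::string> kept;
  for (const std::string& file : sources) {
    if (StemEndsWith(file, "_mac") && target != Platform::kMac) continue;
    if (StemEndsWith(file, "_win") && target != Platform::kWin) continue;
    kept.push_back(file);
  }
  return kept;
}

// Usage: the same source list yields different builds per platform.
void ExampleFilterSources() {
  const std::vector<std::string> sources = {"range.cc", "range_mac.mm",
                                            "range_win.cc"};
  assert(FilterSources(sources, Platform::kMac).size() == 2);
  assert(FilterSources(sources, Platform::kWin).size() == 2);
  // iOS silently loses range_mac.mm, so ToNSRange() has no implementation.
  assert(FilterSources(sources, Platform::kIos).size() == 1);
}
```

The mismatch is the whole bug: the preprocessor treats iOS as a kind of Mac, while the file filter does not.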
Magic handling of filenames is simultaneously great (“So convenient”) and awful—I spoke to a number of engineers who knew that the build does this, but had no idea how, or whether or not iOS builds would include _mac-suffixed files. My instinct for fixing this would be to rename range_mac.mm to just range_apple.mm (because .mm files are compiled only for Mac and iOS), but instead I’ve been told that the right fix is to just temporarily disable the magic:
Talking to some of the experts, I learned that the long term goal is to get rid of the sources_assignment_filters altogether (No more magic!) but doing so entails a bunch of boring work (No more magic!).
Magic is great, when it works.
When it doesn’t, I spend a lot of time investigating and writing blog posts. In this case, I ended up flailing about for a few hours (because sending my various fix attempts off to the bots isn’t fast) trying to figure out what was going on.
There’s plenty of other magic that happens throughout the Chromium developer toolchain; some of it visible and some of it invisible. For instance, consider what happens when I forget the name of the command that finds out what release a changelist went into:
Git “magically” knows what I meant, and points out my mistake.
Elsewhere, however, Chromium’s git “magically” knows what I meant and just does it:
Which approach is better? I suppose it depends. The code that suggests proper commands is irritating (“Dammit, if you knew what I meant, you could just do it!”) but it’s also predictable—only legal commands run and typos cannot go overlooked and propagate throughout scripts, documentation, etc.
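The two behaviors differ only in policy, which a small sketch makes visible. Everything here is hypothetical: the command names, the distance threshold of 2, and the `Resolve` function are assumptions for illustration and are not how git or Chromium's tooling is actually implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Classic dynamic-programming edit distance (Levenshtein).
std::size_t EditDistance(const std::string& a, const std::string& b) {
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      const std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({subst, prev[j] + 1, cur[j - 1] + 1});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

enum class Policy { kSuggestOnly, kAutoCorrect };

struct Resolution {
  bool will_run;        // Whether a command is executed at all.
  std::string command;  // The command to run, or the suggestion if not run.
};

// Finds the closest known command; the policy decides what happens next.
Resolution Resolve(const std::string& typed,
                   const std::vector<std::string>& known, Policy policy) {
  std::string best;
  std::size_t best_distance = std::string::npos;
  for (const std::string& candidate : known) {
    if (candidate == typed) return {true, candidate};  // Exact match: just run.
    const std::size_t d = EditDistance(typed, candidate);
    if (d < best_distance) {
      best_distance = d;
      best = candidate;
    }
  }
  // Assumption: only typos within two edits are treated as "what you meant".
  if (best.empty() || best_distance > 2) return {false, ""};
  return {policy == Policy::kAutoCorrect, best};
}

// Usage: the same typo is either pointed out or silently executed.
void ExampleResolve() {
  const std::vector<std::string> known = {"status", "commit", "checkout"};
  assert(!Resolve("comit", known, Policy::kSuggestOnly).will_run);
  assert(Resolve("comit", known, Policy::kAutoCorrect).will_run);
}
```

Under the suggest-only policy, a typo can never become part of a script that happens to work; under auto-correct it can, which is exactly the predictability cost described above.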
This same type of tradeoff appeared in a different scenario by the end of the day.
This repro won’t work forever, but try clicking this link: https://www.kubernetes.io. If you do this right now, you’ll find that the page works great in Chrome, but doesn’t work in IE or Edge:
If you pop the site into SSLLabs’ server test, you can see that the server indeed has a problem:
The certificate’s SubjectAltNames field contains kubernetes.io, but not www.kubernetes.io.
So, what gives? Why does the original www URL work in Chrome? If you open the Developer Tools console while following the link, you’ll see the following explanation of the magic:
Basically, Chrome saw that the certificate for https://www.kubernetes.io was misconfigured and recognized that sending the user to the bare domain kubernetes.io was probably the right thing to do. So, it just did that, which is great for the user. Right? Right??
Well, yes, it’s great for Chrome users, and maybe for HTTPS adoption: users don’t like certificate errors, and asking them to manually “fix” things the browser can fix itself is annoying.
But it’s less awesome for users of other browsers without this accommodation, especially when the site developers don’t know about Chrome’s magic behavior and close the bug as “fixed” because they tested in Chrome. So other browsers have to adopt this magic if they want to be as great as Chrome (no browser vendor likes bugs whining “Your browser doesn’t work but Chrome does!”). Then, once all the browsers have the magic in place, other tools like curl, wfetch, and wget need to adopt it too. And the magic is now a hack that lives on for decades, increasing the development cost of all future web clients. Blargh.
Update: http://www.twitter.com in Brazil had this problem in February 2020, but the magic didn’t take effect, because the feature SSLCommonNameMismatchHandling is disabled for sites that opted into HSTS. Similarly, the magic doesn’t take effect if the certificate does not contain the exact hostname (without wildcards) after removing the www. prefix.
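Putting those conditions together, the decision can be sketched roughly as follows. This is an approximation built only from the behavior described in this post, not Chrome's actual implementation: the `MismatchContext` struct and `SuggestAlternateHost` function are hypothetical, and hostnames are assumed to be lowercased already.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical inputs available when a certificate name mismatch occurs.
struct MismatchContext {
  std::string requested_host;               // e.g. "www.kubernetes.io"
  std::vector<std::string> cert_dns_names;  // SubjectAltNames entries
  bool site_uses_hsts;                      // Site opted into HSTS
};

// Returns the host to send the user to, or "" to show the certificate error.
std::string SuggestAlternateHost(const MismatchContext& ctx) {
  // No magic for sites that opted into HSTS.
  if (ctx.site_uses_hsts) return "";

  // Only the "remove the www. prefix" case is modeled in this sketch.
  const std::string kPrefix = "www.";
  if (ctx.requested_host.rfind(kPrefix, 0) != 0) return "";
  const std::string bare = ctx.requested_host.substr(kPrefix.size());

  // Exact string match only: a wildcard entry such as "*.example.com"
  // never equals a concrete hostname, so wildcards do not count.
  const bool listed = std::find(ctx.cert_dns_names.begin(),
                                ctx.cert_dns_names.end(),
                                bare) != ctx.cert_dns_names.end();
  return listed ? bare : "";
}

// Usage: the kubernetes.io case from above, with and without HSTS.
void ExampleSuggestAlternateHost() {
  assert(SuggestAlternateHost({"www.kubernetes.io", {"kubernetes.io"}, false}) ==
         "kubernetes.io");
  assert(SuggestAlternateHost({"www.kubernetes.io", {"kubernetes.io"}, true})
             .empty());
}
```

Every branch in that function is a rule that other clients would have to replicate exactly to behave the same way.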
It’s worth noting that this scenario was especially confusing for users of Microsoft Edge Legacy, because its address box has special magic that hides the www. prefix, even on error pages. The default address is a “lie”:
Only by putting focus in the address bar can you see the “truth”:
When you’re building magic into your software, consider carefully how you do so.
- Do you make your magic invisible, or obvious?
- Is there a way to scope it, or will you have to maintain it forever?
- Are you training users to expect magic, or guiding them away from it?
- If you’re part of an ecosystem, is your magic in line with your long-term ecosystem goals?