Browser Basics: User Gestures

The Web Platform offers a great deal of power, and unfortunately evil websites go to great lengths to abuse it. One of the weakest (but simplest to implement) protections against such abuse is to block actions that were not preceded by a “User Gesture.” Such gestures (sometimes more precisely called User Activations) include a variety of simple actions, from clicking the mouse to typing a key; each is interpreted as “The user tried to do something in this web content.”

A single user gesture can unlock any of a surprisingly wide array of privileged (“gated”) actions:

Allow a popup window to open
Allow a picture-in-picture to open
Set focus to a window
Allow an Application Protocol to be invoked
Allow an OnBeforeUnload prompt to show
Allow the Vibration API to vibrate the device
Allow script to take the window fullscreen
Allow the password manager to fill the username/password into the page in a way that JavaScript can read them
Allow the page to prompt the user for a file to upload
Write data to the clipboard
Impact the behavior of file downloads (e.g. prompting)
…and many more…

Abuse by Attackers

When you see a site show a UI like this:

…chances are good that what they’re really trying to do is trick you into performing a gesture (mouse click) so they can perform a privileged action: in this case, opening a popup ad in a new tab. In the worst case, an attacker might use your innocuous-seeming gesture (e.g. “Please hold down the Enter key”) not only to spawn a popup, but also to cause you to take action (e.g. “Confirm that transaction”) on the victim website loaded into the popup. This is a gesture-jacking or gesture-laundering attack. Historically, browser UI itself was also vulnerable to these sorts of abuses.

The list of input events that count as a gesture is surprisingly limited. It includes keystrokes and mousedown (but not mouseup/click):

// Returns |true| if |type| is the kind of user input that should trigger
// user interaction observers.
bool IsUserInteractionInputType(blink::WebInputEvent::Type type) {
  // Ideally, this list would be based more off of
  // https://whatwg.org/C/interaction.html#triggered-by-user-activation.
  return type == blink::WebInputEvent::Type::kMouseDown ||
         type == blink::WebInputEvent::Type::kGestureScrollBegin ||
         type == blink::WebInputEvent::Type::kTouchStart ||
         type == blink::WebInputEvent::Type::kRawKeyDown;
}

Some gestures are considered “consumable”, meaning that a single user action allows only one privileged action; subsequent privileged actions require another gesture. Web developers do not have unlimited time to consume the gesture: in Chrome, when you click in a web page, the browser considers this “User Activation” valid for five seconds (as of February 2019, last verified Nov ’23) before it expires. Here’s a simple Time-Delayed Open() test.
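To make the “consumable” and “expires after five seconds” ideas concrete, here is a minimal toy model. It is only a sketch and not Chromium’s implementation: the InputType enum is a hypothetical stand-in for blink::WebInputEvent::Type, and ActivationTracker, OnInput, IsActive, and TryConsume are names invented for illustration. The lifetime is passed in by the caller rather than hard-coded, since the five-second figure is just what Chrome was observed to use.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical stand-in for blink::WebInputEvent::Type (not Chromium's enum).
enum class InputType {
  kMouseDown,
  kMouseUp,
  kMouseMove,
  kGestureScrollBegin,
  kTouchStart,
  kRawKeyDown,
};

// Same shape as the check quoted earlier, but against the stand-in enum.
bool IsUserInteractionInputType(InputType type) {
  return type == InputType::kMouseDown ||
         type == InputType::kGestureScrollBegin ||
         type == InputType::kTouchStart ||
         type == InputType::kRawKeyDown;
}

using Clock = std::chrono::steady_clock;

// Toy model of a consumable, time-limited activation (illustration only).
class ActivationTracker {
 public:
  explicit ActivationTracker(std::chrono::milliseconds lifetime)
      : lifetime_(lifetime) {
    assert(lifetime.count() > 0);
  }

  // Records an input event; only qualifying input types start an activation.
  void OnInput(InputType type, Clock::time_point now) {
    if (IsUserInteractionInputType(type)) activated_at_ = now;
  }

  // True while an unexpired, unconsumed activation exists.
  bool IsActive(Clock::time_point now) const {
    return activated_at_.has_value() && (now - *activated_at_) < lifetime_;
  }

  // Gate for a privileged action: succeeds at most once per gesture.
  bool TryConsume(Clock::time_point now) {
    if (!IsActive(now)) return false;
    activated_at_.reset();  // The gesture is spent; a new one is required.
    return true;
  }

 private:
  std::chrono::milliseconds lifetime_;
  std::optional<Clock::time_point> activated_at_;
};
```

In this model, a caller would feed each input event to OnInput and call TryConsume just before a gated action such as opening a popup. Passing the timestamp in explicitly, instead of reading the clock inside the class, keeps the expiry behavior easy to exercise without actually waiting.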

Unfortunately, even this weak protection is subject to both false positives (an unwanted granting of privilege) and false negatives (an action is unexpectedly blocked).

You can learn more about this topic (and the complexity of dealing with nested frames, etc.) in the original Chromium User Activation v2 spec, and the User-Activation section of HTML5.

-Eric

PS: Some discussion of Safari’s behavior, and a blog post from the WebKit team about Safari’s implementation of the User Activation API.

2 thoughts on “Browser Basics: User Gestures”

Domenic Denicola says:

2020-05-18 at 23:00

Note that the “user activation v2” changes were merged into the HTML Standard in https://github.com/whatwg/html/pull/3851, so now the best source of information is actually the HTML Standard itself: https://html.spec.whatwg.org/#tracking-user-activation.

I also would suggest avoiding the Chromium-specific term “user gesture” in favor of “user activation”. In particular, there are a variety of gestures that do not cause activation, such as the scroll gesture.

ericlaw says:

2020-05-19 at 00:20

Thanks for the HTML reference. The term “user gesture” precedes Chrome by decades, used in both browsers and operating systems.