A few years back, I wrote a short explainer about User Gestures, a web platform concept whereby certain sensitive operations (e.g. opening a popup window) will first attempt to confirm whether the user intentionally requested the action.
As noted in that post, gestures are a weak primitive — while checking whether the user clicked or tapped a key is simple, gestures poorly suit the design ideal of signaling an unambiguous user request.
A recent blog post by security researcher Paulos Yibelo clearly explains a class of attack whereby a user is enticed to hold down a key (say, Enter), and that gesture both counts as acceptance of a popup window and activates a button on a target victim website. If the button on that website performs a dangerous operation (“Grant access”, “Transfer money”, etc.), the victim’s security may be irreversibly compromised.
The author calls the attack a cross window forgery, although I’d refer to it as a gesture-jacking attack, as it’s most similar to the ClickJacking attack vector which came to prominence in 2008. Back then, browser vendors responded by adding defenses against subframe-based ClickJacking attacks, first with IE’s X-Frame-Options response header, and later with the frame-ancestors directive in Content Security Policy. At the time, cross-window ClickJacking was recognized as a threat unmitigated by the new defenses, but it wasn’t deemed an especially compelling attack.
In contrast, the described gesture-jacking attack is more reliable, as it does not rely upon the careful positioning of windows, timing of clicks, and the vagaries of a user’s display settings. Instead, the attacker entices the user to hold down a key and spawns a victim web page, and the keydown is transferred to the victim page. Easy breezy.
Some folks expected that this attack shouldn’t be possible: “browsers have popup-blockers after all!” Unfortunately for their hopes and dreams, the popup blocker isn’t magical. Holding the Enter key is a user gesture, so the attacker’s page is allowed to spawn a popup window to a victim site.
As with many cool attack techniques, the core of this attack depends upon a built-in web platform behavior. Specifically, when you navigate to a URL containing a fragment:
…the browser will automatically scroll to the first (if any) element with an id matching the fragment’s value, and set focus to it if possible. As a result, keyboard input will be directed to that element.
As noted in Paulos Yibelo’s blog post, a website can help protect itself against unintentional button activations by not adding id attributes to critical buttons, or by randomizing the id value on each page load. Or the page can “redirect” on load to strip off an unexpected URL fragment.
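To make the fragment-stripping idea a bit more concrete, here’s a minimal sketch of the decision a page might make before “redirecting”. The helper name and the allowlist parameter are my own assumptions for illustration; how the page then applies the cleaned URL is up to the site.

```typescript
// Hypothetical helper: drop the fragment from a URL unless the page expects it.
// `allowedFragments` is an assumed allowlist; leaving it empty strips every fragment.
function stripUnexpectedFragment(url: string, allowedFragments: string[] = []): string {
  const hashIndex = url.indexOf("#");
  if (hashIndex === -1) {
    return url; // No fragment present, nothing to do.
  }
  const fragment = url.slice(hashIndex + 1);
  // Keep the URL untouched only when the fragment is one the page knowingly supports.
  const isExpected = allowedFragments.indexOf(fragment) !== -1;
  return isExpected ? url : url.slice(0, hashIndex);
}

// A sensitive page with no legitimate fragments loses "#confirm":
const cleanedTransferUrl = stripUnexpectedFragment("https://bank.example/transfer#confirm");
// A help page that supports "#faq" keeps it:
const cleanedHelpUrl = stripUnexpectedFragment("https://bank.example/help#faq", ["faq"]);
```

A page would compare the result against its current URL and only “redirect” when the two differ.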
For Chromium-based browsers, an additional option is available: a document can declare that it doesn’t want the default button-focusing behavior.
The force-load-at-top document policy (added as an opt-out for the cool Scroll-to-Text-Fragment feature) allows a website to turn off all types of automatic scrolling (and focusing) from the fragment. In Edge and Chrome, you can compare the difference between a page loaded:
Browser support is not universal, but Firefox is considering adding it.
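The policy is delivered as a Document-Policy response header. Here’s a rough sketch of adding it to the headers for a sensitive page; the helper name is hypothetical and not part of any framework, and I’m assuming your server lets you supply headers as a simple name/value map.

```typescript
// Sketch: add the opt-out header to whatever headers a sensitive page already sends.
// `withForceLoadAtTop` is a hypothetical helper name, not part of any framework.
function withForceLoadAtTop(baseHeaders: Record<string, string>): Record<string, string> {
  // Copy rather than mutate, so callers can reuse their base header set.
  // Note: if the response already carries a Document-Policy header, this sketch
  // overwrites it; a real site would need to merge the values instead.
  return { ...baseHeaders, "Document-Policy": "force-load-at-top" };
}

const sensitivePageHeaders = withForceLoadAtTop({
  "Content-Type": "text/html; charset=utf-8",
});
```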
Beyond taking the steps above to protect sensitive buttons from held keys, your most sensitive pages should consider:
1. frame-ancestors CSP to prevent framing
4. Consider whether an out-of-band confirmation would be possible (e.g. a confirmation prompt shown by the user’s mobile app)
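For the frame-ancestors item, the header value is short enough to build by hand. The sketch below assumes a hypothetical helper that takes the list of parents you’re willing to be framed by; remember that this covers framing only, not the cross-window case described earlier.

```typescript
// Sketch: build a Content-Security-Policy value that restricts who may frame the page.
// `allowedParents` is a hypothetical list of CSP source expressions permitted to embed it.
function frameAncestorsPolicy(allowedParents: string[] = []): string {
  // With no permitted parents, 'none' forbids all framing.
  const sources = allowedParents.length === 0 ? "'none'" : allowedParents.join(" ");
  return `frame-ancestors ${sources}`;
}

// Forbid framing entirely (likely the right call for your most sensitive pages):
const strictFramingPolicy = frameAncestorsPolicy();
// Or permit only same-origin pages and one trusted partner (example origin is made up):
const partnerFramingPolicy = frameAncestorsPolicy(["'self'", "https://partner.example"]);
```

The resulting string is sent as the value of the Content-Security-Policy response header, alongside any other directives the page already uses.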
It’s not just websites that ask users to make security decisions or confirm sensitive actions. For instance, consider these browser prompts:
As you might expect, attackers have long used gesture-jacking to abuse browser UI, and browser teams have had to make many updates to prevent the abuse:
Common defenses to protect browser UI have included changing the default button to the safe choice (e.g. “Deny”) and introducing an “input protection” activation timer.
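Websites can borrow the same timer idea for their own sensitive buttons. Below is a minimal sketch, not how any browser actually implements it: the class name is made up, the 500 ms default is an assumed value, and the clock is injectable purely so the behavior is easy to test.

```typescript
// Sketch of an "input protection" timer: activation is ignored until the prompt
// has been visible for a minimum time. The 500 ms default is an assumed value.
class ProtectedPrompt {
  private shownAt: number | null = null;
  private readonly delayMs: number;
  private readonly now: () => number;

  constructor(delayMs: number = 500, now: () => number = () => Date.now()) {
    this.delayMs = delayMs;
    this.now = now;
  }

  // Call when the prompt becomes visible; calling it again restarts the timer.
  show(): void {
    this.shownAt = this.now();
  }

  // Returns true only if the prompt has been shown and the delay has elapsed.
  shouldAccept(): boolean {
    if (this.shownAt === null) {
      return false;
    }
    return this.now() - this.shownAt >= this.delayMs;
  }
}
```

A button’s activation handler would call shouldAccept() and simply drop the input when it returns false, so a key that was already held when the prompt appeared doesn’t immediately trigger the dangerous choice.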
Stay safe out there!
-Eric