The Evolution of Automated Browser Detection: A Cat & Mouse Game
2024-10-18 | Source: securityboulevard.com

Throughout the history of bot protection, automated browser detection has always been a top priority. This article traces the history of browser automation and the ongoing battle between bot developers and detection strategies. From the early days of Selenium to the latest developments in bidirectional browser control, we explore how DataDome has consistently stayed ahead in this technological cat and mouse game.

Why automate browsers?

The easiest way to create a bot is to send direct network requests to the target server. Compared to this, automating an entire browser seems like overkill. So why do it?

As websites implement measures to protect themselves from automated traffic, request-based bots become harder to use. Server-side protections can detect inconsistencies in HTTP headers, while client-side protections require JavaScript (JS) “proof of work” challenges before a request can proceed.
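
To make the header-inconsistency point concrete, here is a minimal server-side sketch. The header names are real, but the rule (a recent Chrome User-Agent should normally arrive with Sec-CH-UA client hints) is just one illustrative heuristic, and looksLikeSpoofedChrome is a hypothetical helper, not DataDome's actual logic.

    // Illustrative heuristic: a request whose User-Agent claims to be a recent
    // Chrome but lacks the client-hint headers Chrome sends by default is
    // suspicious. Real products combine many such signals; this is a sketch only.
    function looksLikeSpoofedChrome(headers: Record<string, string | undefined>): boolean {
      const ua = headers["user-agent"] ?? "";
      const claimsChrome = /Chrome\/\d+/.test(ua);
      const hasClientHints = "sec-ch-ua" in headers;
      return claimsChrome && !hasClientHints;
    }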

Automated browsers simulate human-like browsing behavior, sending requests with authentic HTTP headers and executing JS just as a real user’s browser would. This makes them well suited to bypassing basic defenses: to the untrained eye, automated browser traffic is indistinguishable from genuine user traffic.

At DataDome, detecting these sophisticated automated browsers has been a primary objective of our client-side detection team.

The Early Days: Selenium, WebDriver, & The First Signal

The Birth of Selenium

Before 2004, browser automation was a fragmented landscape. Solutions were unreliable and limited to specific browsers or platforms. The web testing community needed a unified, open-source framework that could work across different browsers.

Enter Selenium. Named after the element used to treat mercury poisoning (a nod to Mercury Interactive, the dominant testing software vendor at the time), Selenium aimed to be the cure for the fragmented state of browser automation.

Alliance with WebDriver

In 2009, Selenium merged with its main competitor, WebDriver. The best aspects of each project were combined into a layered architecture that became the standard for browser automation: Selenium provides a high-level API, while WebDriver acts as the bridge between this API and the browser. Each browser vendor maintains its own WebDriver implementation (e.g., ChromeDriver for Chrome, GeckoDriver for Firefox), which controls the browser using internal mechanisms.
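
As an illustration of that layering, here is roughly what driving Chrome through Selenium's JavaScript bindings looks like: the script talks to Selenium's high-level API, Selenium relays each command to ChromeDriver over the WebDriver protocol, and ChromeDriver controls the browser. A minimal sketch using the selenium-webdriver npm package, run as an ES module, with ChromeDriver assumed to be resolvable on the machine:

    import { Builder, By } from "selenium-webdriver";

    // The script speaks to Selenium's API; Selenium forwards each command to
    // ChromeDriver over the WebDriver protocol; ChromeDriver drives the browser.
    const driver = await new Builder().forBrowser("chrome").build();
    try {
      await driver.get("https://example.com");
      const heading = await driver.findElement(By.css("h1")).getText();
      console.log(heading);
    } finally {
      await driver.quit();
    }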

The First Signal

The WebDriver specification included a detail that would prove historic: the JavaScript Web API property navigator.webdriver is set to true when the browser is controlled via WebDriver. Websites could therefore simply check this value and block the request if it was true. This flag marked the start of an ongoing battle between bot developers and detection strategies.
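
In its simplest form, that check is a one-liner evaluated in the visitor's browser:

    // Runs in the page: true when the session is WebDriver-controlled.
    const isAutomated = navigator.webdriver === true;
    // A site would then report this flag to its server and block the session.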

The Skirmishes of the First Generation

Ramping Up Detection

When bot developers realized their WebDriver bots were being detected through navigator.webdriver, they simply overrode the flag to report false, so more sophisticated detection methods were needed.
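
The override itself was trivial; a script injected before any page code could redefine the property, roughly like this (an illustrative snippet, not taken from any specific framework):

    // Naive evasion of the era: shadow the prototype getter with an instance
    // property that always reports false.
    Object.defineProperty(navigator, "webdriver", { get: () => false });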

DataDome examined the source code of Selenium and its drivers and discovered that automated browsers left traces in the form of JS variables and events that are absent in normal browser use. We integrated these checks into our system, once again successfully blocking bots built on that framework.
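
The exact variables differ per driver and version, but examples that have been publicly documented over the years include ChromeDriver's injected document key (historically prefixed with "$cdc_") and Selenium-era globals. A sketch of this class of check, with an illustrative and deliberately short list:

    // Illustrative check for driver-injected artifacts that have been publicly
    // documented over the years (real lists are longer and version-dependent).
    function hasLegacyDriverTraces(): boolean {
      // ChromeDriver historically injected a document key prefixed with "$cdc_".
      const cdcKey = Object.keys(document).some((k) => k.startsWith("$cdc_"));
      // Selenium-era globals reported in public fingerprinting write-ups.
      const seleniumGlobals = ["__webdriver_evaluate", "__selenium_unwrapped", "_selenium"]
        .some((name) => name in window);
      return cdcKey || seleniumGlobals;
    }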

The First Anti-Detection Frameworks

The botting community didn’t remain idle. They began developing frameworks specifically designed to dodge the new detection methods. These “first generation” tools focused on hiding the known signs of automation.

Our team at DataDome analyzed the source code of these anti-detection frameworks and made an interesting discovery: in their attempt to hide, these tools left their own unique fingerprints. This allowed us to develop methods that target the very tools designed to evade us. DataDome regained the upper hand in bot detection.
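
For example, the naive navigator.webdriver override shown earlier is itself detectable: the genuine property lives as a native getter on Navigator.prototype, so finding it redefined directly on the navigator instance, or finding a getter whose source code is not native, betrays tampering. A sketch of that idea, not DataDome's actual check:

    // Tampering fingerprint: genuine Chrome defines `webdriver` only on
    // Navigator.prototype, as a native getter.
    function webdriverLooksSpoofed(): boolean {
      // A shadowing instance property is a red flag.
      if (Object.getOwnPropertyDescriptor(navigator, "webdriver")) return true;
      const desc = Object.getOwnPropertyDescriptor(Navigator.prototype, "webdriver");
      if (!desc || !desc.get) return true; // deleted or replaced with a data property
      // A getter that stringifies to plain JavaScript is not the native one.
      return !/\[native code\]/.test(Function.prototype.toString.call(desc.get));
    }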

The Fork

Things were calm for a few years. Then in 2013 came a major event: Google forked the WebKit engine to create Blink, Chrome’s own rendering engine. This left the Chrome team free to evolve the engine as they saw fit, and it led to two improvements that would soon rock the web testing and botting ecosystems: Headless Chrome and CDP.

Headless Chrome

In 2017, Chrome released its official headless mode, allowing Chrome to be launched and automated without displaying a visible browser window. It dramatically improved performance and scalability for testers and botters alike.
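
The canonical way to try it at the time was straight from the command line; the Node sketch below simply shells out to a locally installed Chrome binary (the binary name/path is an assumption and varies by platform):

    import { execFileSync } from "node:child_process";

    // 2017-era headless invocation: render a page with no visible window and
    // print the resulting DOM. `--disable-gpu` was recommended on some platforms.
    const html = execFileSync("google-chrome", [
      "--headless",
      "--disable-gpu",
      "--dump-dom",
      "https://example.com",
    ]).toString();
    console.log(html.length, "bytes of rendered HTML");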

Detecting Headless Browsers

At DataDome, we quickly identified several key indicators of headless Chrome usage:

  • We noticed several differences in browser attributes, such as the absence of values in navigator.plugins and navigator.mimeTypes.
  • Headless mode defaulted to software GPU rendering, which is detectable through specialized canvas and WebGL checks. (Both indicators are sketched below.)
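
A minimal sketch of those two checks as they might have looked against the original headless mode; the renderer-string patterns are illustrative, and modern headless Chrome no longer exhibits the plugin gap:

    // Two historical headless indicators, combined into one illustrative check.
    function looksLikeOldHeadlessChrome(): boolean {
      // 1) Original headless Chrome exposed empty plugin and MIME-type lists.
      const noPlugins = navigator.plugins.length === 0 && navigator.mimeTypes.length === 0;

      // 2) Software rendering often showed up in the unmasked WebGL renderer string.
      let softwareGpu = false;
      const gl = document.createElement("canvas").getContext("webgl");
      const info = gl?.getExtension("WEBGL_debug_renderer_info");
      if (gl && info) {
        const renderer = String(gl.getParameter(info.UNMASKED_RENDERER_WEBGL));
        softwareGpu = /swiftshader|llvmpipe|software/i.test(renderer);
      }
      return noPlugins || softwareGpu;
    }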

Our team integrated these checks into our detection system, once again staying ahead of the curve.

The CDP Rebellion

Chrome DevTools Protocol (CDP)

As the detection game intensified, browser vendors were quietly improving their automation capabilities. Whereas early versions of Selenium automated a browser by executing JS inside the browser sandbox, each browser had since developed an internal protocol that its driver used for that purpose. The Chrome DevTools Protocol owes its name to its original role: it started out simply as the communication channel between the browser engine and the DevTools panel. It was later extended to support automating Chrome.

CDP is WebSocket-based and supports bidirectional messaging, which enables lower-level, more direct control of the browser. After the WebKit fork, its development continued, and CDP reached a level of control far beyond what was possible with the old HTTP-based WebDriver.
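
To give a feel for how direct that control is, here is a bare-bones CDP exchange, assuming Chrome was launched with --remote-debugging-port=9222 and that the ws npm package is available; the browser-level endpoint is discovered through the local /json/version page:

    import WebSocket from "ws";

    // Discover the browser-level WebSocket endpoint exposed by Chrome, then
    // issue a CDP command. Every CDP message is JSON with an id and a method.
    const { webSocketDebuggerUrl } = await (
      await fetch("http://localhost:9222/json/version")
    ).json();

    const ws = new WebSocket(webSocketDebuggerUrl);
    ws.on("open", () => {
      ws.send(JSON.stringify({ id: 1, method: "Browser.getVersion" }));
    });
    ws.on("message", (data) => {
      console.log(data.toString()); // { "id": 1, "result": { "product": "Chrome/...", ... } }
      ws.close();
    });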

The people who understood this best were its creators: the Chrome DevTools team. The release of headless Chrome was their signal for a bold move. They launched a brand-new automation framework to control Chrome using CDP, bypassing WebDriver completely. They named it Puppeteer.

Puppeteer quickly gained traction in the automation community due to its power and flexibility.
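
A few lines are enough to drive Chrome with it, which explains the traction; the example below follows Puppeteer's standard API:

    import puppeteer from "puppeteer";

    // Puppeteer talks CDP directly to the browser it launches - no WebDriver layer.
    const browser = await puppeteer.launch({ headless: true });
    const page = await browser.newPage();
    await page.goto("https://example.com");
    console.log(await page.title());
    await browser.close();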

Figure: graph of the rising popularity of Puppeteer.

Puppeteer Extra Stealth

Puppeteer gained popularity quickly, but it was very easy to detect. To counter this, a new anti-detection tool emerged: Puppeteer Extra Stealth. This framework focuses on hiding both headless mode and Puppeteer-specific signals, and it became the go-to solution for avoiding detection; practically every bot-as-a-service tool was using it behind the scenes.
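
Part of its appeal was how little code it took to apply; the usual pattern, per the plugin's documentation, wraps Puppeteer with puppeteer-extra and registers the stealth plugin:

    import puppeteer from "puppeteer-extra";
    import StealthPlugin from "puppeteer-extra-plugin-stealth";

    // The stealth plugin patches dozens of known automation/headless signals
    // (navigator.webdriver, plugins, WebGL vendor strings, etc.) before page load.
    puppeteer.use(StealthPlugin());

    const browser = await puppeteer.launch({ headless: true });
    const page = await browser.newPage();
    await page.goto("https://example.com");
    await browser.close();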

DataDome rose to this new challenge once again. Our team conducted an in-depth analysis and found that, just like the anti-detection frameworks of the first generation, Puppeteer Extra Stealth left a set of distinctive fingerprints of its own. We developed a suite of new techniques to detect its usage, which proved highly effective.

New Headless

Chrome released a “new” headless mode in 2022, five years after the original. Because the new mode runs on the same browser code as regular Chrome, it rendered many of the previous headless detection techniques ineffective. We took a different approach, focusing on Puppeteer and the CDP protocol that powers it.

The Secret CDP Signal

After extensive research, we implemented a novel technique that detects the usage of a specific CDP method in a browser. It was used to block millions of bot requests every day—for years. This detection method eventually became known within the botting community, sparking a new wave of next-gen anti-detection frameworks aiming to circumvent it by either patching the CDP leak or reimplementing an automation interface in CDP.
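
We won't reveal the method here, but the general class of technique is now publicly discussed in the botting community: trigger a browser behavior that only occurs when a CDP client has enabled certain protocol domains, and observe the side effect. One widely cited (and since widely mitigated) example, not necessarily the method referenced above, abuses how errors are serialized for the console when the Runtime domain is enabled:

    // Publicly discussed example of a CDP side channel (illustrative only):
    // when a CDP client has Runtime enabled, logging an Error causes the
    // protocol layer to serialize it, which reads its `stack` property.
    let cdpSuspected = false;
    const bait = new Error();
    Object.defineProperty(bait, "stack", {
      configurable: true,
      get() {
        cdpSuspected = true; // fires only if something serializes the error
        return "";
      },
    });
    console.debug(bait);
    // Inspect the flag shortly afterwards and report it if set.
    setTimeout(() => console.log({ cdpSuspected }), 100);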

As third-generation anti-detection frameworks continue to push the boundaries of stealth, DataDome is pushing back with a new arsenal of detection techniques. We can’t share all the details—but rest assured, we’re deploying innovative and sophisticated strategies to uncover and block even the most advanced attempts at evasion.

The Future is Bidirectional

The success of Puppeteer and Chrome’s CDP couldn’t be ignored by the web testing community. A new standard called WebDriver BiDi (short for bidirectional) was recently developed to bring CDP-like capabilities to all browsers.

The shift from classic WebDriver to BiDi is well underway. As of 2024, exactly 20 years after Selenium’s creation, BiDi is fully supported by several major players, including Google Chrome, Firefox, Puppeteer, and BrowserStack.
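
For teams that want to experiment, recent Puppeteer versions can already drive the browser over BiDi instead of CDP; the option below is taken from Puppeteer's documented launch options, though availability and naming may change as support matures:

    import puppeteer from "puppeteer";

    // Ask Puppeteer to speak WebDriver BiDi to the browser instead of CDP.
    // Availability depends on the Puppeteer version in use.
    const browser = await puppeteer.launch({ protocol: "webDriverBiDi" });
    const page = await browser.newPage();
    await page.goto("https://example.com");
    console.log(await page.title());
    await browser.close();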

Of course, we’ve been closely following its development to understand the new capabilities it brings to automation. We’re preparing for how it could impact the bot protection landscape and are ready to adapt our strategies as it evolves. This way, we ensure we’re always a step ahead in the ongoing bot detection game.

Conclusion: DataDome’s Commitment to Advanced Bot Detection

The world of automated browser detection is one of constant evolution. From the early days of Selenium to the latest developments in bidirectional browser control, each advance in automation technology has been met with innovative detection methods.

At DataDome, we remain committed to staying at the forefront of this technological arms race. Our team of expert researchers and developers continues to analyze emerging technologies, develop cutting-edge detection methods, and ensure that our clients are protected against even the most sophisticated bot attacks.

As we look to the future, one thing is certain: the cat and mouse game between bot developers and detection systems will continue. And DataDome will be there, leading the charge in protecting the digital ecosystem from automated threats.

Ready to learn more? Book a demo today.


Source: https://securityboulevard.com/2024/10/the-evolution-of-automated-browser-detection-a-cat-mouse-game/