You should change your webdriver test to this:
if (Object.getOwnPropertyDescriptor(navigator, 'webdriver')) { … }
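A minimal sketch of why this test can work, assuming the browser exposes `webdriver` as an accessor on the navigator's prototype rather than on the `navigator` object itself. `FakeNavigator` and `hasOwnWebdriver` are hypothetical stand-ins so the snippet runs outside a browser:

```javascript
// Stand-in for the browser's Navigator interface: `webdriver` is an
// accessor on the prototype, not an own property of the instance.
class FakeNavigator {
  get webdriver() {
    return true;
  }
}

// Returns true when `webdriver` has been redefined directly on the instance.
function hasOwnWebdriver(nav) {
  return Object.getOwnPropertyDescriptor(nav, 'webdriver') !== undefined;
}

const untouched = new FakeNavigator();
const spoofed = new FakeNavigator();

// The common evasion defines a getter on the instance itself.
Object.defineProperty(spoofed, 'webdriver', { get: () => false });

console.log(hasOwnWebdriver(untouched)); // false
console.log(hasOwnWebdriver(spoofed)); // true
console.log(spoofed.webdriver); // false
```

The spoofed value looks right when read, but the own property descriptor reveals that something was defined on the instance.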
I'm working with some scrapers, and a few days ago I got stuck because Chrome Headless doesn't support ignoring certificate errors. So, to block it, you just need to have a nice certificate error.
That's hilariously true. The good news, though, is that this is changing in Chrome 65 (see this Chromium issue). If you use the current unstable branch, then you should be able to ignore the certificate errors.
I set up a github repo where the headless detection attempts can fight directly against the evasions detailed above: https://github.com/paulirish/headless-cat-n-mouse
Evan, thanks for the test-headless-final.js, I was able to reuse this pretty directly.
Currently the headless detectors are winning and have outwitted the sneaky detection evaders. But I can imagine that can change...
Very useful. Of course, website owners cannot win against scrapers. In one case, we saw our website scraped from a pool of over 1000 different IPs, with different UAs, and the scraper passed all our bot tests... We caught it only because these fake IPs were all on the same class B subnet. But of course, we must have shut out some legit users too... So there are no win-win solutions.
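As a rough illustration of the subnet observation above, the sketch below counts IPv4 addresses per /16 prefix (the first two octets, roughly a "class B" range). The function name and the sample addresses are made up for the example:

```javascript
// Count addresses per /16 prefix (first two octets of an IPv4 address).
function countBySlash16(ips) {
  const counts = new Map();
  for (const ip of ips) {
    const prefix = ip.split('.').slice(0, 2).join('.');
    counts.set(prefix, (counts.get(prefix) || 0) + 1);
  }
  return counts;
}

// Made-up sample addresses, for illustration only.
const sample = ['198.51.100.7', '198.51.200.9', '198.51.3.44', '203.0.113.5'];
const counts = countBySlash16(sample);

console.log(counts.get('198.51')); // 3
console.log(counts.get('203.0')); // 1
```

A prefix with an unusually high count is only a hint; as the comment notes, blocking a whole range is likely to catch legitimate users as well.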
I have tried the solution for navigator.webdriver. I am running Selenium, not headless, but this flag is still true. I am injecting JavaScript into Chrome before page load, and I do see the script run before the page loads and the flag change, but as soon as the next JavaScript runs (the test page's), the flag is changed back again.
This is the only line I am using in my injecting.js:
Object.defineProperty(navigator, 'webdriver', {get: () => false,});
Thanks for the help... I am still looking
EDIT: I have found a solution using selenium only.
You're likely running into an extension sandboxing issue here. The content script context will be sandboxed from the page context. You'll need to inject a script tag into the page's DOM to evaluate code outside of the sandbox. Here's an example of how you can do so:
const script = document.createElement('script'); script.innerHTML = 'alert("Do something else in here.")'; document.head.prepend(script)
I am experiencing the same problem as you. How did you fix it? I have tried Evan's method (document.head.prepend(script)), but just got the error "Cannot read property 'prepend' of null". It seems that a content script running at document_start runs too early to get the head of the DOM. I am from China, waiting for your answer. Thank you very much.
Yeah, you need to use document.documentElement instead of document.head if you're running at document_start. You should check out Breaking Out of the Chrome/WebExtension Sandbox for a lot more details on how to do this properly.
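A minimal sketch of that suggestion, assuming a content script that runs at document_start. The helper name `injectIntoPage` is hypothetical, and a page with a strict Content Security Policy may still block inline scripts:

```javascript
// Inject `code` into the page context by attaching a <script> element to
// the root element, which exists even when <head> has not been parsed yet.
function injectIntoPage(doc, code) {
  const script = doc.createElement('script');
  script.textContent = code;
  doc.documentElement.prepend(script);
  // Inline scripts run on insertion, so the tag can be removed afterwards.
  script.remove();
  return script;
}

// In a content script this is called with the real `document`; the guard
// keeps the snippet harmless when no DOM is available.
if (typeof document !== 'undefined') {
  injectIntoPage(
    document,
    "Object.defineProperty(navigator, 'webdriver', { get: () => false });"
  );
}
```

Passing the document in as a parameter is only a convenience for trying the helper outside the browser; inside an extension you could use `document` directly.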
Why is the webdriver property getting reset when loading your test URLs?
// Hide the Webdriver Browser property
async function hideWebdriverBrowserProperty(page) {
  await page.evaluateOnNewDocument(() => {
    Object.defineProperty(navigator, 'webdriver', {
      get: () => false,
    });
  });
}
hideWebdriverBrowserProperty(page);

/* The below statement returns 'true' for 'isWebDriverHidden' */
let isWebDriverHidden = await page.evaluate(() => navigator.webdriver);
console.log(isWebDriverHidden === true ? CORRECT_UTF : WRONG_UTF, 'Webdriver: ', isWebDriverHidden);

await page2.goto('https://intoli.com/blog/making-chrome-headless-undetectable/chrome-headless-test.html', {waitUntil: 'networkidle0'});
await page2.goto('https://intoli.com/blog/not-possible-to-block-chrome-headless/chrome-headless-test.html', {waitUntil: 'networkidle0'});

/* The below statement always returns 'false' for 'isWebDriverHidden'. Why? */
isWebDriverHidden = await page.evaluate(() => navigator.webdriver);
console.log(isWebDriverHidden === true ? CORRECT_UTF : WRONG_UTF, 'Webdriver: ', isWebDriverHidden);
I have tried to implement the navigator.webdriver code in an extension for Selenium in Chrome, but I get a very weird result: when I show an alert with alert(navigator.webdriver), it returns false, yet the test still picks it up as being true. Any idea why this is?
If you're using a Web Extension, then the content script context will be sandboxed from the page context. You'll need to inject a script tag into the page's DOM to evaluate code outside of the sandbox. Here's an example of how you can do so:
const script = document.createElement('script'); script.innerHTML = 'alert("Do something else in here.")'; document.head.prepend(script)
Hi,
I read all your posts about how to avoid headless detection. Using Puppeteer and Chromium, I was able to get a single working script that passes both the first and second sets of tests. Now, after reading how to inject JavaScript with Selenium and Marionette in Firefox, I'm trying to get a script that bypasses both sets of tests with Firefox. I have already tried both solutions, via execute_async_script and via a web extension (which Firefox, unlike Chromium, also supports in headless mode), without success.
Can you point out how to achieve the same result of chromium with firefox? Is there documentation that I can follow?
Thank you
Hi, thank you for the interesting guide. However, I found that some websites, like http://stubhub.com, use Distil technology https://en.wikipedia.org/wiki/Distil_Networks, and these countermeasures do not work against it. Maybe they detect mouse movement or clicking activity https://ibb.co/gqFpgK. What do you think?
The bypasses here are designed to get around a specific set of tests. Distil runs different tests, and you'll need to adapt the general techniques to work with their specific tests. You can either use a tool like OpenWPM to try to figure out what browser properties that they're probing, or you could attempt to reverse engineer the JavaScript tests that they run in the page.
Hi, I'm also interested in the topic of JavaScript fingerprinting. Are there any resources you find particularly helpful? I've gone through OpenWPM, but it's a little difficult since it's all in Python and I'm coming from a pure JavaScript background. Thanks a lot for these articles, they are very informative.
We actually have a really cool open-source JavaScript library in the works for detecting and analyzing browser fingerprinting. It's not quite ready for prime time, but it should hopefully be made public at some point in the next few months. In the meantime, Don't FingerPrint Me (DFPM) is a very useful Chrome extension that detects certain subsets of fingerprinting.
I'm glad to hear that you find the articles informative!
While I'm sure you enjoy the SEO from the title of this article, it is a false statement, by your own admission. You should rename the article or include a note about Distil Networks. The premise "IT IS NOT POSSIBLE TO DETECT AND BLOCK CHROME HEADLESS" is untrue, as Distil Networks is able to do so very efficiently.
Thanks for these useful tips!
I was wondering how we could implement the part "we could mock the plugins", as it doesn't seem possible to use the PluginArray or Plugin interface constructors to create new PluginArray and Plugin objects.
The idea would be to have a real PluginArray object instead of just a random array of integers when overwriting the 'plugins' property.
You can't construct actual PluginArray or Plugin objects, but you can create your own objects which mock the same APIs and would pass any sort of testing that's applied. For example,
function MyPlugin () { }
const plugin = new MyPlugin();
Object.setPrototypeOf(plugin, Plugin.prototype);
// Outputs: true
console.log(plugin instanceof Plugin);
will create a plugin object that appears to be a Plugin. You would need to set whatever properties you are interested in mocking to make it seem more like a real plugin.
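One way to set those properties, as a sketch only: the helper `makeFakePlugin` is hypothetical, the property values are illustrative, and a stand-in class replaces `Plugin` when the snippet runs outside a browser. Own data properties are used so that they shadow whatever accessors exist on the prototype:

```javascript
// Build an object that inherits from `PluginCtor.prototype` and carries
// own data properties, which shadow any accessors on the prototype.
function makeFakePlugin(PluginCtor, props) {
  const plugin = Object.create(PluginCtor.prototype);
  for (const [key, value] of Object.entries(props)) {
    Object.defineProperty(plugin, key, { value, enumerable: true });
  }
  return plugin;
}

// In the browser this picks up the real `Plugin`; elsewhere a stand-in is used.
const PluginCtor = typeof Plugin !== 'undefined' ? Plugin : class Plugin {};

// Illustrative values only; real plugins may expose more properties.
const fake = makeFakePlugin(PluginCtor, {
  name: 'Chrome PDF Viewer',
  filename: 'internal-pdf-viewer',
  description: 'Portable Document Format',
});

console.log(fake instanceof PluginCtor); // true
console.log(fake.filename); // internal-pdf-viewer
```

A test that inspects own property descriptors could still tell this apart from a real plugin, so treat it as a starting point rather than a complete mock.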
I was really hoping this would help me get my scraper for Citi's website working again. I want to scrape online.citi.com for my account balance and transaction info, but as of Oct 1st, my selenium script gets blocked.
FWIW, I get a traceback from citi's fingerprinter because the plugin objects don't have a filename attribute, so I changed the plugin workaround to:
// Overwrite the `plugins` property to use a custom getter.
Object.defineProperty(navigator, 'plugins', {
  // This just needs to have `length > 0` for the current test,
  // but we could mock the plugins too if necessary.
  get: () => [
    {filename:'internal-pdf-viewer'},
    {filename:'adsfkjlkjhalkh'},
    {filename:'internal-nacl-plugin'}
  ],
});
I don't get the traceback anymore, but something is still giving me away, cuz I still get blocked.
A very interesting article. I've implemented the ideas shown, but one site that I am trying to scrape uses Google's invisible reCAPTCHA, and it shows the captcha popup when I run my scraping code from my server but not from my dev machine. Have you had any luck convincing Google reCAPTCHA that you are a human?
What about the $cdc and $wdc variables in Selenium's jar? Don't those need to be hex edited out?
Also, it's my understanding that navigator.webdriver should return undefined instead of false.
I believe this article may be a little out of date.
The examples in this article use Puppeteer rather than Selenium. The cdc and wdc variables are only relevant when working with Selenium.
The value of navigator.webdriver should depend on what the spoofed user agent is. It's false for Firefox and undefined for Chrome. The code bypasses in the article were all written in response to a specific test suite that only checked whether navigator.webdriver had a truthy value. With more sophisticated fingerprinting approaches, you'll need to more completely emulate a realistic browser fingerprint.
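To make the difference concrete, here is a small sketch that contrasts a truthiness-only check with a stricter one. The expected values follow the statement above (false for Firefox, undefined for Chrome) and should be treated as assumptions; all function names and the plain object standing in for `navigator` are hypothetical:

```javascript
// Expected values as described above: `false` for Firefox and
// `undefined` for Chrome. These are assumptions, not a specification.
function expectedWebdriver(userAgent) {
  return userAgent.includes('Firefox') ? false : undefined;
}

// A test that only fails on a truthy value.
function passesTruthyTest(nav) {
  return !nav.webdriver;
}

// A stricter, hypothetical test that compares against the claimed browser.
function passesStrictTest(nav) {
  return nav.webdriver === expectedWebdriver(nav.userAgent);
}

// Plain object standing in for a navigator with a Chrome user agent
// whose `webdriver` has been spoofed to `false`.
const spoofedChrome = {
  userAgent:
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) ' +
    'Chrome/72.0.3626.109 Safari/537.36',
  webdriver: false,
};

console.log(passesTruthyTest(spoofedChrome)); // true
console.log(passesStrictTest(spoofedChrome)); // false
```

The same spoof passes the simple check but fails as soon as the test compares the value against what the claimed browser would report.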
That's what I thought, but this site seems to block most attempts from Chromium and Headless Chrome... any idea how to crawl it? http://www.faintinggoatdc.com/food/dinner-menu/
Evan, thanks for the great info and blog posts. Do you have an example of doing this for Selenium with Headless Chrome instead of Puppeteer? Specifically, where and how do you set Object.defineProperty(navigator, 'webdriver', {get: () => false})?
Thanks
We have another post on injecting JavaScript using various browser automation frameworks. I think that the best overall approach for Selenium is to use a browser extension if you want your script to execute before scripts on the page. The only tricky part about that is that the navigator properties are sandboxed, so you can't directly modify them to change the browser fingerprint. We have a workaround for that in this other post.
I would like to suggest adding configurable: true to the descriptor object passed to Object.defineProperty. In my experience, if you invoke Object.defineProperty twice to define the same property, the second call raises a TypeError saying "Cannot redefine property: blablabla". If you pass configurable: true, this error disappears and things work fine, which means that they can't detect you this way.
EDIT: I'm not sure about the correct behavior of the navigator.webdriver property. If originally it shouldn't be configurable in the normal mode, then we shouldn't add configurable: true.
EDIT++: On my Chrome (72.0.3626.109), the default value of navigator.webdriver is undefined.
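The redefinition behavior described in this comment can be reproduced on a plain object (used here as a stand-in for `navigator`, so the snippet runs anywhere):

```javascript
const target = {};

// First definition without `configurable`, which defaults to false.
Object.defineProperty(target, 'webdriver', { get: () => false });

let redefineError = null;
try {
  // A second definition with a different getter is rejected.
  Object.defineProperty(target, 'webdriver', { get: () => undefined });
} catch (err) {
  redefineError = err;
}
console.log(redefineError instanceof TypeError); // true

// With `configurable: true`, the property can be redefined later.
const flexible = {};
Object.defineProperty(flexible, 'webdriver', {
  get: () => false,
  configurable: true,
});
Object.defineProperty(flexible, 'webdriver', {
  get: () => undefined,
  configurable: true,
});
console.log(flexible.webdriver); // undefined
```

Whether `configurable: true` matches what an unmodified browser reports is a separate question, as the EDIT notes.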
I found out that the Accept-Language header was missing when headless: true, even when using the proposed setup. I solved the problem by adding --lang=en-US:
let settings = [ '--no-sandbox', '--disable-setuid-sandbox', '--lang=en-US' ]; browser = await puppeteer.launch({headless: true, args: settings});
Without this, even with the proposed setup, I had trouble bypassing some sites, for example Avvo. Avvo was doing great when headless: false, but when headless: true I couldn't get past even a single page.
I figured out that the header was missing after comparing requests made via headless Chrome using http://scooterlabs.com/echo.json. You can check the details at this link.
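A different way to get the same header, not the one used in this comment, is to set it explicitly on the page. The sketch below assumes a Puppeteer `page` object and uses its `setExtraHTTPHeaders` method; the helper name and the default header value are illustrative:

```javascript
// Set an explicit Accept-Language header on a Puppeteer-style page object.
// The default value is only an example; pick one that matches your
// spoofed user agent and locale.
async function setAcceptLanguage(page, value = 'en-US,en;q=0.9') {
  await page.setExtraHTTPHeaders({ 'Accept-Language': value });
}

// Usage inside an async function, after creating the page:
//   const page = await browser.newPage();
//   await setAcceptLanguage(page);
```

Comparing the resulting request against an echo service, as described above, is a reasonable way to confirm that the header is actually sent.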
Nice article, but to get around the navigator.webdriver issue, it's as easy as adding enable-automation to the excludeSwitches Chrome option.
So (I'm using Node.js WebDriver):
const options = new chrome.Options(); options.excludeSwitches('enable-automation');
Checking navigator.webdriver in the developer tools then shows undefined.