unverified 7y, 170d ago

THANKS guys I feel like an idiot... however I really do appreciate the help.

remark

unverified 7y, 170d ago

You should change your webdriver test to this:

if (Object.getOwnPropertyDescriptor(navigator, 'webdriver')) { … }

remark

evan 7y, 170d ago

That actually doesn't work, even aside from any modifications. Object.getOwnPropertyDescriptor(navigator, 'webdriver') is undefined. Even if that weren't the case, couldn't you just then override the behavior of Object.getOwnPropertyDescriptor() to check for that special case?
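
A minimal sketch of both points, assuming the `webdriver` getter lives on the prototype rather than on the `navigator` instance (which is why the own-property lookup comes back undefined). `FakeNavigator` and `fakeNavigator` are hypothetical stand-ins so the snippet also runs outside a browser:

```javascript
// Hypothetical stand-in for the browser's Navigator class. Placing the
// accessor on the prototype is an assumption about how the browser exposes it.
class FakeNavigator {
  get webdriver() {
    return true;
  }
}
const fakeNavigator = new FakeNavigator();

// The instance has no own `webdriver` property, so this is undefined.
const before = Object.getOwnPropertyDescriptor(fakeNavigator, 'webdriver');

// An evasion that redefines the property on the instance leaves an own
// descriptor behind, which is what the suggested test would look for.
Object.defineProperty(fakeNavigator, 'webdriver', {
  get: () => false,
  configurable: true,
});
const afterEvasion = Object.getOwnPropertyDescriptor(fakeNavigator, 'webdriver');

// The evasion can then wrap the lookup itself and hide that one special case.
const originalGetOwnPropertyDescriptor = Object.getOwnPropertyDescriptor;
Object.getOwnPropertyDescriptor = function (target, property) {
  if (target === fakeNavigator && property === 'webdriver') {
    return undefined;
  }
  return originalGetOwnPropertyDescriptor.call(Object, target, property);
};
const afterPatch = Object.getOwnPropertyDescriptor(fakeNavigator, 'webdriver');

console.log(before, typeof afterEvasion.get, afterPatch);
```

A detector could in turn check whether `Object.getOwnPropertyDescriptor` is still native, which is the same cat-and-mouse pattern again.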

remark

unverified 7y, 170d ago

I'm working with some scrapers, and a few days ago I got stuck because Chrome Headless doesn't support ignoring certificate errors. So, to block it, you just need to have a nice certificate error.

remark

evan 7y, 170d ago

That's hilariously true. The good news, though, is that this is changing in Chrome 65 (see this Chromium issue). If you use the current unstable branch, then you should be able to ignore the certificate errors.

remark

unverified 7y, 170d ago

Chrome lets you bypass certificate errors by typing "badidea" - maybe sending same keys to Chrome Headless has the same effect?

remark

unverified 7y, 170d ago

Seems like your chrome test in your checking page always returns a pass because you have an element with the "chrome" id in the page.

remark

evan 7y, 170d ago

Ah, good call. I stand corrected.

remark

paulirish 7y, 169d ago

I set up a github repo where the headless detection attempts can fight directly against the evasions detailed above: https://github.com/paulirish/headless-cat-n-mouse

Evan, thanks for the test-headless-final.js, I was able to reuse this pretty directly.

Currently the headless detectors are winning and have outwitted the sneaky detection evaders. But I can imagine that can change...

remark

Si3GytR7 7y, 63d ago

Thanks for the great article! I ran test-headless-final.js from the repo, but the chrome test didn't pass. I've used the same Puppeteer version. Have you changed the test page?

remark

Si3GytR7 7y, 63d ago

By replacing

window.navigator.chrome = {
  runtime: {},
  // etc.
};

with

window.chrome = {
  runtime: {},
  // etc.
};

the test passed.

remark

unverified 6y, 208d ago

Thank you for this; was scratching my head.

remark

unverified 7y, 7d ago

Very useful. Of course, website owners cannot win against scrapers. In one case, we saw our website scraped from a pool of over 1000 different IPs, with different UAs, and the scraper passed all our bot tests... We caught it only because these fake IPs were all on the same class B subnet. But of course, we must have shut out some legit users too... So there are no win-win solutions.

remark

unverified 7y, 2d ago [edited]

I have tried the solution for navigator.webdriver. I am running Selenium (not headless), but this flag is still true. I'm injecting JavaScript into Chrome before page load; I can see that the script runs before the page loads and that the flag changes, but as soon as the next JavaScript runs (the test page's), the flag is changed back.

This is the only line I am using in my injecting.js:

Object.defineProperty(navigator, 'webdriver', {get: () => false,});

Thanks for the help... I am still looking

EDIT: I have found a solution using selenium only.

remark

evan 6y, 349d ago [edited]

You're likely running into an extension sandboxing issue here. The content script context will be sandboxed from the page context. You'll need to inject a script tag into the page's DOM to evaluate code outside of the sandbox. Here's an example of how you can do so:

const script = document.createElement('script');
script.innerHTML = 'alert("Do something else in here.")';
document.head.prepend(script)

remark

unverified 6y, 282d ago

I am experiencing the same problem as you. How did you fix it? I tried evan's method (document.head.prepend(script)) and just got the error 'Cannot read property 'prepend' of null'. It seems that a content script running at 'document_start' runs too early to get the 'head' of the DOM. I am from China, waiting for your answer. Thank you very much.

remark

evan 6y, 282d ago

Yeah, you need to use document.documentElement instead of document.head if you're running at document_start. You should check out Breaking Out of the Chrome/WebExtension Sandbox for a lot more details on how to do this properly.
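
A minimal sketch of that fallback, assuming a content script registered with `run_at: document_start`. The helper name `injectIntoPage` is hypothetical, and the document is passed in as a parameter so the function can also be exercised outside a browser:

```javascript
// Hypothetical helper: `doc` is the page's `document`, and `source` is the
// JavaScript text to evaluate in the page context (outside the sandbox).
function injectIntoPage(doc, source) {
  const script = doc.createElement('script');
  script.textContent = source;
  // At document_start, `doc.head` can still be null, so fall back to the
  // root <html> element, which exists by then.
  const parent = doc.head || doc.documentElement;
  parent.prepend(script);
  return parent;
}
```

In a content script this would be called as `injectIntoPage(document, "Object.defineProperty(navigator, 'webdriver', {get: () => false});")`.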

remark

unverified 6y, 359d ago

Why is the webdriver property getting reset when loading your test URLs?

// Hide the Webdriver Browser property
async function hideWebdriverBrowserProperty(page) {
    await page.evaluateOnNewDocument(() => {
        Object.defineProperty(navigator, 'webdriver', {
           get: () => false,
        });
    });
}

hideWebdriverBrowserProperty(page);

/* The below statement returns 'true' for 'isWebDriverHidden' */
let isWebDriverHidden = await page.evaluate(() => navigator.webdriver);
console.log(isWebDriverHidden === true ? CORRECT_UTF : WRONG_UTF, 'Webdriver: ', isWebDriverHidden);

await page2.goto('https://intoli.com/blog/making-chrome-headless-undetectable/chrome-headless-test.html', {waitUntil: 'networkidle0'});
await page2.goto('https://intoli.com/blog/not-possible-to-block-chrome-headless/chrome-headless-test.html', {waitUntil: 'networkidle0'});

/* The below statement always returns 'false' for 'isWebDriverHidden'. Why? */
isWebDriverHidden = await page.evaluate(() => navigator.webdriver); 
console.log(isWebDriverHidden === true ? CORRECT_UTF : WRONG_UTF, 'Webdriver: ', isWebDriverHidden);

remark

evan 6y, 349d ago

Could you provide some more details? Are you using the Puppeteer code from above?

remark

unverified 6y, 352d ago

I have tried to implement the navigator.webdriver code in an extension for Selenium in Chrome, but I get a very weird result: when I push an alert with alert("navigator.webdriver"), it returns false, yet the test still picks it up as being true. Any idea why this is?

remark

evan 6y, 349d ago [edited]

If you're using a Web Extension, then the content script context will be sandboxed from the page context. You'll need to inject a script tag into the page's DOM to evaluate code outside of the sandbox. Here's an example of how you can do so:

const script = document.createElement('script');
script.innerHTML = 'alert("Do something else in here.")';
document.head.prepend(script)

remark

unverified 6y, 349d ago

Hi,

I read all your posts on how to avoid headless detection. I was able to obtain a single working script that passes both the first and second sets of tests using Puppeteer and Chromium. Now, after reading how to inject JavaScript with Selenium and Marionette in Firefox, I'm trying to obtain a script that can bypass both previous tests with Firefox. I already tried both solutions, via execute_async_script and via a Web Extension (since Firefox, unlike Chromium, also supports extensions in headless mode), without success.

Can you point out how to achieve the same result of chromium with firefox? Is there documentation that I can follow?

Thank you

remark

evan 6y, 349d ago

Did you run into issues using a Web Extension to inject the scripts with Firefox? I think that's the most reliable way to evaluate code in Firefox before the page's JavaScript has a chance to run.

remark

unverified 6y, 334d ago

Hi, thank you for the interesting guide. However, I found that some websites like http://stubhub.com use Distil technology https://en.wikipedia.org/wiki/Distil_Networks and these countermeasures do not work. Maybe they detect mouse movement or clicking activity https://ibb.co/gqFpgK. What do you think?

remark

evan 6y, 334d ago

The bypasses here are designed to get around a specific set of tests. Distil runs different tests, and you'll need to adapt the general techniques to work with their specific tests. You can either use a tool like OpenWPM to try to figure out which browser properties they're probing, or you could attempt to reverse engineer the JavaScript tests that they run in the page.

remark

unverified 6y, 313d ago

Hi, I'm also interested in the topic of JavaScript fingerprinting. Are there any resources you find particularly helpful? I've gone through OpenWPM, but it's a little difficult since it's all in Python and I'm coming from just a JavaScript background. Thanks a lot for these articles; they are very informative.

remark

evan 6y, 313d ago

We actually have a really cool open-source JavaScript library in the works for detecting and analyzing browser fingerprinting. It's not quite ready for prime time, but it should hopefully be made public at some point in the next few months. In the meantime, Don't FingerPrint Me (DFPM) is a very useful Chrome extension that detects certain subsets of fingerprinting.

I'm glad to hear that you find the articles informative!

remark

unverified 6y, 302d ago

Does your proxy bypass Distil detection? Thanks!

remark

evan 6y, 299d ago

We provide clean residential IPs and make it very easy to render responses in full browsers with randomized footprints. It's hard to say that this fully bypasses detection because bot-mitigation services also depend on traffic patterns and a variety of other factors. Our service definitely does make scraping much harder to detect though, and we employ techniques that go far beyond what we've written about on our blog.

remark

unverified 5y, 319d ago

While I'm sure you enjoy the SEO from the title of this article, it is a false statement, by your own admission. You should rename the article or include a note about Distil Networks. The premise "IT IS NOT POSSIBLE TO DETECT AND BLOCK CHROME HEADLESS" is untrue, as Distil Networks is able to do so very efficiently.

remark

evan 5y, 238d ago

I assure you that it's completely possible to get around Distil if you're motivated to do so. The point of this post isn't that it's a drop-in solution to every detection attempt. The point is more along the lines that any set of tests is fragile, and it's always possible to work around them.

remark

unverified 6y, 306d ago

Thanks for these useful tips!

I was wondering how we could implement this part, "we could mock the plugins", as it doesn't seem possible to use either the PluginArray or Plugin interface constructors to create new PluginArray and Plugin objects.

The idea would be to have a real PluginArray object instead of just a random array of integers when overwriting the 'plugins' property.

remark

evan 6y, 305d ago

You can't construct actual PluginArray or Plugin objects, but you can create your own objects which mock the same APIs and would pass any sort of testing that's applied. For example,

function MyPlugin () { }
const plugin = new MyPlugin();
Object.setPrototypeOf(plugin, Plugin.prototype);
// Outputs: true
console.log(plugin instanceof Plugin);

will create a plugin object that appears to be a Plugin. You would need to set whatever properties you are interested in mocking to make it seem more like a real plugin.

remark

unverified 6y, 270d ago

I was really hoping this would help me get my scraper for Citi's website working again. I want to scrape online.citi.com for my account balance and transaction info, but as of Oct 1st, my selenium script gets blocked.

FWIW, I get a traceback from citi's fingerprinter because the plugin objects don't have a filename attribute, so I changed the plugin workaround to:

        // Overwrite the `plugins` property to use a custom getter.
        Object.defineProperty(navigator, 'plugins', {
          // This just needs to have `length > 0` for the current test,
          // but we could mock the plugins too if necessary.
          get: () => [
            {filename:'internal-pdf-viewer'},
            {filename:'adsfkjlkjhalkh'},
            {filename:'internal-nacl-plugin'}
          ],
        });

I don't get the traceback anymore, but something is still giving me away, cuz I still get blocked.

remark

evan 6y, 213d ago

Yeah, the bypasses developed here were designed to address a specific test suite that only checked the plugin lengths. The simplistic mocks can cause problems if the test code explores the plugins in more detail. I recommend using a debugger to see what exactly they're checking.
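
As a rough sketch of a less simplistic mock, assuming the test code only reads `name`, `filename`, `description`, and `length` and calls `item()` or `namedItem()`: the helper `makeFakePlugins` is hypothetical, and the plugin names are illustrative values rather than a verified fingerprint.

```javascript
// Hypothetical helper that builds an array-like stand-in for navigator.plugins.
function makeFakePlugins(entries) {
  const plugins = entries.map((entry) => ({
    name: entry.name,
    filename: entry.filename,
    description: entry.description || '',
    length: 0, // number of MIME types exposed by the plugin (none mocked here)
  }));
  // PluginArray-style accessors that return null when nothing matches.
  plugins.item = (index) => plugins[index] || null;
  plugins.namedItem = (name) =>
    plugins.find((plugin) => plugin.name === name) || null;
  return plugins;
}

// Illustrative entries only; real values should be copied from a real browser.
const fakePlugins = makeFakePlugins([
  { name: 'Chrome PDF Plugin', filename: 'internal-pdf-viewer' },
  { name: 'Native Client', filename: 'internal-nacl-plugin' },
]);

console.log(fakePlugins.length, fakePlugins.item(0).filename);
```

The result could then be returned from the custom `plugins` getter in place of the plain array. Anything the fingerprinter probes beyond these fields would still need to be added by hand.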

remark

unverified 6y, 264d ago

A very interesting article. I've implemented the ideas shown, but one site that I am trying to scrape uses Google's invisible reCAPTCHA and shows the captcha popup when I run my scraping code from my server, but not from my dev machine. Have you had any luck convincing Google reCAPTCHA that you're a human?

remark

evan 6y, 213d ago

If the same code is working on one machine, but not another, then this is likely either due to the IP address or information about the underlying OS leaking. Are you using the same proxy service on both machines?

remark

unverified 6y, 207d ago

What about the $cdc and $wdc variables in Selenium's jar? Don't those need to be hex edited out?

Also, it's my understanding that navigator.webdriver should return undefined instead of false.

I believe this article may be a little out of date.

remark

evan 6y, 206d ago

The examples in this article use Puppeteer rather than Selenium. The cdc and wdc variables are only relevant when working with Selenium.

The value of navigator.webdriver should depend on what the spoofed user agent is. It's false for Firefox and undefined for Chrome. The code bypasses in the article were all written in response to a specific test suite that only checked whether navigator.webdriver had a truthy value. With more sophisticated fingerprinting approaches, you'll need to more completely emulate a realistic browser fingerprint.
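
A small sketch of matching the spoofed value to the spoofed user agent, using only the two values stated above (they may differ in other browser versions). The helper names and the sample user agent strings are hypothetical:

```javascript
// Hypothetical helper: choose the value a non-automated browser would report,
// following the rule above (false for Firefox, undefined for Chrome).
function expectedWebdriverValue(userAgent) {
  return /Firefox\//.test(userAgent) ? false : undefined;
}

// Hypothetical helper: define the getter on a navigator-like object.
function spoofWebdriver(navigatorLike, userAgent) {
  const value = expectedWebdriverValue(userAgent);
  Object.defineProperty(navigatorLike, 'webdriver', {
    get: () => value,
    configurable: true,
  });
  return navigatorLike;
}

// Plain objects stand in for `navigator` so this runs outside a browser.
const firefoxLike = spoofWebdriver(
  {},
  'Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0'
);
const chromeLike = spoofWebdriver(
  {},
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36'
);

console.log(firefoxLike.webdriver); // false
console.log(chromeLike.webdriver); // undefined
```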

remark

unverified 6y, 160d ago

That's what I thought, but this site seems to block most attempts from Chromium and Headless Chrome... any idea how to crawl it? http://www.faintinggoatdc.com/food/dinner-menu/

remark

unverified 6y, 156d ago

Evan, thanks for the great info / blogs. Do you have an example of doing this for Selenium with Headless Chrome instead of Puppeteer? Specifically, where and how do you set Object.defineProperty(navigator, 'webdriver', {get: () => false})?

Thanks

remark

evan 6y, 143d ago

We have another post on injecting JavaScript using various browser automation frameworks. I think that the best overall approach for Selenium is to use a browser extension if you want your script to execute before scripts on the page. The only tricky part about that is that the navigator properties are sandboxed, so you can't directly modify them to change the browser fingerprint. We have a workaround for that in this other post.

remark

WR1ELqFF 6y, 137d ago

Running the test through the following address, I found that Chrome = missing (failed):

https://try-puppeteer.appspot.com/

why?

remark

evan 6y, 136d ago

Can you post the exact code that you're running there?

remark

1Do4y5JH 6y, 135d ago [edited]

I would like to suggest adding configurable: true to the descriptor object used by Object.defineProperty. In my experience, if you invoke Object.defineProperty twice to define a property, the second call raises a TypeError saying "Cannot redefine property: blablabla". If you pass configurable: true, this error disappears and things work fine, which means that they can't detect you this way.

EDIT: I'm not sure about the correct behavior of the navigator.webdriver property. If originally it shouldn't be configurable in the normal mode, then we shouldn't add configurable: true.

EDIT++: On my Chrome (72.0.3626.109), the default value of navigator.webdriver is undefined.
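
A minimal sketch of the redefinition behavior described above, run against a plain object (`target` is a hypothetical stand-in for `navigator`):

```javascript
const target = {};

// Without `configurable: true`, the property is locked after the first call.
Object.defineProperty(target, 'locked', { get: () => false });
let lockedError = null;
try {
  // Redefining with a different getter throws a TypeError.
  Object.defineProperty(target, 'locked', { get: () => undefined });
} catch (error) {
  lockedError = error;
}
console.log(lockedError instanceof TypeError); // true

// With `configurable: true`, a later redefinition succeeds silently.
Object.defineProperty(target, 'open', { get: () => false, configurable: true });
Object.defineProperty(target, 'open', { get: () => undefined, configurable: true });
console.log(target.open); // undefined
```

As the first EDIT notes, whether the descriptor should be configurable depends on what a real browser reports for that property.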

remark

unverified 5y, 354d ago

You're doing a good thing here, mate. Appreciate the wisdom; I needed some help with that JavaScript and couldn't find any actual code. Thanks!

remark

unverified 5y, 314d ago [edited]

I found out that the Accept-Language header was missing when headless: true, even when using the proposed setup. I solved the problem by adding --lang=en-US:

let settings = [
  '--no-sandbox',
  '--disable-setuid-sandbox',
  '--lang=en-US',
];
browser = await puppeteer.launch({headless: true, args: settings});

Without this, even with the proposed setup, I had trouble bypassing some sites, for example Avvo. Avvo worked great with headless: false, but with headless: true I couldn't get past even a single page.

I figured out that the header was missing by comparing the requests made by headless Chrome using http://scooterlabs.com/echo.json. You can check details at this link

remark

unverified 5y, 278d ago

Nice article but to get around the navigator.webdriver issue, it's as easy as adding enable-automation to the excludeSwitches chrome option.

so (I'm using Nodejs Webdriver):

const options = new chrome.Options();
options.excludeSwitches('enable-automation');

Checking in the developer tool shows undefined

remark