
Hah, I literally just fought this for the past month. We run a large esports league that relies on player ranked data. They have the data, and as mentioned above, they send it down to the browser in beautiful JSON objects.

But they're sitting behind Cloudflare and aggressively blocking attempts to fetch data programmatically, which is a huge problem for us with 6000+ players' worth of data to fetch multiple times every 3 months.

So... I built a Chrome extension to grab the data at a rate that usually stays under their detection threshold. Basically created a distributed scraper and passed it out to as many people in the league as I could.

For big jobs when we want to do giant batches, it was a simple matter of running the pulls and, when we started getting 429 errors (the rate-limit status code they use), switching to a new IP on the VPN.

The only way they can block us now is if they stop having a website.

Give one of the commercial VPN providers a try. They're usually pretty cheap and have tons of IPs all over the place. Adding a "VPN Disconnect / Reconnect" step to the process only added about 10 seconds to a request every so often.
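
For anyone curious what the "pull until you hit 429, then react" loop above might look like, here is a rough Python sketch. It is not the code we actually run. `pull_batch`, `fetch`, and `on_rate_limited` are hypothetical names, and the delay and retry defaults are made up. The fetch function and the hook are passed in, so how you make the request or reconnect the VPN is up to you.

```python
import time
from typing import Callable, Dict, Iterable, Tuple

# fetch(player_id) -> (status_code, body). Hypothetical; supplied by the caller.
Fetch = Callable[[str], Tuple[int, str]]


def pull_batch(
    player_ids: Iterable[str],
    fetch: Fetch,
    on_rate_limited: Callable[[], None],
    delay_s: float = 1.0,   # assumed pacing, not a measured threshold
    max_retries: int = 3,   # assumed retry budget per ID
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch each ID at a fixed pace; on HTTP 429, call the hook and retry."""
    results: Dict[str, str] = {}
    for pid in player_ids:
        for _attempt in range(max_retries + 1):
            status, body = fetch(pid)
            if status == 429:
                # Rate limited: let the caller react (e.g. reconnect), then retry.
                on_rate_limited()
                continue
            if status == 200:
                results[pid] = body
            break  # success or a non-retryable status: move on to the next ID
        sleep(delay_s)  # pace requests between IDs
    return results
```

IDs that are still rate limited after the retry budget are simply left out of the result, so you can diff against the input list and re-queue them later.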




It probably doesn't save you much, since you already built the Chrome extension, but having done both I found that Tampermonkey is usually much easier to deal with and also much quicker to develop for (you can literally edit the script in the Tampermonkey extension settings page and reload the page you want it to apply to for immediate testing).


I might be wrong, but some sites can block 'self' origin scripts by leaving 'self' out of the Content Security Policy and only allowing scripts they control, served by a CDN or a specified subdomain, to run on their page. Not sure when I last tried this and on what browser(s).

You'd have to disable CSP manually in your browser config to make it work, but that leaves you with an insecure browser and a lot of friction for casual users. Not sure if you can tie about:config options to a user profile for this use case. Distributing a working extension/script is getting harder all the time.
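
To make the "leaving 'self' out" idea concrete, here is a small Python sketch that checks whether a given policy string lists 'self' for scripts. It is a simplification, not a full CSP evaluator: it only looks at `script-src` with a fallback to `default-src`, and ignores wildcards, explicit host matches, and `script-src-elem`. The function names and the example CDN host are made up for illustration.

```python
from typing import List, Optional


def script_sources(csp: str) -> Optional[List[str]]:
    """Return the source list governing scripts, or None if nothing restricts them."""
    directives = {}
    for part in csp.split(";"):
        tokens = part.split()
        if tokens:
            # Directive names are case-insensitive; the first occurrence wins.
            directives.setdefault(tokens[0].lower(), tokens[1:])
    if "script-src" in directives:
        return directives["script-src"]
    return directives.get("default-src")  # None when neither directive is set


def allows_self_scripts(csp: str) -> bool:
    """True if the policy lists 'self' for scripts or does not restrict scripts."""
    sources = script_sources(csp)
    if sources is None:
        return True
    return any(s.lower() == "'self'" for s in sources)


# Hypothetical policy of the kind described above: CDN only, no 'self'.
cdn_only = "default-src 'self'; script-src https://cdn.example.com"
print(allows_self_scripts(cdn_only))
```

Whether a given browser actually applies the page's policy to extension or userscript code is a separate question that this check does not answer.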


I don't recall if I've encountered that specific problem in Tampermonkey (or if I did and it didn't cause a problem worth remembering), but you can also run things in the extension's context to bypass certain restrictions, and use the special extension-provided functions (GM_* from the Greasemonkey standard) that allow for additional actions.

I do recall intercepting requests to change CSP values when I used a Chrome extension, and not needing to when doing something similar later in Tampermonkey, but it may not have been quite the same issue as you're describing, so I can't definitively say whether I had a problem with it or not.


a VPN won't do anything to help you with Instagram

the best of the best are 4G rotating proxies

the fingerprint needs to change also



