Python requests, change IP address

Question:

I am coding a web scraper for the website with the following Python code:

import requests

def scrape(url):
    # Download the page and save its HTML to out.html
    req = requests.get(url, timeout=10)
    with open('out.html', 'w', encoding='utf-8') as f:
        f.write(req.text)

It works a few times, but then the website returns an error HTML page instead (and when I open the site in my browser, I have to complete a captcha).

Is there a way to avoid this “ban”, for example by changing the IP address?

Asked By: user9145571


Answers:

As already mentioned in the comments and by yourself, changing the IP address could help. An easy way to do this is vpngate.py:

https://gist.github.com/Lazza/bbc15561b65c16db8ca8

A how-to is provided at the link.

Answered By: Rend

You can use a proxy with the requests library. You can find free proxies on sites such as https://www.sslproxies.org/ and http://free-proxy.cz/en/proxylist/country/US/https/uptime/level3, but not all of them work, and they should not be trusted with sensitive information.

Example:

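import requests

# The scheme in the proxy URL describes the connection to the proxy itself;
# a typical port-3128 proxy speaks plain HTTP, so use http:// for both keys.
proxy = {
    "https": "http://158.177.252.170:3128",
    "http": "http://158.177.252.170:3128",
}
response = requests.get("https://httpbin.org/ip", proxies=proxy, timeout=10)
print(response.json())  # httpbin reports the IP it saw (the proxy's, if it worked)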
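Since not all free proxies work, one option is to try a list of them in turn until one answers. This is a minimal sketch, assuming plain-HTTP proxies given as "host:port" strings; proxies_for and fetch_first_working are hypothetical helper names, and fetch is any callable that behaves like requests.get (accepts proxies and timeout keywords and returns an object with raise_for_status()).

```python
from __future__ import annotations

from typing import Any, Callable, Iterable


def proxies_for(address: str) -> dict[str, str]:
    """Build a requests-style proxies mapping for a plain-HTTP proxy ("host:port")."""
    proxy_url = f"http://{address}"
    # The same proxy handles both schemes; HTTPS targets are tunnelled via CONNECT.
    return {"http": proxy_url, "https": proxy_url}


def fetch_first_working(
    url: str,
    addresses: Iterable[str],
    fetch: Callable[..., Any],
    timeout: float = 10.0,
) -> Any:
    """Try each proxy in order and return the first successful response."""
    failures: dict[str, Exception] = {}
    for address in addresses:
        try:
            response = fetch(url, proxies=proxies_for(address), timeout=timeout)
            response.raise_for_status()  # treat HTTP 4xx/5xx responses as failures
            return response
        except OSError as exc:
            # requests.RequestException subclasses OSError (IOError), so this
            # catches connection errors, timeouts and HTTP errors alike.
            failures[address] = exc
    raise RuntimeError(f"no working proxy among {len(failures)} tried: {failures}")
```

Usage: fetch_first_working("https://httpbin.org/ip", ["158.177.252.170:3128"], requests.get). Passing the fetch function in keeps the rotation logic independent of requests; a requests.Session().get works just as well. Note that a captcha page served with status 200 will still count as success here.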
Answered By: AJ Bensman

I recently answered this on another question here, but using the requests-ip-rotator library to rotate IPs through AWS API Gateway is usually the most effective way.
It’s free for the first million requests per region, and it means you won’t have to give your data to unreliable proxy sites.
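A minimal sketch of this approach, following the library's documented pattern as I understand it (create an ApiGateway for the target site, start() it, mount it on a requests Session, and shutdown() afterwards). It assumes requests-ip-rotator is installed and AWS credentials are configured for the current user; site_root and fetch_through_gateway are hypothetical helper names.

```python
from typing import Any
from urllib.parse import urlsplit


def site_root(url: str) -> str:
    """Return the scheme://host[:port] prefix of url, e.g. "https://example.com"."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def fetch_through_gateway(url: str, session: Any) -> str:
    """Fetch url via temporary AWS API Gateway endpoints so the source IP varies.

    Assumes `pip install requests-ip-rotator` and AWS credentials configured
    for the current user; session is a requests.Session.
    """
    # Imported here so site_root() stays usable without the optional package.
    from requests_ip_rotator import ApiGateway

    root = site_root(url)
    gateway = ApiGateway(root)  # one gateway per target site
    gateway.start()  # creates the endpoints in AWS
    try:
        session.mount(root, gateway)  # route this site's requests through the gateway
        response = session.get(url, timeout=30)
        response.raise_for_status()
        return response.text
    finally:
        gateway.shutdown()  # delete the endpoints so they don't linger in the account
```

For example: fetch_through_gateway("https://example.com/page", requests.Session()). Starting a gateway sets up endpoints in several AWS regions by default, so in a real scraper you would start one gateway and reuse it for many requests; the sketch creates and deletes it per call only to stay short.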

Answered By: George

Late answer (I found this while looking for IP spoofing), but to the OP’s question: as some comments point out, you may or may not actually be getting banned. Here are two things to consider:

  1. A soft ban: they don’t like bots. A simple solution that has worked for me in the past is to add headers so the site thinks you’re a browser, e.g.:

    req = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})

  2. On-page active elements, scripts, or pop-ups that act as content gates rather than a ban per se, e.g., country/language selectors, cookie settings, or surveys that require user input. A not-so-simple solution: use a webdriver such as Selenium with chromedriver to render the page, including its JavaScript, and then simulate "user" clicks to get past these gates.
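A minimal sketch of the second approach, assuming Selenium 4 with Chrome installed. render_and_dismiss and make_headless_chrome are hypothetical helpers, the CSS selectors you pass (such as a cookie-banner button) depend entirely on the target site, and a fixed sleep stands in for proper explicit waits to keep it short.

```python
import time
from typing import Any, Iterable


def render_and_dismiss(driver: Any, url: str, selectors: Iterable[str],
                       settle_seconds: float = 2.0) -> str:
    """Load url in a real browser, click the first visible match of each
    selector (e.g. a cookie or country gate), and return the rendered HTML."""
    driver.get(url)
    time.sleep(settle_seconds)  # crude: give the page's JavaScript time to run
    for selector in selectors:
        # "css selector" is the value of Selenium's By.CSS_SELECTOR
        for element in driver.find_elements("css selector", selector):
            if element.is_displayed():
                element.click()
                time.sleep(settle_seconds)  # let the page react to the click
                break
    return driver.page_source


def make_headless_chrome() -> Any:
    """Start a headless Chrome session (Selenium 4.6+ can fetch chromedriver itself)."""
    # Imported here so render_and_dismiss() has no hard dependency on Selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)
```

For example: create driver = make_headless_chrome(), call html = render_and_dismiss(driver, url, ["#accept-cookies"]) inside a try block, and call driver.quit() in the finally clause so the browser process is always closed.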

Answered By: DLuber