How to webscrape with VPN in Python?

Question:

I have written a Python program that scrapes IMDb with BeautifulSoup to build a MySQL database with tables of all the top-rated movies in the different categories. So far so good. My problem is that I am doing this from Norway, and many of the movie titles are translated to Norwegian. For example, in the IMDb top list opened from a Norwegian IP address, “The Shawshank Redemption” is translated to “Frihetens Regn”. I want all the titles in English. Are there any free VPNs that can be activated from Python and that work with BeautifulSoup? Or does anyone have another solution to this?

Asked By: Jaran Mellerud


Answers:

You have a couple of options: a VPN or a proxy.

First, yes, you can use a VPN. However, most VPNs require the entire host connection to tunnel through the VPN. There are a few good VPN services out there, but you tend to get what you pay for. I would be cautious about free VPNs, because some sell access to your network connection and others sell your data.

Second, and this might be the easiest option, use a proxy. You can tell your scraper to route its traffic through a free anonymous proxy. You can find lists of free proxies with a Google search, or you can check out ProxyBroker, which finds free proxies for you. This only requires proxying the scraper's traffic through a US IP address, rather than your entire host connection.
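As a rough sketch of the proxy approach, the snippet below routes only the scraper's traffic through a proxy using Python's standard library. The proxy address is a placeholder from a documentation-only IP range, and the helper names `build_proxy_opener` and `fetch_html` are made up for illustration; substitute a proxy you are actually allowed to use.

```python
import urllib.request

# Hypothetical proxy address (203.0.113.0/24 is reserved for documentation).
# Replace it with a real proxy, ideally one located in an English-speaking region.
PROXY_URL = "http://203.0.113.10:8080"


def build_proxy_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    # Many sites reject the default Python user agent, so send a browser-like one.
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    return opener


def fetch_html(opener, url, timeout=10):
    """Download url through the given opener and return the decoded HTML."""
    with opener.open(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Example usage (performs a real network request through the proxy):
# opener = build_proxy_opener(PROXY_URL)
# html = fetch_html(opener, "https://www.imdb.com/chart/top/")
# The resulting HTML string can then be passed to BeautifulSoup as before.
```

If the scraper already uses the `requests` library, the same mapping can be passed as the `proxies` argument of `requests.get` instead of building an opener.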

Answered By: user12541086

I agree that using proxies will work better than using a VPN.

However, don’t go with a free proxy if you want results. If it’s something you can invest in, get a decent paid provider. Otherwise, most likely nothing good will come of this, as you will constantly get blocked.

Answered By: Drapky

I think the only thing you need is content in the English language. Customizing request headers can help with this. For example:

Accept-Language: en-US,en;q=0.9
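A minimal sketch of sending such a header from Python with the standard library follows. The header value `en-US,en;q=0.9`, the chart URL, and the helper names are assumptions for illustration, and whether the site honors the header over IP-based localization is not guaranteed.

```python
import urllib.request

# Assumed URL of the IMDb top chart; adjust to the page you actually scrape.
TOP_CHART_URL = "https://www.imdb.com/chart/top/"


def build_english_request(url, language="en-US,en;q=0.9"):
    """Return a Request that asks the server for English-language content."""
    headers = {
        "Accept-Language": language,
        # Many sites reject the default Python user agent.
        "User-Agent": "Mozilla/5.0",
    }
    return urllib.request.Request(url, headers=headers)


def fetch_html(request, timeout=10):
    """Send the prepared request and return the decoded HTML."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Example usage (performs a real network request):
# html = fetch_html(build_english_request(TOP_CHART_URL))
# The resulting HTML string can then be passed to BeautifulSoup as before.
```

With the `requests` library, the equivalent is passing the same dictionary as the `headers` argument of `requests.get`.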

Answered By: Abdullah Shoukat