Authenticate Scrapy HTTP Proxy
Question:
I can set an HTTP proxy using request.meta['proxy'], but how do I authenticate with the proxy?
Specifying the user and password in the proxy URL does not work:
request.meta['proxy'] = 'http://user:pass@ip:2222'
From looking around, I may have to send request.headers['Proxy-Authorization'], but what format should the value be in?
Answers:
The username and password are Base64-encoded in the form "username:password" and sent after the word "Basic" in the Proxy-Authorization header:
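import base64
from random import choice
# Set the location of the proxy
proxy_string = choice(self._get_proxies_from_file('proxies.txt'))  # user:pass@ip:port
# Split on the last '@' so a password containing '@' stays intact
user_pass, host_port = proxy_string.rsplit('@', 1)
request.meta['proxy'] = "http://%s" % host_port
# Set up basic authentication for the proxy.
# b64encode replaces encodestring, which was removed in Python 3.9 and appended a newline.
encoded = base64.b64encode(user_pass.encode()).decode('ascii')
request.headers['Proxy-Authorization'] = 'Basic ' + encoded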
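The parsing and encoding steps can also be pulled into a small standalone function, which makes the header format easy to check outside of Scrapy. This is only a sketch: the name split_proxy and the sample address 127.0.0.1:2222 are made up for illustration, and it assumes each proxy entry has the form user:pass@ip:port.

```python
import base64


def split_proxy(proxy_string):
    """Split 'user:pass@ip:port' into (proxy_url, header_value).

    Hypothetical helper, not part of Scrapy.
    """
    # Split on the last '@' so an '@' inside the password is preserved.
    credentials, host_port = proxy_string.rsplit("@", 1)
    # HTTP Basic auth: Base64 of "username:password", prefixed with "Basic ".
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return "http://" + host_port, "Basic " + token


# Sample values, assumed for illustration only.
proxy_url, auth_header = split_proxy("user:pass@127.0.0.1:2222")
print(proxy_url)
print(auth_header)
```

The two returned values would then be assigned to request.meta['proxy'] and request.headers['Proxy-Authorization'] respectively.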
The w3lib module has a very convenient function for this use case:
from w3lib.http import basic_auth_header
request.meta["proxy"] = "http://192.168.1.1:8050"
request.headers["Proxy-Authorization"] = basic_auth_header(proxy_user, proxy_pass)
This is also mentioned in a blog article by Zyte (the maintainers of Scrapy).
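Neither answer says where these request lines should live. One plausible place is a downloader middleware's process_request method, sketched below. Everything named here is an assumption: the class name ProxyAuthMiddleware and the setting names PROXY_URL, PROXY_USER and PROXY_PASS are invented for this example, and the header is built with the standard library so the block runs without Scrapy or w3lib installed (w3lib's basic_auth_header could be swapped in). The stand-in crawler and request objects only mimic the attributes the sketch touches.

```python
import base64
from types import SimpleNamespace


class ProxyAuthMiddleware:
    """Hypothetical downloader middleware: send every request through one proxy."""

    def __init__(self, proxy_url, user, password):
        self.proxy_url = proxy_url
        # Basic scheme: "Basic " + Base64("username:password").
        token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
        self.auth_header = b"Basic " + token

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_URL, PROXY_USER and PROXY_PASS are made-up setting names.
        s = crawler.settings
        return cls(s.get("PROXY_URL"), s.get("PROXY_USER"), s.get("PROXY_PASS"))

    def process_request(self, request, spider):
        request.meta["proxy"] = self.proxy_url
        request.headers["Proxy-Authorization"] = self.auth_header
        return None  # returning None lets processing of the request continue


# Stand-in objects so the sketch can be exercised without Scrapy.
fake_crawler = SimpleNamespace(settings={
    "PROXY_URL": "http://192.168.1.1:8050",
    "PROXY_USER": "user",
    "PROXY_PASS": "pass",
})
fake_request = SimpleNamespace(meta={}, headers={})
middleware = ProxyAuthMiddleware.from_crawler(fake_crawler)
middleware.process_request(fake_request, spider=None)
print(fake_request.meta, fake_request.headers)
```

In a real project a class like this would need to be registered in the DOWNLOADER_MIDDLEWARES setting before Scrapy would call it.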