Python HTTP request with headers returns a 403 error on a cloud server but runs fine on my machine
Question:
To summarize the issue I need help with:
- I wrote a Python program that sends a GET request to https://bx.in.th/api/pairing/
- The program works on my machine (Mac OS X).
- When run on a DigitalOcean Ubuntu droplet, it fails with an HTTP 403 Forbidden error.
- I spent a day researching. Most answers suggest modifying the request headers; I tried all of them without success.
Some of the references I went through:
- urllib2.HTTPError: HTTP Error 403: Forbidden
- Python 3.5 urllib.request 403 Forbidden Error
- HTTP error 403 in Python 3 Web Scraping
Here is simplified source code that reproduces the problem:
import urllib.request
import json
url = 'https://bx.in.th/api/pairing/'
headers = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
'Accept-Encoding': 'none',
'Accept-Language': 'en-US,en;q=0.5',
'Connection': 'keep-alive'
}
request = urllib.request.Request(url, headers=headers)
response = urllib.request.urlopen(request)
print(response.read())
print()
print(response.getheaders())
The expected output is:
b'{"1":{"pairing_id":1,"primary_currency":"THB","secondary_currency":"BTC"},"21":{"pairing_id":21,"primary_currency":"THB","secondary_currency":"ETH"},"22":{"pairing_id":22,"primary_currency":"THB","secondary_currency":"DAS"},"23":{"pairing_id":23,"primary_currency":"THB","secondary_currency":"REP"},"20":{"pairing_id":20,"primary_currency":"BTC","secondary_currency":"ETH"},"4":{"pairing_id":4,"primary_currency":"BTC","secondary_currency":"DOG"},"6":{"pairing_id":6,"primary_currency":"BTC","secondary_currency":"FTC"},"24":{"pairing_id":24,"primary_currency":"THB","secondary_currency":"GNO"},"13":{"pairing_id":13,"primary_currency":"BTC","secondary_currency":"HYP"},"2":{"pairing_id":2,"primary_currency":"BTC","secondary_currency":"LTC"},"3":{"pairing_id":3,"primary_currency":"BTC","secondary_currency":"NMC"},"26":{"pairing_id":26,"primary_currency":"THB","secondary_currency":"OMG"},"14":{"pairing_id":14,"primary_currency":"BTC","secondary_currency":"PND"},"5":{"pairing_id":5,"primary_currency":"BTC","secondary_currency":"PPC"},"19":{"pairing_id":19,"primary_currency":"BTC","secondary_currency":"QRK"},"15":{"pairing_id":15,"primary_currency":"BTC","secondary_currency":"XCN"},"7":{"pairing_id":7,"primary_currency":"BTC","secondary_currency":"XPM"},"17":{"pairing_id":17,"primary_currency":"BTC","secondary_currency":"XPY"},"25":{"pairing_id":25,"primary_currency":"THB","secondary_currency":"XRP"},"8":{"pairing_id":8,"primary_currency":"BTC","secondary_currency":"ZEC"}}'
[('Date', 'Sun, 13 Aug 2017 09:27:02 GMT'), ('Content-Type', 'text/javascript'), ('Content-Length', '1485'), ('Connection', 'close'), ('Set-Cookie', '__cfduid=d51c37ea835bae4a0c892e91f34f7bc131502616422; expires=Mon, 13-Aug-18 09:27:02 GMT; path=/; domain=.bx.in.th; HttpOnly'), ('Cache-Control', 'max-age=86400'), ('Expires', 'Mon, 14 Aug 2017 09:27:02 GMT'), ('Strict-Transport-Security', 'max-age=0'), ('X-Content-Type-Options', 'nosniff'), ('Server', 'cloudflare-nginx'), ('CF-RAY', '38daa2e36e0a836b-BKK')]
The error from running the source code on the droplet:
Traceback (most recent call last):
File "api-call.py", line 17, in <module>
response = urllib.request.urlopen(request)
File "/usr/lib/python3.5/urllib/request.py", line 163, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.5/urllib/request.py", line 472, in open
response = meth(req, response)
File "/usr/lib/python3.5/urllib/request.py", line 582, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.5/urllib/request.py", line 510, in error
return self._call_chain(*args)
File "/usr/lib/python3.5/urllib/request.py", line 444, in _call_chain
result = func(*args)
File "/usr/lib/python3.5/urllib/request.py", line 590, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Thank you!
Answers:
You need to use a reliable proxy service such as Luminati.
I was also getting a 403 status, but the request works through a Luminati proxy.
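The question uses urllib.request, so the proxy idea can be sketched with the standard library's ProxyHandler. This is a minimal sketch, not a provider's official example: the proxy URL is a placeholder you would get from your proxy service, and build_proxy_opener and fetch are hypothetical helper names.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that routes HTTP and HTTPS requests through proxy_url.

    proxy_url is a placeholder, e.g. 'http://user:password@proxy.example.com:8080'.
    """
    proxy_handler = urllib.request.ProxyHandler(
        {'http': proxy_url, 'https': proxy_url}
    )
    # Passing our own ProxyHandler replaces the default one that
    # build_opener would otherwise add from the environment settings.
    return urllib.request.build_opener(proxy_handler)


def fetch(url, proxy_url, timeout=10):
    """Fetch url through the proxy and return the raw response body."""
    opener = build_proxy_opener(proxy_url)
    request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with opener.open(request, timeout=timeout) as response:
        return response.read()
```

With real credentials, fetch('https://bx.in.th/api/pairing/', proxy_url) would send the same request as in the question, only through the proxy instead of directly from the droplet's IP address.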
I had a similar problem on DigitalOcean.
The solution is to sign up for a proxy service and route your requests through it. Note: Luminati is now Bright Data (brightdata.com).
Their example is below.
I suggest using Python’s requests module and making the call like this:
import requests
proxies = {'http': 'http://brd-customer-hl_234567a0-zone-isp:[email protected]:22225',
'https': 'http://brd-customer-hl_234567a0-zone-isp:[email protected]:22225'}
url = 'https://bx.in.th/api/pairing/'
headers = {'User-Agent': 'Mozilla/5.0 etc'}
r = requests.get(url, headers=headers, proxies=proxies, timeout=10)
r.status_code # should be 200, not 403
Use r.text or r.json() to read the API data from the response object.
Strictly speaking, only the https proxy is needed for this example, but it’s good practice to include both.
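Before paying for a proxy, it may help to check where the 403 comes from. The successful response in the question carries Cloudflare headers (Server: cloudflare-nginx and CF-RAY), so one plausible explanation, stated here as an assumption, is that Cloudflare rejects the droplet's IP address rather than the request headers. The sketch below captures the status and headers even when the request fails; looks_like_cloudflare_block and diagnose are hypothetical helper names, and the check is only a heuristic.

```python
import urllib.error
import urllib.request


def looks_like_cloudflare_block(status, headers):
    """Heuristic: True for a 403 whose headers carry Cloudflare markers.

    headers is a list of (name, value) pairs, as returned by getheaders().
    """
    names = {name.lower(): value for name, value in headers}
    server = names.get('server', '').lower()
    return status == 403 and ('cloudflare' in server or 'cf-ray' in names)


def diagnose(url, headers=None, timeout=10):
    """Return (status, headers) for url, including for HTTP error responses."""
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.getheaders()
    except urllib.error.HTTPError as error:
        # HTTPError carries the status code and headers of the failed response.
        return error.code, list(error.headers.items())
```

Running status, hdrs = diagnose('https://bx.in.th/api/pairing/') on the droplet and passing the result to looks_like_cloudflare_block(status, hdrs) would indicate whether the rejection carries Cloudflare markers, in which case changing headers alone is unlikely to help.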