Python requests library added an additional header "Accept-Encoding: identity"
Question:
This is my code.
import requests
from sys import exit

proxies = {
    "http": "127.0.0.1:8888",
    "https": "127.0.0.1:8888",
}
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Connection": "keep-alive",
}
login_page = "http://www.test.com/login/"
r = requests.get(login_page, proxies=proxies, headers=headers)
original_cookies = r.cookies
exit(0)
This is what I got from Fiddler2. As you can see, it added an additional header, Accept-Encoding: identity.
GET http://www.test.com/login/ HTTP/1.1
Accept-Encoding: identity
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Host: www.test.com
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0
I'm using Python 3.3.2 on 64-bit Windows 7 with requests 1.2.3.
Can anyone offer some suggestions?
Thanks.
Answers:
This originates deep within the bowels of http.client, which is used by urllib3, which is used by requests.
http.client actually checks whether there is already an accept-encoding entry in the headers dictionary it is passed, and if there is, it skips adding the identity header. The only problem is that what gets passed as the headers dictionary is something like this:
CaseInsensitiveDict({b'Accept-Encoding': 'gzip, deflate, compress', ...})
So why is it not working? requests encodes the header names to bytes, and since in Python 3 comparing a str object to a bytes object always evaluates to False, the check performed in http.client fails.
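To make the failure concrete, here is a minimal sketch of that kind of membership check. It is a simplified imitation, not the actual http.client source, and the helper name would_skip_identity is made up for illustration:

```python
def would_skip_identity(headers):
    """Imitate the check: lower-case every header name, then look for
    the str 'accept-encoding' among them (hypothetical helper)."""
    header_names = {name.lower() for name in headers}
    return "accept-encoding" in header_names


# Header names as plain str, which is what http.client expects.
str_headers = {"Accept-Encoding": "gzip, deflate"}
# Header names as bytes, which is what the answer above says requests passed.
bytes_headers = {b"Accept-Encoding": "gzip, deflate"}

print(b"accept-encoding" == "accept-encoding")  # False: bytes never equal str
print(would_skip_identity(str_headers))         # True: identity would be skipped
print(would_skip_identity(bytes_headers))       # False: identity gets added anyway
```

With bytes keys the lookup never matches, so the caller concludes that no Accept-Encoding header was supplied and adds its own default.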
If you really want to get rid of the additional header, the quickest way would be to either comment out line 340 in requests/models.py or monkeypatch requests.models.PreparedRequest.prepare_headers.
Edit: this seems to be fixed in the (not yet released) 2.0 branch of requests.
Thanks to @mata's answer, I've been able to monkeypatch HTTPConnection.putheader to ignore Accept-Encoding: identity in my particular case:
from http.client import HTTPConnection

import requests


def drop_accept_encoding_on_putheader(http_connection_putheader):
    # Wrap putheader so the default "Accept-Encoding: identity" is silently dropped.
    def wrapper(self, header, *values):
        if header == "Accept-Encoding" and "identity" in values:
            return
        return http_connection_putheader(self, header, *values)
    return wrapper


HTTPConnection.putheader = drop_accept_encoding_on_putheader(HTTPConnection.putheader)

s = requests.Session()
s.headers.clear()  # remove the session's default headers as well
r = s.post("https://httpbin.org/post")
print(r.json()["headers"])
Result:
{'Content-Length': '0',
'Host': 'httpbin.org',
'User-Agent': 'python-urllib3/1.26.8',
'X-Amzn-Trace-Id': 'Root=1-639e06af-32dea6906aff32526a081e8e'}
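Since the patch above replaces HTTPConnection.putheader for the whole process, it may be worth scoping it. Below is a rough sketch of the same idea wrapped in a context manager that restores the original method afterwards. The name without_identity_accept_encoding is my own invention, not part of requests or http.client:

```python
from contextlib import contextmanager
from http.client import HTTPConnection


@contextmanager
def without_identity_accept_encoding():
    """Temporarily drop 'Accept-Encoding: identity' (hypothetical helper)."""
    original = HTTPConnection.putheader

    def putheader(self, header, *values):
        # Swallow only the default header; pass everything else through.
        if header == "Accept-Encoding" and "identity" in values:
            return
        return original(self, header, *values)

    HTTPConnection.putheader = putheader
    try:
        yield
    finally:
        # Always put the original method back, even if the body raises.
        HTTPConnection.putheader = original
```

Requests made inside a `with without_identity_accept_encoding():` block would then go out without the extra header, while code elsewhere in the process keeps the standard behaviour. Note that this is still a global patch while the block is active, so it is not safe to rely on when other threads are sending requests at the same time.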