Python requests library added an additional header "Accept-Encoding: identity"

Question:

This is my code.

import requests
from sys import exit
proxies = {
    "http": "127.0.0.1:8888",
    "https": "127.0.0.1:8888",
}

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Connection": "keep-alive"
}


login_page = "http://www.test.com/login/"
r = requests.get(login_page, proxies = proxies, headers = headers)
original_cookies = r.cookies
exit(0)

This is what I got from Fiddler2. As you can see, requests added an extra header, Accept-Encoding: identity.

GET http://www.test.com/login/ HTTP/1.1
Accept-Encoding: identity
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Host: www.test.com
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0

I’m using Python 3.3.2 on Windows 7 64 bit and requests 1.2.3.

Can anyone give some suggestions?

Thanks.

Asked By: Just a learner

||

Answers:

This originates deep within the bowels of http.client, which is used by urllib3 which is used by requests.

http.client does check whether the headers dictionary it receives already contains an Accept-Encoding entry, and if so it skips adding the identity header. The only problem is that the dictionary it actually receives looks something like this:

CaseInsensitiveDict({b'Accept-Encoding': 'gzip, deflate, compress', ...})

So why does it not work? requests encodes the header names to bytes, and in Python 3 comparing a str object with a bytes object is always False, so the check performed in http.client fails.
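
The mismatch can be reproduced with the standard library alone. The following is a minimal sketch, not the actual http.client source: the helper name already_has_accept_encoding is hypothetical, and it only imitates the kind of lowercase membership test described above.

```python
def already_has_accept_encoding(headers):
    """Simplified stand-in for the membership test described above."""
    # Lowercase every header name, then look for the str key.
    header_names = frozenset(name.lower() for name in headers)
    return "accept-encoding" in header_names


str_keyed = {"Accept-Encoding": "gzip, deflate"}
bytes_keyed = {b"Accept-Encoding": "gzip, deflate"}

# In Python 3, bytes and str never compare equal.
str_vs_bytes = b"Accept-Encoding" == "Accept-Encoding"
print(str_vs_bytes)                               # False
print(already_has_accept_encoding(str_keyed))     # True
print(already_has_accept_encoding(bytes_keyed))   # False
```

With bytes keys the lookup misses, so the caller concludes that no Accept-Encoding was supplied and adds its own.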

If you really want to get rid of the additional header, the quickest way would be either to comment out line 340 in requests/models.py or to monkeypatch requests.models.PreparedRequest.prepare_headers.

Edit: this seems to be fixed in the (not yet released) 2.0 branch of requests.

Answered By: mata

Thanks to @mata's answer, I was able to monkeypatch HTTPConnection.putheader to ignore Accept-Encoding: identity in my particular case:

from http.client import HTTPConnection

import requests


def drop_accept_encoding_on_putheader(http_connection_putheader):
    # Wrap putheader so that "Accept-Encoding: identity" is silently skipped.
    def wrapper(self, header, *values):
        if header == "Accept-Encoding" and "identity" in values:
            return
        return http_connection_putheader(self, header, *values)

    return wrapper


# Patch the base class; the connection classes that requests uses through urllib3 inherit from it.
HTTPConnection.putheader = drop_accept_encoding_on_putheader(HTTPConnection.putheader)

s = requests.Session()
s.headers.clear()  # remove the session's default headers
r = s.post("https://httpbin.org/post")
print(r.json()["headers"])

Result:

{'Content-Length': '0',
 'Host': 'httpbin.org',
 'User-Agent': 'python-urllib3/1.26.8',
 'X-Amzn-Trace-Id': 'Root=1-639e06af-32dea6906aff32526a081e8e'}
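
The effect of the wrapper can also be checked without any network access. This is a rough sketch under some assumptions: it patches a hypothetical subclass (PatchedConnection) instead of the global HTTPConnection, uses the placeholder host example.invalid, and reads the private _buffer attribute, which is a CPython implementation detail and not a documented API.

```python
from http.client import HTTPConnection


def drop_accept_encoding_on_putheader(http_connection_putheader):
    def wrapper(self, header, *values):
        if header == "Accept-Encoding" and "identity" in values:
            return
        return http_connection_putheader(self, header, *values)

    return wrapper


class PatchedConnection(HTTPConnection):
    # Hypothetical subclass: the wrapper is applied here only,
    # so the global HTTPConnection stays untouched.
    putheader = drop_accept_encoding_on_putheader(HTTPConnection.putheader)


def buffered_lines(connection_class):
    # No socket is opened: putrequest only queues lines in the
    # private _buffer list (assumption: CPython's http.client).
    conn = connection_class("example.invalid")
    conn.putrequest("GET", "/")
    return list(conn._buffer)


plain = buffered_lines(HTTPConnection)
patched = buffered_lines(PatchedConnection)
print(b"Accept-Encoding: identity" in plain)
print(b"Accept-Encoding: identity" in patched)
```

Patching a subclass keeps the experiment local; the approach shown earlier patches the class globally, which affects every HTTP client in the process.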
Answered By: Gleb Ignatev