python 3 urllib and http.client – unable to turn on debug messages

Question:

Hi Stackoverflow community,

I’m trying to get familiar with the urllib.request standard library and use it in my scripts at work instead of wget.
However, I’m unable to get the detailed HTTP messages displayed, whether in IDLE, from a script file, or by typing the commands manually into cmd (py).

I’m using Python on Windows 7 x64, and have tried 3.5 and 3.6, including 3.6.1rc1, without success.

The messages are supposedly turned on using this command:

http.client.HTTPConnection.debuglevel = 1

So here is my sample code. It works, but no details are displayed:

import http.client
import urllib.request
http.client.HTTPConnection.debuglevel = 1
response = urllib.request.urlopen('http://stackoverflow.com')
content = response.read()
with open("stack.html", "wb") as file:
    file.write(content)

I have tried using .set_debuglevel(1) without success.
There is a years-old question about this here:
Turning on debug output for python 3 urllib
However, its code is the same as mine, and it does not work. In a comment on that question, user Yen Chi Hsuan says it’s a bug and reports it here:
https://bugs.python.org/issue26892

The bug was closed in June 2016, so I would expect it to be fixed in recent Python versions.

Maybe I’m missing something (e.g. something else needs to be enabled or installed), but I have spent some time on this and reached a dead end.

Is there a working way to display the detailed HTTP messages with urllib on Python 3 on Windows?

Thank you

EDIT: The answer suggested by pvg works on the simple example, but I cannot make it work in a case where a login is needed. HTTPBasicAuthHandler does not have this debuglevel attribute, and when I try combining multiple handlers into the opener, it does not work either.

import urllib.request

userName = 'mylogin'
passWord = 'mypassword'
top_level_url = 'http://page-to-login.com'

# create an authorization handler
passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, top_level_url, userName, passWord)

auth_handler = urllib.request.HTTPBasicAuthHandler(passman)
opener = urllib.request.build_opener(auth_handler)
urllib.request.install_opener(opener)

result = opener.open(top_level_url)
content = result.read()

Asked By: alleby

||

Answers:

The issue you linked includes working code; a version is reproduced below:

import urllib.request

handler = urllib.request.HTTPHandler(debuglevel=10)
opener = urllib.request.build_opener(handler)
content = opener.open('http://stackoverflow.com').read()

print(content[0:120])
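
For the login case in the question's edit, the same idea should carry over: the debug level belongs to the HTTP(S) handlers, so they can be passed to build_opener alongside the auth handler. The following is a minimal sketch, not tested against a real login page. The helper name build_debug_opener is made up for illustration, the URL and credentials are the placeholders from the question, and it assumes Python was built with SSL support so that HTTPSHandler exists.

```python
import urllib.request


def build_debug_opener(top_level_url, user_name, pass_word, debuglevel=1):
    """Return an opener that does basic auth and prints HTTP debug output.

    Hypothetical helper for illustration; not part of urllib.request.
    """
    passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passman.add_password(None, top_level_url, user_name, pass_word)
    auth_handler = urllib.request.HTTPBasicAuthHandler(passman)

    # The debug level lives on the HTTP(S) handlers, not on the auth handler.
    http_handler = urllib.request.HTTPHandler(debuglevel=debuglevel)
    https_handler = urllib.request.HTTPSHandler(debuglevel=debuglevel)

    # Passing handler instances replaces the default handlers of the same type.
    return urllib.request.build_opener(http_handler, https_handler, auth_handler)


# Placeholder URL and credentials taken from the question.
opener = build_debug_opener('http://page-to-login.com', 'mylogin', 'mypassword')
# content = opener.open('http://page-to-login.com').read()  # performs the real request
```

Building the opener does not touch the network; only the commented-out opener.open call would send a request and print the debug lines.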

This is pretty clunky. Another option is to use a friendlier library such as urllib3 (http://urllib3.readthedocs.io/en/latest/):

import urllib3

urllib3.add_stderr_logger()
http = urllib3.PoolManager()
r = http.request('GET', 'http://stackoverflow.com')
print(r.status)

If you decide to use the requests library instead, the following answer describes how to set up logging:

How can I see the entire HTTP request that's being sent by my Python application?

Answered By: pvg

Ever since Python 3.5.2 (released around June 2016), urllib.request entirely ignores http.client.HTTPConnection.debuglevel in favor of the debuglevel constructor argument of urllib.request.HTTPHandler.

This is due to this change, which sets the value of http.client.HTTPConnection.debuglevel to whatever is passed as the debuglevel constructor argument of urllib.request.HTTPHandler, on this line.
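
To make the mechanism concrete, here is a simplified model of it. FakeConnection and FakeHandler are hypothetical stand-ins written for this sketch, not the real urllib.request or http.client source; they only mimic the pattern of a handler pushing its own level onto each connection it creates.

```python
class FakeConnection:
    """Hypothetical stand-in for http.client.HTTPConnection."""
    debuglevel = 0  # class-level default, like HTTPConnection.debuglevel

    def set_debuglevel(self, level):
        # The instance attribute shadows the class attribute from now on.
        self.debuglevel = level


class FakeHandler:
    """Hypothetical stand-in for urllib.request.HTTPHandler."""

    def __init__(self, debuglevel=0):
        self._debuglevel = debuglevel

    def open_connection(self):
        conn = FakeConnection()
        # The handler always pushes its own level onto the connection,
        # so a class-level setting never gets a chance to apply.
        conn.set_debuglevel(self._debuglevel)
        return conn


FakeConnection.debuglevel = 1  # the "global" switch from the question
ignored = FakeHandler().open_connection()              # handler default of 0 wins
honored = FakeHandler(debuglevel=1).open_connection()  # explicit argument is used
print(ignored.debuglevel, honored.debuglevel)  # prints: 0 1
```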

A PR has been opened to fix this, but in the meantime you can either use the constructor argument for HTTPHandler and HTTPSHandler (as pvg’s answer points out), or you can monkey-patch the __init__ methods of HTTPHandler and HTTPSHandler to respect the global values, like so:

import http.client
import urllib.request

# Keep a reference to the original constructor so the patch can delegate to it.
https_old_init = urllib.request.HTTPSHandler.__init__

def https_new_init(self, debuglevel=None, context=None, check_hostname=None):
    # Fall back to the global class attribute when no explicit level is passed.
    debuglevel = debuglevel if debuglevel is not None else http.client.HTTPSConnection.debuglevel
    https_old_init(self, debuglevel, context, check_hostname)

urllib.request.HTTPSHandler.__init__ = https_new_init

http_old_init = urllib.request.HTTPHandler.__init__

def http_new_init(self, debuglevel=None):
    # Plain HTTP handler: read the global from HTTPConnection, not HTTPSConnection.
    debuglevel = debuglevel if debuglevel is not None else http.client.HTTPConnection.debuglevel
    http_old_init(self, debuglevel)

urllib.request.HTTPHandler.__init__ = http_new_init

Note: I don’t recommend reading the global debuglevel in the default value of the method argument in HTTPHandler’s constructor, because default values for arguments are evaluated once, when the function is defined, which for HTTPHandler’s constructor is when the urllib.request module is imported.
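
A small, self-contained illustration of that pitfall. The names LEVEL, early_bound, and late_bound are made up for this sketch; LEVEL merely stands in for the global debuglevel.

```python
LEVEL = 0  # stands in for http.client.HTTPConnection.debuglevel


def early_bound(debuglevel=LEVEL):
    # The default was evaluated once, when this def statement ran.
    return debuglevel


def late_bound(debuglevel=None):
    # The global is looked up on every call instead.
    return LEVEL if debuglevel is None else debuglevel


LEVEL = 1  # the user flips the global switch after the functions exist

print(early_bound())  # 0: still the value captured at definition time
print(late_bound())   # 1: sees the updated global
```

This is why the monkey patch uses None as the default and resolves the global inside the function body.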

Answered By: wheeler