Python urllib2 with keep alive
Question:
How can I make a “keep alive” HTTP request using Python’s urllib2?
Answers:
Use the urlgrabber library. This includes an HTTP handler for urllib2 that supports HTTP 1.1 and keepalive:
>>> import urllib2
>>> from urlgrabber.keepalive import HTTPHandler
>>> keepalive_handler = HTTPHandler()
>>> opener = urllib2.build_opener(keepalive_handler)
>>> urllib2.install_opener(opener)
>>>
>>> fo = urllib2.urlopen('http://www.python.org')
Note: use urlgrabber version 3.9.0 or earlier, because the keepalive module was removed in version 3.9.1.
There is a port of the keepalive module to Python 3.
Note that urlgrabber does not entirely work with Python 2.6. I fixed the issues (I think) by making the following modifications in keepalive.py.
In keepalive.HTTPHandler.do_open(), remove this:
    if r.status == 200 or not HANDLE_ERRORS:
        return r
And insert this:
    if r.status == 200 or not HANDLE_ERRORS:
        # [speedplane] Must return an addinfourl object
        resp = urllib2.addinfourl(r, r.msg, req.get_full_url())
        resp.code = r.status
        resp.msg = r.reason
        return resp
Unfortunately, keepalive.py was removed from urlgrabber on 25 Sep 2009 by the following commit, made after urlgrabber switched to depending on pycurl (which supports keep-alive):
http://yum.baseurl.org/gitweb?p=urlgrabber.git;a=commit;h=f964aa8bdc52b29a2c137a917c72eecd4c4dda94
However, you can still get the last revision of keepalive.py here:
Or check out httplib’s HTTPConnection.
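To illustrate the HTTPConnection suggestion, here is a minimal sketch that sends several requests over one connection object. It assumes Python 3, where httplib is named http.client; the function name fetch_over_one_connection and its parameters are made up for this example.

```python
from http.client import HTTPConnection


def fetch_over_one_connection(host, paths, port=80):
    """Fetch several paths from one host using a single HTTPConnection.

    Hypothetical helper for illustration only.
    """
    conn = HTTPConnection(host, port, timeout=10)
    results = []
    try:
        for path in paths:
            conn.request("GET", path)
            resp = conn.getresponse()
            # The body has to be read in full before the same
            # connection can carry the next request.
            body = resp.read()
            results.append((resp.status, body))
    finally:
        conn.close()
    return results
```

If the server answers with "Connection: close", the connection object should simply reconnect on the next request, so the loop still works but without the keep-alive benefit.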
Try urllib3 which has the following features:
- Re-use the same socket connection for multiple requests (HTTPConnectionPool and HTTPSConnectionPool) (with optional client-side certificate verification).
- File posting (encode_multipart_formdata).
- Built-in redirection and retries (optional).
- Supports gzip and deflate decoding.
- Thread-safe and sanity-safe.
- Small, easy-to-understand codebase, well suited to extending and building upon.
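As a rough sketch of the connection re-use listed above, the following uses urllib3's HTTPConnectionPool for a single host. It assumes urllib3 is installed; the function name fetch_with_pool and the choice of maxsize=1 are illustrative, not a recommendation.

```python
import urllib3


def fetch_with_pool(host, port, paths):
    """Fetch several paths through one urllib3 connection pool.

    Hypothetical helper for illustration only.
    """
    # maxsize=1 keeps at most one idle connection around for re-use.
    pool = urllib3.HTTPConnectionPool(host, port=port, maxsize=1)
    results = []
    try:
        for path in paths:
            resp = pool.request("GET", path)
            # resp.data holds the fully read body; once it is read,
            # the connection goes back to the pool for the next request.
            results.append((resp.status, resp.data))
    finally:
        pool.close()
    return results
```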
Or, for a much more comprehensive solution, use Requests, which supports keep-alive from version 0.8.0 (by using urllib3 internally) and has the following features:
- Extremely simple HEAD, GET, POST, PUT, PATCH, DELETE Requests.
- Gevent support for Asynchronous Requests.
- Sessions with cookie persistence.
- Basic, Digest, and Custom Authentication support.
- Automatic form-encoding of dictionaries.
- A simple dictionary interface for request/response cookies.
- Multipart file uploads.
- Automatic decoding of Unicode, gzip, and deflate responses.
- Full support for unicode URLs and domain names.
Please avoid collective pain and use Requests instead. It will do the right thing by default and use keep-alive if applicable.
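For what it's worth, a minimal sketch of what that looks like in practice: as far as I know, connection re-use in Requests happens within a Session, so the example below routes every request through one. It assumes Requests is installed; the function name fetch_with_session is made up for this example.

```python
import requests


def fetch_with_session(urls):
    """Fetch several URLs through one Session so connections are pooled.

    Hypothetical helper for illustration only.
    """
    results = []
    # Closing the session (end of the with block) closes pooled connections.
    with requests.Session() as session:
        for url in urls:
            resp = session.get(url, timeout=10)
            # Accessing resp.content reads the whole body, which lets the
            # underlying connection be re-used for the next request.
            results.append((resp.status_code, resp.content))
    return results
```

Calling requests.get() directly for each URL should also work, but each call sets up its own session, so requests to the same host would not share a connection.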
Here’s a somewhat similar urlopen() that does keep-alive, though it’s not threadsafe.
try:
    from http.client import HTTPConnection, HTTPSConnection  # Python 3
except ImportError:
    from httplib import HTTPConnection, HTTPSConnection  # Python 2
import select

connections = {}  # (scheme, host) -> connection object, re-used across calls

def request(method, url, body=None, headers=None, **kwargs):
    # Split 'http://host/path' into its parts; pad so a URL without a path works.
    scheme, _, host, path = (url.split('/', 3) + [''])[:4]
    h = connections.get((scheme, host))
    # A readable socket on an idle connection usually means the server closed it.
    if h and h.sock is not None and select.select([h.sock], [], [], 0)[0]:
        h.close()
        h = None
    if not h:
        Connection = HTTPConnection if scheme == 'http:' else HTTPSConnection
        h = connections[(scheme, host)] = Connection(host, **kwargs)
    h.request(method, '/' + path, body, headers or {})
    return h.getresponse()

def urlopen(url, data=None, *args, **kwargs):
    # Read each response fully before making the next request to the same host.
    resp = request('POST' if data else 'GET', url, data, *args, **kwargs)
    assert resp.status < 400, (resp.status, resp.reason, resp.read())
    return resp