Timeout for urllib2.urlopen() in pre-Python 2.6 versions

Question:

The urllib2 documentation says that the timeout parameter was added in Python 2.6. Unfortunately, my code base has been running on Python 2.5 and 2.4 platforms.

Is there any alternative way to simulate the timeout? All I want to do is allow the code to talk to the remote server for a fixed amount of time.

Perhaps an alternative built-in library? (I don’t want to install a third-party library like pycurl.)

Asked By: rubayeet


Answers:

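I think your best choice is to patch your urllib2 (or deploy a local version of it) with the change from the 2.6 maintenance branch. Note that the 2.6 urllib2 passes the timeout on to httplib.HTTPConnection, which only accepts a timeout argument from 2.6 on, so the corresponding httplib change would need to be backported as well.

On Linux with Python 2.4, the file is usually /usr/lib/python2.4/urllib2.py.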

Answered By: Kimvais

I use httplib from the standard library. It has a dead simple API, but, as you might guess, it only handles HTTP. If I understand correctly, urllib uses httplib to implement its HTTP support.
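
httplib.HTTPConnection only gained its own timeout argument in Python 2.6, so on 2.4/2.5 an httplib-based request still relies on the global socket default. Below is a minimal sketch, assuming a plain GET; the helper name `fetch` is hypothetical, and the `http.client` fallback is only there so the sketch also runs on Python 3:

```python
import socket

try:
    import httplib  # Python 2.x
except ImportError:
    import http.client as httplib  # Python 3 name for the same module


def fetch(host, path='/', timeout=30, port=80):
    """Hypothetical helper: GET http://host:port/path, return (status, body).

    Each blocking socket operation (connect, send, recv) gives up after
    `timeout` seconds, because the connection's socket is created while
    the process-wide default timeout is in effect.
    """
    old_timeout = socket.getdefaulttimeout()
    socket.setdefaulttimeout(timeout)
    try:
        conn = httplib.HTTPConnection(host, port)
        try:
            conn.request('GET', path)
            resp = conn.getresponse()
            return resp.status, resp.read()
        finally:
            conn.close()
    finally:
        # Restore the previous default so the rest of the process
        # is unaffected once this call returns.
        socket.setdefaulttimeout(old_timeout)
```

On a timeout this raises socket.timeout (a subclass of socket.error). Because the default is process-wide, other threads that create sockets while fetch() is running pick it up too.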

Answered By: Kris Walker

You can set a global timeout for all socket operations (including HTTP requests) by using:

socket.setdefaulttimeout()

Like this:

import urllib2
import socket
socket.setdefaulttimeout(30)
f = urllib2.urlopen('http://www.python.org/')

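In this case, each blocking socket operation in your urllib2 request (connecting, sending, reading) gives up after 30 seconds and raises socket.timeout; urllib2 may wrap it in a urllib2.URLError if the timeout happens while the request is being made. (socket.setdefaulttimeout() was added in Python 2.3.)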
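
Because socket.setdefaulttimeout() is process-wide, it also changes the timeout of every socket created afterwards by unrelated code. Below is a minimal sketch of a hypothetical helper, `with_default_timeout`, that sets the default only around a single call and then restores the previous value; it uses a plain try/finally so the syntax stays valid on 2.4:

```python
import socket


def with_default_timeout(timeout, func, *args, **kwargs):
    """Hypothetical helper: call func(*args, **kwargs) with the global
    socket timeout set to `timeout`, then restore the previous default.

    Sockets created during the call keep `timeout` afterwards, so a
    response object returned by urlopen() still times out on read().
    Not thread-safe: other threads see the changed default meanwhile.
    """
    old_timeout = socket.getdefaulttimeout()
    socket.setdefaulttimeout(timeout)
    try:
        return func(*args, **kwargs)
    finally:
        socket.setdefaulttimeout(old_timeout)
```

Usage would then look like f = with_default_timeout(30, urllib2.urlopen, 'http://www.python.org/').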

Answered By: Corey Goldberg

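Well, timeout handling in 2.6 is not fundamentally different from what you can already do in 2.4. If you open urllib2.py in 2.6, you will see that urlopen() takes an extra timeout argument and applies it to the underlying socket; when no timeout is passed, the socket falls back to the global default set by socket.setdefaulttimeout(), as described in the setdefaulttimeout() answer above.

So you really don't need to update your urllib2.py: on 2.4, setting the global default has a similar effect, except that it applies to every socket the process creates.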

Answered By: Konark Modi

With considerable irritation, you can override the httplib.HTTPConnection class that the urllib2.HTTPHandler uses.

import httplib
import socket
import urllib2

def urlopen_with_timeout(url, data=None, timeout=None):

  # Create these two helper classes fresh each time, since
  # timeout needs to be in the closure.
  class TimeoutHTTPConnection(httplib.HTTPConnection):
    def connect(self):
      """Connect to the host and port specified in __init__."""
      msg = "getaddrinfo returns an empty list"
      for res in socket.getaddrinfo(self.host, self.port, 0,
                      socket.SOCK_STREAM): 
        af, socktype, proto, canonname, sa = res
        try:
          self.sock = socket.socket(af, socktype, proto)
          if timeout is not None:
            self.sock.settimeout(timeout)
          if self.debuglevel > 0:
            print "connect: (%s, %s)" % (self.host, self.port)
          self.sock.connect(sa)
        except socket.error, msg:
          if self.debuglevel > 0:
            print 'connect fail:', (self.host, self.port)
          if self.sock:
            self.sock.close()
          self.sock = None
          continue
        break
      if not self.sock:
        raise socket.error, msg

  class TimeoutHTTPHandler(urllib2.HTTPHandler):
    http_request = urllib2.AbstractHTTPHandler.do_request_
    def http_open(self, req):
      return self.do_open(TimeoutHTTPConnection, req)

  opener = urllib2.build_opener(TimeoutHTTPHandler)
  return opener.open(url, data)

Answered By: Philip Z

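Set the timeout in two places: globally with socket.setdefaulttimeout(), and per call with urlopen()'s timeout argument. Note that urlopen() only accepts timeout from Python 2.6 on; on 2.4 and 2.5 the second line below raises a TypeError, so there you need the setdefaulttimeout() call alone.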

import urllib2
import socket

socket.setdefaulttimeout(30)
f = urllib2.urlopen('http://www.python.org/', timeout=30)
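
If the same code has to run on both 2.4/2.5 and 2.6+, one option is to choose the mechanism by version instead of doing both. Below is a minimal sketch, assuming a hypothetical wrapper name `compat_urlopen`; the `urllib.request` fallback exists only so the sketch also imports on Python 3:

```python
import socket
import sys

try:
    import urllib2  # Python 2.x
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent of urlopen()


def compat_urlopen(url, data=None, timeout=30):
    """Hypothetical wrapper: open `url`, timing out socket operations
    after `timeout` seconds on any Python version."""
    if sys.version_info >= (2, 6):
        # 2.6+: the timeout applies only to this request's socket.
        return urllib2.urlopen(url, data, timeout)
    # 2.4/2.5: no timeout argument, so set the process-wide default just
    # while the socket is created; the socket keeps that timeout afterwards.
    old_timeout = socket.getdefaulttimeout()
    socket.setdefaulttimeout(timeout)
    try:
        return urllib2.urlopen(url, data)
    finally:
        socket.setdefaulttimeout(old_timeout)
```

Restoring the old default right after urlopen() returns keeps the global change as short-lived as possible, while the response socket keeps its timeout for later read() calls.
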
Answered By: Daniel Magnusson