Python: follow redirects and then download the page?


I have the following Python script and it works beautifully.

import urllib2

url = '' # write the url here

usock = urllib2.urlopen(url)
data = usock.read()

print data

However, some of the URLs I give it may redirect two or more times. How can I have Python wait for the redirects to complete before loading the data?
For instance, when using the above code with

which is the equivalent of hitting the "I'm Feeling Lucky" button on a Google search, I get:

>>> url = ''
>>> usick = urllib2.urlopen(url)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/", line 126, in urlopen
    return, data, timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/", line 400, in open
    response = meth(req, response)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/", line 438, in error
    return self._call_chain(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/", line 372, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/", line 521, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

I've tried the (url, data, timeout) arguments; however, I am unsure what to put there.

I actually found out that if I don't follow the redirect and just read the headers of the first response, I can grab the location of the next redirect and use that as my final link.

Asked By: Cripto



You might be better off with the Requests library, which has a better API for controlling redirect handling:

Requests: (urllib replacement for humans)
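
As a rough illustration of that redirect control (a sketch, not part of the original answer): Requests follows redirects on GET by default, and a Session's max_redirects attribute caps the length of the chain. The function name fetch_with_redirect_limit and its default values are made up for this example.

```python
import requests


def fetch_with_redirect_limit(url, max_redirects=5, timeout=10):
    """Hypothetical helper: GET url, following at most max_redirects hops.

    Returns the final Response, or None if the redirect chain is too long.
    """
    with requests.Session() as session:
        # Cap the redirect chain (Requests' own default limit is 30).
        session.max_redirects = max_redirects
        try:
            return session.get(url, timeout=timeout)
        except requests.exceptions.TooManyRedirects:
            return None
```

Passing allow_redirects=False to session.get would instead return the first response without following it, leaving the target in its Location header.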

Answered By: Mikko Ohtamaa

Use requests, as the other answer states. Here is an example. The final URL after redirects will be in r.url. In the example below, the http URL is redirected to https.


In [1]: import requests
   ...: r = requests.head('', allow_redirects=True)
   ...: r.url

Out[1]: ''

For GET:

In [1]: import requests
   ...: r = requests.get('')
   ...: r.url

Out[1]: ''

Note that for HEAD you have to specify allow_redirects=True. If you don't, you can still get the redirect target from the headers, but this is not advised.

In [1]: import requests

In [2]: r = requests.head('')

In [3]: r.headers.get('location')
Out[3]: ''

To download the page you will need GET; you can then access the page body using r.content.
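
Putting the pieces together, a minimal sketch of "follow the redirects, then download" might look like the following. The helper name download_page and the timeout value are assumptions for illustration; r.history, r.url and r.content are the Requests attributes that carry the redirect chain, the final URL and the body.

```python
import requests


def download_page(url, timeout=10):
    """Hypothetical helper: follow redirects, then return the page.

    Returns (final_url, redirect_urls, body_bytes).
    """
    r = requests.get(url, timeout=timeout)  # GET follows redirects by default
    r.raise_for_status()                    # raise requests.HTTPError on 4xx/5xx
    # r.history holds the intermediate redirect responses, oldest first.
    redirect_urls = [hop.url for hop in r.history]
    return r.url, redirect_urls, r.content
```

Calling download_page with the starting URL returns the address that was finally reached, the list of URLs that redirected along the way, and the raw bytes of the page.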

Answered By: Glen Thompson