What is the quickest way to HTTP GET in Python?
Question:
What is the quickest way to HTTP GET in Python if I know the content will be a string? I am searching the documentation for a quick one-liner like:
contents = url.get("http://example.com/foo/bar")
But all I can find using Google are httplib and urllib, and I am unable to find a shortcut in those libraries.
Does standard Python 2.5 have a shortcut in some form as above, or should I write a function url_get?
- I would prefer not to capture the output of shelling out to wget or curl.
Answers:
Python 3:
import urllib.request
contents = urllib.request.urlopen("http://example.com/foo/bar").read()
Python 2:
import urllib2
contents = urllib2.urlopen("http://example.com/foo/bar").read()
Documentation for urllib.request and read.
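Since the question mentions writing a url_get function and expects a string, here is a minimal sketch of such a helper for Python 3. The name url_get comes from the question; the timeout and default_charset parameters are assumptions added for illustration, not part of any library API. read() returns bytes, so the helper decodes them using the charset declared in the response headers, if there is one.

```python
import urllib.request


def url_get(url, timeout=10.0, default_charset="utf-8"):
    """Fetch url and return the response body as str."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # read() returns bytes; use the charset declared in Content-Type
        # when there is one, otherwise fall back to default_charset.
        charset = resp.headers.get_content_charset() or default_charset
        return resp.read().decode(charset)


# Usage (needs network access):
# contents = url_get("http://example.com/foo/bar")
```

If the server declares the wrong charset, decode() may raise UnicodeDecodeError; whether to catch that or pass errors="replace" depends on how strict you need to be.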
Have a look at httplib2, which – next to a lot of very useful features – provides exactly what you want.
import httplib2
resp, content = httplib2.Http().request("http://example.com/foo/bar")
Where content would be the response body (as a string), and resp would contain the status and response headers.
It is not included in a standard Python install (though it only requires standard Python), but it is definitely worth checking out.
If you want the httplib2 solution to be a one-liner, consider instantiating an anonymous Http object:
import httplib2
resp, content = httplib2.Http().request("http://example.com/foo/bar")
Here is a wget script in Python:
# From Python Cookbook, 2nd edition, page 487 (Python 2)
import sys, urllib

def reporthook(a, b, c):
    # a: blocks transferred so far, b: block size in bytes, c: total size
    print "% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c),

for url in sys.argv[1:]:
    i = url.rfind("/")
    file = url[i+1:]
    print url, "->", file
    urllib.urlretrieve(url, file, reporthook)
    print
theller's solution for wget is really useful; however, I found that it does not print the progress throughout the download. It works perfectly if you add one line after the print statement in reporthook.
import sys, urllib

def reporthook(a, b, c):
    print "% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c),
    # Flush so that each progress update appears immediately
    sys.stdout.flush()

for url in sys.argv[1:]:
    i = url.rfind("/")
    file = url[i+1:]
    print url, "->", file
    urllib.urlretrieve(url, file, reporthook)
    print
Use the Requests library:
import requests
r = requests.get("http://example.com/foo/bar")
Then you can do stuff like this:
>>> print(r.status_code)
>>> print(r.headers)
>>> print(r.content) # bytes
>>> print(r.text) # r.content as str
Install Requests by running this command:
pip install requests
If you are working with HTTP APIs specifically, there are also more convenient choices such as Nap.
For example, here's how to get gists from GitHub since May 1st 2014:
from nap.url import Url
api = Url('https://api.github.com')
gists = api.join('gists')
response = gists.get(params={'since': '2014-05-01T00:00:00Z'})
print(response.json())
More examples: https://github.com/kimmobrunfeldt/nap#examples
Excellent solutions Xuan, Theller.
For it to work with Python 3, make the following changes:
import sys, urllib.request

def reporthook(a, b, c):
    # end="" replaces the trailing comma of the Python 2 print statement
    print("% 3.1f%% of %d bytes\r" % (min(100, float(a * b) / c * 100), c), end="")
    sys.stdout.flush()

for url in sys.argv[1:]:
    i = url.rfind("/")
    file = url[i+1:]
    print(url, "->", file)
    urllib.request.urlretrieve(url, file, reporthook)
    print()
Also, the URL you enter should be preceded by “http://”; otherwise it raises an unknown url type error.
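One thing the scripts above do not handle: urlretrieve passes -1 as the total size when the server sends no Content-Length, which makes the percentage meaningless. A possible variant for Python 3 is sketched below. The names format_progress and download are made up for this example, and separating the formatting from the printing is just one way to arrange it.

```python
import sys
import urllib.request


def format_progress(block_count, block_size, total_size):
    """Return a progress string for urlretrieve's reporthook arguments."""
    done = block_count * block_size
    if total_size <= 0:
        # The server sent no Content-Length, so no percentage is possible.
        return "%d bytes" % done
    percent = min(100.0, done * 100.0 / total_size)
    return "%5.1f%% of %d bytes" % (percent, total_size)


def reporthook(block_count, block_size, total_size):
    # "\r" returns the cursor to the start of the line so that each
    # update overwrites the previous one.
    sys.stdout.write("\r" + format_progress(block_count, block_size, total_size))
    sys.stdout.flush()


def download(url, filename):
    # Hypothetical wrapper around the urlretrieve call used above.
    urllib.request.urlretrieve(url, filename, reporthook)
    print()
```

Keeping the formatting in its own function also makes it possible to check the arithmetic without downloading anything.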
This solution works (for me) without any additional imports, also with https:
try:
    import urllib2 as urlreq  # Python 2.x
except ImportError:
    import urllib.request as urlreq  # Python 3.x
req = urlreq.Request("http://example.com/foo/bar")
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36')
urlreq.urlopen(req).read()
I often have difficulty grabbing the content when not specifying a “User-Agent” in the header information. The requests are then usually rejected with something like: urllib2.HTTPError: HTTP Error 403: Forbidden or urllib.error.HTTPError: HTTP Error 403: Forbidden.
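If you would rather handle the 403 than let it propagate, the exception can be caught. This is a rough Python 3 sketch only: the function name fetch, the tuple return values and the "my-script/0.1" User-Agent string are all assumptions made for the example. HTTPError is a subclass of URLError, so it has to be caught first.

```python
import urllib.error
import urllib.request


def fetch(url, user_agent="my-script/0.1", timeout=10.0):
    """Return ("ok", body), ("http_error", status) or ("url_error", reason)."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return ("ok", resp.read())
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403.
        return ("http_error", err.code)
    except urllib.error.URLError as err:
        # No usable answer at all: DNS failure, refused connection, etc.
        return ("url_error", str(err.reason))


# Usage (needs network access):
# kind, value = fetch("http://example.com/foo/bar")
```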
How to also send headers
Python 3:
import urllib.request
contents = urllib.request.urlopen(urllib.request.Request(
    "https://api.github.com/repos/cirosantilli/linux-kernel-module-cheat/releases/latest",
    headers={"Accept": "application/vnd.github.full+json"}
)).read()
print(contents)
Python 2:
import urllib2
contents = urllib2.urlopen(urllib2.Request(
    "https://api.github.com",
    headers={"Accept": "application/vnd.github.full+json"}
)).read()
print(contents)
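To check which headers will be sent, a Request object can be inspected before it is passed to urlopen. The small Python 3 sketch below makes no network call. The header values are placeholders chosen for the example. Note that Request stores header names via str.capitalize(), which matters when looking them up again.

```python
import urllib.request

req = urllib.request.Request(
    "https://api.github.com",
    headers={"Accept": "application/json"},
)
# Headers can also be added after construction.
req.add_header("User-Agent", "my-script/0.1")

# Request normalizes header names with str.capitalize(), so "User-Agent"
# is stored as "User-agent"; lookups must use that form.
print(req.get_header("Accept"))      # application/json
print(req.get_header("User-agent"))  # my-script/0.1
print(req.get_method())              # GET, because no data was attached
```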
It’s simple enough with the powerful urllib3 library.
Import it like this:
import urllib3
http = urllib3.PoolManager()
And make a request like this:
response = http.request('GET', 'https://example.com')
print(response.data) # Raw data.
print(response.data.decode('utf-8')) # Text.
print(response.status) # Status code.
print(response.headers['Content-Type']) # Content type.
You can add headers too:
response = http.request('GET', 'https://example.com', headers={
    'key1': 'value1',
    'key2': 'value2'
})
More info can be found in the urllib3 documentation.
urllib3 is much safer and easier to use than the builtin urllib.request or http modules and is stable.
In Python we can read from HTTP responses just like from files. Here is an example that reads JSON from an API:
import json
from urllib.request import urlopen

# url is assumed to be defined and to return a JSON object
with urlopen(url) as f:
    resp = json.load(f)

print(resp['some_key'])
For Python >= 3.6, you can use dload:
import dload
t = dload.text(url)
For json:
j = dload.json(url)
Install:
pip install dload
If you want a lower level API:
import http.client
conn = http.client.HTTPSConnection('example.com')
conn.request('GET', '/')
resp = conn.getresponse()
content = resp.read()
conn.close()
text = content.decode('utf-8')
print(text)
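http.client wants the host and the request target separately rather than a full URL. If you start from a URL, something like the following sketch can split it first. The function name split_for_http_client is made up for this example, and only http and https are handled.

```python
import http.client
from urllib.parse import urlsplit


def split_for_http_client(url):
    """Return (connection_class, host, port, target) for a URL."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn_class = http.client.HTTPSConnection
    elif parts.scheme == "http":
        conn_class = http.client.HTTPConnection
    else:
        raise ValueError("unsupported scheme: %r" % parts.scheme)
    # The request target is the path plus the query string, if any.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return conn_class, parts.hostname, parts.port, target


# Usage (needs network access):
# conn_class, host, port, target = split_for_http_client("https://example.com/")
# conn = conn_class(host, port, timeout=10)
# conn.request("GET", target)
```

A port of None is fine here: the connection classes fall back to the default port for their scheme.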