Python: How to get the Content-Type of an URL?
Question:
I need to get the content-type of an internet(intranet) resource not a local file. How can I get the MIME type from a resource behind an URL:
I tried this:
res = urllib.urlopen("http://www.iana.org/assignments/language-subtag-registry")
http_message = res.info()
message = http_message.getplist()
I get:
['charset=UTF-8']
How can I get the Content-Type
, can be done using urllib
and how or if not what is the other way?
Answers:
res = urllib.urlopen("http://www.iana.org/assignments/language-subtag-registry" )
http_message = res.info()
full = http_message.type # 'text/plain'
main = http_message.maintype # 'text'
A Python3 solution to this:
import urllib.request
with urllib.request.urlopen('http://www.google.com') as response:
info = response.info()
print(info.get_content_type()) # -> text/html
print(info.get_content_maintype()) # -> text
print(info.get_content_subtype()) # -> html
Update: since info() function is deprecated in Python 3.9, you can read about the preferred type called headers here
import urllib
r = urllib.request.urlopen(url)
header = r.headers # type is email.message.EmailMessage
contentType = header.get_content_type() # or header.get('content-type')
contentLength = header.get('content-length')
filename = header.get_filename()
also, a good way to quickly get the mimetype without actually loading the url
import mimetypes
contentType, encoding = mimetypes.guess_type(url)
The second method does not guarantee an answer but is a quick and dirty trick since it’s just looking at the URL string rather than actually opening the URL.
I need to get the content-type of an internet(intranet) resource not a local file. How can I get the MIME type from a resource behind an URL:
I tried this:
res = urllib.urlopen("http://www.iana.org/assignments/language-subtag-registry")
http_message = res.info()
message = http_message.getplist()
I get:
['charset=UTF-8']
How can I get the Content-Type
, can be done using urllib
and how or if not what is the other way?
res = urllib.urlopen("http://www.iana.org/assignments/language-subtag-registry" )
http_message = res.info()
full = http_message.type # 'text/plain'
main = http_message.maintype # 'text'
A Python3 solution to this:
import urllib.request
with urllib.request.urlopen('http://www.google.com') as response:
info = response.info()
print(info.get_content_type()) # -> text/html
print(info.get_content_maintype()) # -> text
print(info.get_content_subtype()) # -> html
Update: since info() function is deprecated in Python 3.9, you can read about the preferred type called headers here
import urllib
r = urllib.request.urlopen(url)
header = r.headers # type is email.message.EmailMessage
contentType = header.get_content_type() # or header.get('content-type')
contentLength = header.get('content-length')
filename = header.get_filename()
also, a good way to quickly get the mimetype without actually loading the url
import mimetypes
contentType, encoding = mimetypes.guess_type(url)
The second method does not guarantee an answer but is a quick and dirty trick since it’s just looking at the URL string rather than actually opening the URL.