How to check whether a URL is downloadable or not?

Question:

How do I check, using Python, whether a given URL is downloadable?

It should return True if it is downloadable, else False.

An example of a non-downloadable url: www.google.com

Note: I am not speaking about contents of the URL and saving it as a web page.

What is a downloadable URL?

If opening a URL causes a file to start downloading, then it is a downloadable URL.

Example: https://drive.google.com/uc?id=1QOmVDpd8hcVYqqUXDXf68UMDWQZP0wQV&export=download

Note: It downloads the Stack Overflow 2019 annual survey data set.

Asked By: NewbieProgrammer

||

Answers:

At the HTTP protocol level, there is no distinction between a downloadable and a non-downloadable URL. There is an HTTP request and a subsequent response. The response body can be a binary file, HTML, an image, etc.

You can request just the HTTP response headers, look at Content-Type, and decide whether you want to treat that content type as downloadable or non-downloadable.
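
As a rough sketch of the approach above, the decision can be written as a small function over the Content-Type value. The name is_downloadable_type and the PAGE_TYPES set are assumptions made for illustration; which types count as "downloadable" is your own choice, not something HTTP defines.

```python
# Media types treated as "a page to display" rather than "a file to download".
# This set is an assumption for illustration; adjust it to your own definition.
PAGE_TYPES = {'text/html', 'application/xhtml+xml', 'text/plain'}


def is_downloadable_type(content_type):
    """Return True if a Content-Type header value looks like a file download."""
    # Drop parameters such as "; charset=UTF-8" and normalise case.
    media_type = content_type.split(';', 1)[0].strip().lower()
    # A missing or empty header gives no evidence of a download.
    return bool(media_type) and media_type not in PAGE_TYPES


print(is_downloadable_type('text/html; charset=UTF-8'))  # False
print(is_downloadable_type('application/zip'))           # True
```

With the requests library, the header value could be obtained with requests.head(url, allow_redirects=True).headers.get('Content-Type', '') and passed to this function.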

Answered By: Tejas Sarade

This can be done using the popular requests library:

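import requests

url = 'https://www.google.com'
# HEAD fetches only the headers; requests.head does not follow redirects unless asked to
headers = requests.head(url, allow_redirects=True).headers
# A Content-Disposition of "attachment" asks the client to save the response as a file
downloadable = 'attachment' in headers.get('Content-Disposition', '')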

Content-Disposition header reference
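
A plain substring test can misfire, for example on an inline response whose filename happens to contain the word "attachment". A slightly stricter sketch, assuming the standard-library email.message parser is acceptable for this header, separates the disposition type from the filename. The helper names parse_content_disposition and is_attachment are hypothetical.

```python
from email.message import Message


def parse_content_disposition(value):
    """Split a Content-Disposition value into (disposition, filename)."""
    if not value:
        return None, None
    msg = Message()
    msg['Content-Disposition'] = value
    # get_content_disposition() returns the lowercased type without parameters.
    # get_filename() returns the filename parameter, or None if there is none.
    return msg.get_content_disposition(), msg.get_filename()


def is_attachment(value):
    """Return True only when the disposition type itself is 'attachment'."""
    disposition, _ = parse_content_disposition(value)
    return disposition == 'attachment'


print(parse_content_disposition('attachment; filename="survey.zip"'))
print(is_attachment('inline; filename="attachment.html"'))
```

The header value would come from headers.get('Content-Disposition', '') as in the snippet above.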

Answered By: Abhishek J

I went looking for a better way, because the site I was checking was a bit tricky.
Most Stack Overflow answers suggest using a HEAD request to get the response headers, but the site I was checking returned a 404 error for HEAD. With a plain GET request, the whole file is downloaded before the headers can be printed. A friend suggested passing the parameter stream=True, and that worked:

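import requests

link = 'https://example.com/file'  # placeholder; the original answer does not name the site
# stream=True returns as soon as the headers arrive; the body is not downloaded until it is read
r = requests.get(link, stream=True)
print(r.headers)
r.close()  # release the connection without reading the body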
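
The two ideas can be combined: try HEAD first and fall back to a streamed GET only when the server rejects HEAD. The following is a minimal sketch, not a definitive implementation. The function name fetch_headers and its session parameter are assumptions made here so the logic can be exercised with a stand-in object; by default it uses the requests module itself.

```python
def fetch_headers(url, session=None, timeout=10):
    """Return (status_code, headers) without downloading the response body."""
    if session is None:
        # Imported lazily so the function can also be driven by a stub session.
        import requests
        session = requests
    response = session.head(url, allow_redirects=True, timeout=timeout)
    if response.status_code >= 400:
        # Some servers reject HEAD (for example with 404 or 405).
        # A streamed GET returns once the headers arrive, before the body.
        response.close()
        response = session.get(url, stream=True, allow_redirects=True,
                               timeout=timeout)
    try:
        return response.status_code, dict(response.headers)
    finally:
        # Closing without reading the body avoids downloading the file.
        response.close()
```

Calling fetch_headers(link) would then give the status code and headers to inspect, whichever request method the server accepted.
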
Answered By: Alen Paul Varghese

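Downloadable files usually have a Content-Length header. This is only a heuristic: responses sent with chunked transfer encoding omit it, and ordinary web pages may include it.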

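import requests

url = 'https://www.google.com'  # replace with the URL to check
# stream=True fetches the headers without downloading the body
r = requests.get(url, stream=True)

try:
    print(r.headers['content-length'])
except KeyError:
    # Catch only the missing-header case instead of using a bare except
    print("Not Downloadable")
finally:
    r.close()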
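
If the size is going to be used for anything (a progress bar, a size limit), it helps to parse the header defensively and treat a missing value as "unknown" rather than as proof that nothing can be downloaded. A small sketch, with the hypothetical helper name declared_size, operating on any mapping of header names to values:

```python
def declared_size(headers):
    """Return Content-Length as an int, or None if absent or malformed."""
    # Plain dicts are case-sensitive, so normalise the header names first.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get('content-length')
    if raw is None:
        return None
    try:
        size = int(raw)
    except ValueError:
        return None
    # A negative length is not meaningful; report it as unknown.
    return size if size >= 0 else None


print(declared_size({'Content-Length': '1024'}))    # 1024
print(declared_size({'Content-Type': 'text/html'}))  # None
```
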
Answered By: 14 14