In Python, how would I check whether a URL ending in .jpg exists?
I think you can try sending an HTTP request to the URL and reading the response. If no exception is raised, it probably exists.
A request for http://www.fakedomain.com/fakeImage.jpg was automatically redirected to
http://www.fakedomain.com/index.html without any error being raised.
Redirects for 301 and 302 responses are followed automatically, without the redirect response ever being returned to the caller.
Take a look at HTTPRedirectHandler; you may need to subclass it to handle that.
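In Python 3 the same handler lives in `urllib.request`. A minimal sketch, assuming you want a redirect to count as "not at this URL": the subclass below (`NoRedirectHandler` and `status_without_redirects` are hypothetical names) returns `None` from `redirect_request`, so the opener raises `HTTPError` carrying the 3xx code instead of silently following it.

```python
import urllib.error
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that refuses to follow HTTP redirects."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means no follow-up request is built, so the 3xx
        # response falls through to the default error handler (HTTPError).
        return None


def status_without_redirects(url, timeout=10):
    """Return the HTTP status for url without following redirects.

    Returns None if no HTTP response was received at all.
    """
    # Passing an instance of an HTTPRedirectHandler subclass makes
    # build_opener skip the default redirect handler.
    opener = urllib.request.build_opener(NoRedirectHandler())
    try:
        request = urllib.request.Request(url, method="HEAD")
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Unfollowed 3xx responses, 4xx and 5xx all arrive here.
        return err.code
    except urllib.error.URLError:
        # DNS failure, refused connection, timeout, etc.
        return None
```

With this opener, `status_without_redirects('http://www.fakedomain.com/fakeImage.jpg')` would report the 302 itself rather than the status of `index.html`.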
Here is a sample from Dive Into Python:
>>> import httplib
>>>
>>> def exists(site, path):
...     conn = httplib.HTTPConnection(site)
...     conn.request('HEAD', path)
...     response = conn.getresponse()
...     conn.close()
...     return response.status == 200
...
>>> exists('www.fakedomain.com', '/fakeImage.jpg')
False
If the status is anything other than a 200, the resource doesn’t exist at the URL. This doesn’t mean that it’s gone altogether. If the server returns a 301 or 302, this means that the resource still exists, but at a different URL. To alter the function to handle this case, the status check line just needs to be changed to
    return response.status in (200, 301, 302)
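The reasoning above really separates three outcomes rather than two. A small sketch that keeps that distinction, so a caller can tell "moved" apart from "missing" (the function name `classify_status` and its return labels are hypothetical, not part of Dive Into Python):

```python
def classify_status(status):
    """Map an HTTP status code to one of three hypothetical labels."""
    if status == 200:
        return "exists"  # the resource is at this exact URL
    if status in (301, 302):
        return "moved"  # it exists, but at the URL given in the Location header
    return "missing"  # not at this URL, which does not prove it is gone
```

Inside `exists`, you could return `classify_status(response.status)` and, in the "moved" case, read the new address with `response.getheader('Location')`.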
Try it with mechanize:
import mechanize

br = mechanize.Browser()
br.set_handle_redirect(False)  # a redirect now raises an error instead of being followed
try:
    br.open_novisit('http://www.fakedomain.com/fakeImage.jpg')
    print('OK')
except Exception:
    print('KO')
Thanks for all the responses, everyone. I ended up using the following:
import urllib2

try:
    f = urllib2.urlopen(urllib2.Request(url))
    deadLinkFound = False
except Exception:
    deadLinkFound = True
The previous answers have problems when the file is on an FTP server (ftp://url.com/file). The following code works whether the file is served over FTP, HTTP or HTTPS:
import urllib2

def file_exists(url):
    request = urllib2.Request(url)
    request.get_method = lambda: 'HEAD'  # ask for headers only on HTTP(S)
    try:
        urllib2.urlopen(request)
        return True
    except Exception:
        return False
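On Python 3, `urllib2` became `urllib.request`, and `Request` accepts a `method` argument, so the `get_method` override is no longer needed. A sketch under that assumption; as far as I know the HEAD method only matters for HTTP(S), and for `ftp://` or `file://` URLs urllib simply opens the resource:

```python
import urllib.error
import urllib.request


def file_exists(url, timeout=10):
    """Return True if url can be opened (using HEAD for http/https)."""
    try:
        # Building the Request inside the try: a malformed URL
        # (e.g. no scheme) raises ValueError right here.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (urllib.error.URLError, ValueError):
        # URLError also covers HTTPError (4xx/5xx) and unreachable hosts.
        return False
```

Catching specific exceptions instead of a bare `except` avoids swallowing things like `KeyboardInterrupt`.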
import requests

def exists(path):
    r = requests.head(path)
    return r.status_code == requests.codes.ok

print(exists('http://www.fakedomain.com/fakeImage.jpg'))
requests.codes.ok is equal to 200, so you can substitute the exact status code if you wish.
requests.head may raise an exception if the server doesn't respond, so you might want to wrap the call in a try/except block.
Also, if you want to accept redirect codes such as 302, consider 303 too, especially if you dereference URIs that denote resources in Linked Data. A URI may represent a person, but you can't download a person, so the server will redirect you to a page that describes this person using a 303 redirect.
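Putting the last two suggestions together, a sketch that wraps `requests.head` in a try/except and accepts the redirect codes as "exists" (`ACCEPTED_STATUSES` is a hypothetical name; adjust the set to taste):

```python
import requests

# Hypothetical set of status codes treated as "the resource exists";
# 301/302/303 mean it exists but is served from another URL.
ACCEPTED_STATUSES = {200, 301, 302, 303}


def exists(url, timeout=10):
    """Return True if a HEAD request to url answers with an accepted status."""
    try:
        # requests.head does not follow redirects by default,
        # so 3xx codes reach the check below unchanged.
        response = requests.head(url, timeout=timeout)
    except requests.RequestException:
        # Connection errors, timeouts and malformed URLs all land here.
        return False
    return response.status_code in ACCEPTED_STATUSES
```

Passing a `timeout` matters here: without one, a server that never answers can block the call indefinitely.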
This might be good enough to see if a url to a file exists.
import urllib

# Python 2 only: urllib.urlopen does not exist in Python 3
if urllib.urlopen('http://www.fakedomain.com/fakeImage.jpg').code == 200:
    print('File exists')
In Python 3.6.5:
import http.client

def exists(site, path):
    connection = http.client.HTTPConnection(site)
    connection.request('HEAD', path)
    response = connection.getresponse()
    connection.close()
    return response.status == 200

exists("www.fakedomain.com", "/fakeImage.jpg")
In Python 3, the module httplib has been renamed to http.client.
You also need to remove the https:// prefix from your URL, because http.client treats anything after a colon in the host as a port number, and the port number must be numeric.
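Rather than stripping the scheme by hand, you can let `urllib.parse.urlsplit` do it and pick `HTTPSConnection` for `https` URLs. A sketch under that assumption (`split_url` is a hypothetical helper, and URLs with embedded credentials are not handled):

```python
import http.client
from urllib.parse import urlsplit


def split_url(url):
    """Split a full URL into (scheme, host, path) for http.client."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # netloc keeps an explicit port such as "example.com:8080",
    # which http.client accepts because the port is numeric.
    return parts.scheme, parts.netloc, path


def exists(url, timeout=10):
    """HEAD the URL with http.client and report whether it returned 200."""
    scheme, host, path = split_url(url)
    connection_classes = {
        "http": http.client.HTTPConnection,
        "https": http.client.HTTPSConnection,
    }
    if scheme not in connection_classes:
        raise ValueError("unsupported scheme: %r" % scheme)
    connection = connection_classes[scheme](host, timeout=timeout)
    try:
        connection.request("HEAD", path)
        return connection.getresponse().status == 200
    finally:
        connection.close()
```

Usage: `exists("https://www.fakedomain.com/fakeImage.jpg")` now works without editing the URL first.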
import requests

def url_exists(url):
    """Return True if a HEAD request to url returns HTTP 200."""
    if not url:
        raise ValueError("url is required")
    try:
        resp = requests.head(url)
        return resp.status_code == 200
    except requests.RequestException:
        return False
@z3moon's answer is good, but I think it is for Python 2.x. For Python 3.x, you need to use urllib.request instead of urllib.
import urllib.request  # plain "import urllib" does not load the request submodule

def check_valid_URLs(url) -> bool:
    try:
        return urllib.request.urlopen(url).code == 200
    except Exception:
        return False
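One caveat with the Python 3 version: `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx/5xx responses instead of returning them, so the `.code == 200` comparison is only reached on success (possibly after following redirects). If you want the actual status, for example to treat a 404 differently from a DNS failure, a sketch like this keeps the two apart (`fetch_status` is a hypothetical name):

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status for url, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # HTTPError subclasses URLError, so it must be caught first:
        # the server answered, just not with success (e.g. 404, 500).
        return err.code
    except urllib.error.URLError:
        # DNS failure, refused connection, timeout, etc.
        return None
```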