How do I identify the websites that don't exist in a list of websites?

Question:

Good afternoon. For a university Python project I need to extract a table from a website, but some of the links don't exist, so I need my loop to ignore those links and move on to the next one. How can I do that?

I'm using Python to create a dataset of soundtracks.
I used BeautifulSoup to extract the .html, but the link doesn't exist, so I thought about adding a check like

if type(link)=="NoneType":

but it doesn't work. link is the result of soup.find, which returned nothing; in fact, type(link) gives NoneType.
What can I do to recognise a nonexistent link?
Thank you for the help.

Asked By: Lukesky

Answers:

You can create a function to test whether a URL is valid. If the request raises an error, the function returns False; if it makes a successful connection, it returns True. You can then use this function to filter your list and produce a new list of valid URLs.

Here is an example:

Code:

import requests

url_list = ["http://yahoo.com", "http://a_random_site_that_does_not_exist.com", "http://google.com"]

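def is_valid_url(url):
    """Return True if the URL can be fetched without a 4xx/5xx status."""
    try:
        # A timeout keeps the loop from hanging on an unresponsive server.
        response = requests.get(url, timeout=10)
        # Raises HTTPError for 4xx/5xx responses such as 404 Not Found.
        response.raise_for_status()
        return True
    except requests.exceptions.RequestException:
        # Covers connection errors, timeouts, invalid URLs and HTTP errors.
        return False

# Keep only the URLs that could be fetched successfully.
valid_url_list = list(filter(is_valid_url, url_list))
print(valid_url_list)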

Output:

['http://yahoo.com', 'http://google.com']
Answered By: ScottC
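
A note on the NoneType check from the question: type(link) returns the class NoneType, not the string "NoneType", so that comparison is always False. The usual test is `link is None`. Here is a minimal sketch of skipping such results in a loop. The `found` dictionary and its page names are hypothetical stand-ins for whatever soup.find returned on each page (BeautifulSoup's find returns None when nothing matches); they are not taken from the original code.

```python
# Hypothetical stand-in for the results of soup.find(...) on each page:
# a matching element (represented here by a string) or None when nothing matched.
found = {
    "page_a": "<table>...</table>",
    "page_b": None,
    "page_c": "<table>...</table>",
}

link = found["page_b"]
# type(link) is the class NoneType, not the string "NoneType",
# so the first comparison is False even though link really is None.
print(type(link) == "NoneType")
print(link is None)

pages_with_table = []
for page, link in found.items():
    if link is None:
        continue  # nothing was found on this page, move on to the next one
    pages_with_table.append(page)

print(pages_with_table)
```

In a real scraper the same `if link is None: continue` line would sit right after the soup.find call inside the loop over the URLs.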