How to define multiple error handling statements?
Question:
I would like to read a set of csv
files from URL as dataframes. These files contain a date in their name like YYYYMMDD.csv
in their names. I need to iterate over a set of predefined dates and read the corresponding file into a Python.
Sometimes the file does not exist and an error as follows is thrown:
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
What I would do in this situation is to add one day to the date like turning 2020-05-01
to 2020-05-02
and in case of throwing the aforementioned error I would add 2 days to the date or at most 3 days until there is a url available without an error.
I would like to know how I can write it in a program maybe with nested try
– except
where if adding 1 day to the date leads to a URL without error the subsequent steps are not executed.
As I don’t have the data I will use the following URL as an example:
import pandas as pd
import requests
url = 'http://winterolympicsmedals.com/medals.csv'
s = requests.get(url).content
c = pd.read_csv(s)
Here the file being read is medals.csv
. If you try madels.csv
or modals.csv
you will get the error I am talking about. So I need to know how I can control the errors in 3 steps by replacing the file name until I get the desired dataframe like first we try madels.csv
resulting in an error, then models.csv
also resulting in an error and after that medals.csv
which result in the desired output.
My problem is that sometimes the modification I made to the file also fails in except
so I need to know how I can accommodate a second modification.
Answers:
here’s a simple function that given that URL will do exactly what you ask. Be aware that a slight change in the input url can lead to a few errors so make sure the date format is exactly what you mentioned. In any case:
Notes: URL Parsing
import pandas as pd
import datetime as dt
def url_next_day(url):
# if the full url is passed I suggest you would use urllib parse but
# urllib parse but here's a workaround
filename = url.rstrip("/").split("/")[-1]
date=dt.datetime.strptime(filename.strip(".csv"),"%Y%m%d").date()
date_plus_one_day= date + dt.timedelta(days=1)
new_file_name= dt.datetime.strftime(date_plus_one_day,"%Y%m%d")+".csv"
url_next_day=url.replace(filename,new_file_name)
return url_next_day
for url in list_of_urls:
try:
s = requests.get(url).content
c = pd.read_csv(s)
except Exception as e:
print(f"Invalid URL: {url}. The error: {e}. Trying the days after...")
for _ in range(3): #because you want at most 3 days after
try:
url=url_next_day(url)
s = requests.get(url).content
c = pd.read_csv(s)
break
except Exception:
pass
else:
print("No file available in the days after. Moving on")
Happy Coding!
OK, I have enough changes I want to recommend on top of @Daniel Gonçalves’s initial solution that I’m going to post them as a second answer.
1- The loop trying additional days needs to break when it got a hit, so it doesn’t keep going.
2- That loop needs an else:
block to handle the complete failure case.
3- It is best practice to catch only the exception you mean to catch and know how to handle. Here a urllib.error.HTTPError
means a failure to fetch the page, but an other exception would mean something else is wrong with the program, and it would be best not to catch that, so you would notice it and fix your program when that happens.
The result:
for url in list_of_urls:
try:
s = requests.get(url).content
c = pd.read_csv(s)
except urllib.error.HTTPError as e:
print(f"Invalid URL: {url}. The error: {e}. Trying the days after...")
for _ in range(3): #because you want at most 3 days after
try:
url = url_next_day(url)
s = requests.get(url).content
c = pd.read_csv(s)
break
except urllib.error.HTTPError:
print(f"Also failed to fetch {url}...")
else:
# this block is only executed if the loop never breaks
print("No file available in the days after. Moving on.")
c = None # or an empty data frame, or whatever won't crash the rest of your code
No need to do any nested try-except
blocks, all you need is one try-except
and a for
loop.
First, function that tries to read a file (returns content of the file, or None
if the file is not found):
def read_file(fp):
try:
with open(fp, 'r') as f:
text = f.read()
return text
except Exception as e:
print(e)
return None
Then, function that tries to find a file from a predefined date (example input would be '20220514'
). The functions tries to read content of the file with the given date, or dates up to 3 days after it:
from datetime import datetime, timedelta
def read_from_predefined_date(date):
date_format = '%Y%m%d'
date = datetime.strptime(date, date_format)
result = None
for i in range(4):
date_to_read = date + timedelta(days=i)
date_as_string = date_to_read.strftime(date_format)
fp = f'data/{date_as_string}.csv'
result = read_file(fp)
if result:
break
return result
To test, e.g. create a data/20220515.csv
and run following code:
d = '20220514'
result = read_from_predefined_date(d)
I would like to read a set of csv
files from URL as dataframes. These files contain a date in their name like YYYYMMDD.csv
in their names. I need to iterate over a set of predefined dates and read the corresponding file into a Python.
Sometimes the file does not exist and an error as follows is thrown:
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
What I would do in this situation is to add one day to the date like turning 2020-05-01
to 2020-05-02
and in case of throwing the aforementioned error I would add 2 days to the date or at most 3 days until there is a url available without an error.
I would like to know how I can write it in a program maybe with nested try
– except
where if adding 1 day to the date leads to a URL without error the subsequent steps are not executed.
As I don’t have the data I will use the following URL as an example:
import pandas as pd
import requests
url = 'http://winterolympicsmedals.com/medals.csv'
s = requests.get(url).content
c = pd.read_csv(s)
Here the file being read is medals.csv
. If you try madels.csv
or modals.csv
you will get the error I am talking about. So I need to know how I can control the errors in 3 steps by replacing the file name until I get the desired dataframe like first we try madels.csv
resulting in an error, then models.csv
also resulting in an error and after that medals.csv
which result in the desired output.
My problem is that sometimes the modification I made to the file also fails in except
so I need to know how I can accommodate a second modification.
here’s a simple function that given that URL will do exactly what you ask. Be aware that a slight change in the input url can lead to a few errors so make sure the date format is exactly what you mentioned. In any case:
Notes: URL Parsing
import pandas as pd
import datetime as dt
def url_next_day(url):
# if the full url is passed I suggest you would use urllib parse but
# urllib parse but here's a workaround
filename = url.rstrip("/").split("/")[-1]
date=dt.datetime.strptime(filename.strip(".csv"),"%Y%m%d").date()
date_plus_one_day= date + dt.timedelta(days=1)
new_file_name= dt.datetime.strftime(date_plus_one_day,"%Y%m%d")+".csv"
url_next_day=url.replace(filename,new_file_name)
return url_next_day
for url in list_of_urls:
try:
s = requests.get(url).content
c = pd.read_csv(s)
except Exception as e:
print(f"Invalid URL: {url}. The error: {e}. Trying the days after...")
for _ in range(3): #because you want at most 3 days after
try:
url=url_next_day(url)
s = requests.get(url).content
c = pd.read_csv(s)
break
except Exception:
pass
else:
print("No file available in the days after. Moving on")
Happy Coding!
OK, I have enough changes I want to recommend on top of @Daniel Gonçalves’s initial solution that I’m going to post them as a second answer.
1- The loop trying additional days needs to break when it got a hit, so it doesn’t keep going.
2- That loop needs an else:
block to handle the complete failure case.
3- It is best practice to catch only the exception you mean to catch and know how to handle. Here a urllib.error.HTTPError
means a failure to fetch the page, but an other exception would mean something else is wrong with the program, and it would be best not to catch that, so you would notice it and fix your program when that happens.
The result:
for url in list_of_urls:
try:
s = requests.get(url).content
c = pd.read_csv(s)
except urllib.error.HTTPError as e:
print(f"Invalid URL: {url}. The error: {e}. Trying the days after...")
for _ in range(3): #because you want at most 3 days after
try:
url = url_next_day(url)
s = requests.get(url).content
c = pd.read_csv(s)
break
except urllib.error.HTTPError:
print(f"Also failed to fetch {url}...")
else:
# this block is only executed if the loop never breaks
print("No file available in the days after. Moving on.")
c = None # or an empty data frame, or whatever won't crash the rest of your code
No need to do any nested try-except
blocks, all you need is one try-except
and a for
loop.
First, function that tries to read a file (returns content of the file, or None
if the file is not found):
def read_file(fp):
try:
with open(fp, 'r') as f:
text = f.read()
return text
except Exception as e:
print(e)
return None
Then, function that tries to find a file from a predefined date (example input would be '20220514'
). The functions tries to read content of the file with the given date, or dates up to 3 days after it:
from datetime import datetime, timedelta
def read_from_predefined_date(date):
date_format = '%Y%m%d'
date = datetime.strptime(date, date_format)
result = None
for i in range(4):
date_to_read = date + timedelta(days=i)
date_as_string = date_to_read.strftime(date_format)
fp = f'data/{date_as_string}.csv'
result = read_file(fp)
if result:
break
return result
To test, e.g. create a data/20220515.csv
and run following code:
d = '20220514'
result = read_from_predefined_date(d)