How to define multiple error handling statements?

Question:

I would like to read a set of CSV files from URLs as dataframes. The files contain a date in their names, like YYYYMMDD.csv. I need to iterate over a set of predefined dates and read the corresponding file into Python.
Sometimes the file does not exist and the following error is thrown:

raise HTTPError(req.full_url, code, msg, hdrs, fp)

urllib.error.HTTPError: HTTP Error 404: Not Found

What I would do in this situation is add one day to the date, turning 2020-05-01 into 2020-05-02; if that also throws the error above, I would add 2 days, or at most 3 days, until there is a URL available without an error.
I would like to know how to write this in a program, maybe with nested try/except blocks, so that if adding 1 day to the date leads to a URL without an error, the subsequent steps are not executed.

As I don’t have the data I will use the following URL as an example:

import io

import pandas as pd
import requests

url = 'http://winterolympicsmedals.com/medals.csv'
s = requests.get(url).content
# read_csv expects a path or file-like object, not raw bytes
c = pd.read_csv(io.StringIO(s.decode('utf-8')))

Here the file being read is medals.csv. If you try madels.csv or modals.csv you will get the error I am talking about. So I need to know how I can control the errors in 3 steps by replacing the file name until I get the desired dataframe: first we try madels.csv, resulting in an error, then modals.csv, also resulting in an error, and after that medals.csv, which results in the desired output.

My problem is that sometimes the modification I make to the file name also fails in the except block, so I need to know how to accommodate a second modification.
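
Schematically, what I am after is something like this, where madels.csv, modals.csv and medals.csv stand in for the successive file names to try:

import io

import pandas as pd
import requests

base = 'http://winterolympicsmedals.com/'
candidates = ['madels.csv', 'modals.csv', 'medals.csv']

c = None
for name in candidates:
    r = requests.get(base + name)
    if r.ok:  # stop at the first name that fetches without an error
        c = pd.read_csv(io.StringIO(r.content.decode('utf-8')))
        break

if c is None:
    print('none of the candidate files could be fetched')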

Asked By: Anoushiravan R


Answers:

Here’s a simple function that, given such a URL, will do exactly what you ask. Be aware that a slight change in the input URL can lead to errors, so make sure the date format is exactly what you mentioned. In any case:

Note on URL parsing: for a full URL, urllib.parse would be more robust; the function below simply splits the path as a workaround.


import datetime as dt
import io

import pandas as pd
import requests


def url_next_day(url):
    # take the file name from the end of the URL path
    filename = url.rstrip("/").split("/")[-1]
    # strip(".csv") would remove a *set of characters*, not the suffix,
    # so use removesuffix (Python 3.9+) instead
    date = dt.datetime.strptime(filename.removesuffix(".csv"), "%Y%m%d").date()
    date_plus_one_day = date + dt.timedelta(days=1)
    new_file_name = date_plus_one_day.strftime("%Y%m%d") + ".csv"
    return url.replace(filename, new_file_name)

for url in list_of_urls:
    try:
        r = requests.get(url)
        r.raise_for_status()  # make a 404 raise instead of returning an error page
        c = pd.read_csv(io.StringIO(r.content.decode("utf-8")))
    except Exception as e:
        print(f"Invalid URL: {url}. The error: {e}. Trying the days after...")
        for _ in range(3):  # because you want at most 3 days after
            try:
                url = url_next_day(url)
                r = requests.get(url)
                r.raise_for_status()
                c = pd.read_csv(io.StringIO(r.content.decode("utf-8")))
                break
            except Exception:
                pass
        else:
            print("No file available in the days after. Moving on")
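
For example, with a made-up dated URL, the helper behaves like this:

>>> url_next_day("http://example.com/20200501.csv")
'http://example.com/20200502.csv'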
            

Happy Coding!

Answered By: Daniel Gonçalves

OK, I have enough changes I want to recommend on top of @Daniel Gonçalves’s initial solution that I’m going to post them as a second answer.

1- The loop trying additional days needs to break when it gets a hit, so it doesn't keep going.

2- That loop needs an else: block to handle the complete failure case.

3- It is best practice to catch only the exception you mean to catch and know how to handle. Here a failed fetch surfaces as requests.exceptions.HTTPError once you call raise_for_status() on the response (the urllib.error.HTTPError in your traceback comes from urllib, which requests does not use), but any other exception would mean something else is wrong with the program, and it would be best not to catch that, so you would notice it and fix your program when that happens.
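
For reference, a minimal sketch of the fetch step that makes a 404 observable as an exception (using one of the deliberately wrong names from the question):

import requests

url = 'http://winterolympicsmedals.com/madels.csv'
r = requests.get(url)
r.raise_for_status()  # raises requests.exceptions.HTTPError on a 404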

The result:

for url in list_of_urls:
    try:
        r = requests.get(url)
        r.raise_for_status()  # raises requests.exceptions.HTTPError on 404 etc.
        c = pd.read_csv(io.StringIO(r.content.decode("utf-8")))
    except requests.exceptions.HTTPError as e:
        print(f"Invalid URL: {url}. The error: {e}. Trying the days after...")
        for _ in range(3):  # because you want at most 3 days after
            try:
                url = url_next_day(url)
                r = requests.get(url)
                r.raise_for_status()
                c = pd.read_csv(io.StringIO(r.content.decode("utf-8")))
                break
            except requests.exceptions.HTTPError:
                print(f"Also failed to fetch {url}...")
        else:
            # this block is only executed if the loop never breaks
            print("No file available in the days after. Moving on.")
            c = None  # or an empty data frame, or whatever won't crash the rest of your code

Answered By: joanis

No need for nested try-except blocks; all you need is one try-except and a for loop.

First, a function that tries to read a file (it returns the content of the file, or None if the file is not found):

def read_file(fp):
    try:
        with open(fp, 'r') as f:
            text = f.read()
            return text
    except FileNotFoundError as e:
        # catch only the "file not found" case; any other exception
        # signals a different problem and should not be silenced
        print(e)
        return None

Then, a function that tries to find a file for a predefined date (an example input would be '20220514'). The function tries to read the content of the file with the given date, or with dates up to 3 days after it:

from datetime import datetime, timedelta

def read_from_predefined_date(date):
    date_format = '%Y%m%d'
    date = datetime.strptime(date, date_format)

    result = None
    for i in range(4):  # the given date plus up to 3 days after it
        date_to_read = date + timedelta(days=i)
        date_as_string = date_to_read.strftime(date_format)
        fp = f'data/{date_as_string}.csv'
        result = read_file(fp)

        if result is not None:  # stop at the first file that exists
            break

    return result

To test, create e.g. a data/20220515.csv file and run the following code:

d = '20220514'
result = read_from_predefined_date(d)
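
A self-contained way to try it (a sketch; the data/ directory and file contents are made up for the test):

import os

os.makedirs('data', exist_ok=True)
with open('data/20220515.csv', 'w') as f:
    f.write('a,b\n1,2\n')

result = read_from_predefined_date('20220514')
print(result)  # one miss on 20220514, then the contents of 20220515.csv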
Answered By: druskacik