How to read and modify CSV files in a loop inside a function and save each as a separate DataFrame in Python Pandas?

Question:

I am trying to create a function in Python Pandas where I:

  1. read 5 CSV files
  2. make some aggregations on each CSV that was read (to keep it simple, we can just delete one column)
  3. save each modified CSV as a separate DataFrame

Currently I have something like the code below; nevertheless, it returns only one DataFrame as output, not 5. How can I change the code below?

def xx():
    #1. read 5 csv
    for el in [col for col in os.listdir("mypath") if col.endswith(".csv")]:
        df = pd.read_csv(f"path/{el}")

        #2. making aggregations
        df = df.drop("COL1", axis=1)

        #3. saving each modified csv to separated DataFrames
        ?????

Finally, I need to have 5 separate DataFrames after the modifications. How can I modify my function to achieve that in Python Pandas?

Asked By: dingaro


Answers:

You can create an empty dictionary and feed it gradually with the five processed dataframes.

Try this:

import os
import pandas as pd

def xx():
    dico_dfs = {}

    for el in [file for file in os.listdir("mypath") if file.endswith(".csv")]:
        #1. read each csv
        df = pd.read_csv(f"path/{el}")

        #2. making aggregations
        df = df.drop("COL1", axis=1)

        #3. save each modified csv as a separate DataFrame in the dictionary
        dico_dfs[el] = df

    return dico_dfs

You can access each dataframe by using the filename as a key, e.g. dico_dfs["file1.csv"].
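For example (a minimal sketch, assuming xx() returns dico_dfs as in the code above and that a file named file1.csv exists in the folder):

dico_dfs = xx()

# list the keys: one per processed csv file
print(list(dico_dfs.keys()))

# grab one of the processed DataFrames by its filename
df_file1 = dico_dfs["file1.csv"]
print(df_file1.head())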

If needed, you can combine everything into a single dataframe by using pandas.concat: pd.concat(dico_dfs).
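When you concatenate the dictionary, the filenames become the outer level of a MultiIndex, so each row stays traceable to the CSV it came from. A small sketch, assuming the dico_dfs built above:

combined = pd.concat(dico_dfs, names=["source_file", "row"])

# the dictionary keys form the first index level
print(combined.index.get_level_values("source_file").unique())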

Answered By: abokey