Create dataframes with names from a list

Question:

I have several Excel files, each with many tabs. I want to concatenate them across files, one tab at a time.

I am doing:

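import os
import pandas

mypath = "mypath"
files = os.listdir(mypath)
files = [os.path.join(mypath, f) for f in files if f.endswith('.xlsx')]

# Tab names are taken from the first file; all files are assumed to share them.
sheets = pandas.ExcelFile(files[0]).sheet_names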

Now, say my tabs are alpha, beta, gamma, etc. I want to create a list of dataframes df_alpha, df_beta, etc., each being the union of the corresponding tab (e.g. all the alpha tabs) across the files in my directory.

By doing:

for sheet in sheets:
    df = pandas.DataFrame()
    for f in files:
        df = pandas.concat([df, pandas.read_excel(f, sheet_name=sheet)])

This concatenates each tab the way I want, but df is overwritten on every pass of the outer loop, so at the end I only have one dataframe: the union of the last tab across the files. How can I change the code so that I have a list of dfs, named df_alpha, df_beta, etc.?

Asked By: user


Answers:

If you can make do with a dictionary of dataframes, the following might help:

df_dict = {}
for sheet in sheets:
    df = pandas.DataFrame()
    for f in files:
        df = pandas.concat([df, pandas.read_excel(f, sheet_name=sheet)])
    df_dict[sheet] = df

Later you can call the relevant df from the dictionary using its key, e.g. df_dict['alpha'].
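
A minimal sketch of working with such a dictionary, using small in-memory frames in place of the Excel data (the sheet names and values here are made up for illustration):

```python
import pandas as pd

# Stand-in for the dictionary built above; the data is invented for illustration.
df_dict = {
    "alpha": pd.DataFrame({"x": [1, 2, 3]}),
    "beta": pd.DataFrame({"x": [10, 20]}),
}

# Look up one combined frame by its sheet name.
df_alpha = df_dict["alpha"]

# Or process every sheet in turn without hard-coding any names.
row_counts = {sheet: len(df) for sheet, df in df_dict.items()}
print(row_counts)
```

Keeping the frames in a dictionary means the code keeps working if a tab is added or renamed, which would not be the case with separately named variables.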

Update: as noted in the comments by @ALollz, the first snippet is inefficient because it calls pandas.concat repeatedly on a growing dataframe, copying the accumulated data each time. A more efficient approach is to concatenate once per sheet:

df_dict = {}
for sheet in sheets:
    df_dict[sheet] = pandas.concat(pandas.read_excel(f, sheet_name=sheet) for f in files)

Note that there is no need to wrap the expression in square brackets to build a list here: pandas.concat also accepts a generator, which is what the bare expression inside the call becomes.
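
To illustrate the point about generators, here is a small sketch with in-memory frames standing in for the sheets read from each file (the frames list is hypothetical). It also passes ignore_index=True, an optional pandas.concat argument not used in the snippets above, which renumbers the rows instead of repeating each file's original index:

```python
import pandas as pd

# Hypothetical stand-ins for pandas.read_excel(f, sheet_name=sheet) results.
frames = [
    pd.DataFrame({"x": [1, 2]}),
    pd.DataFrame({"x": [3, 4]}),
]

# A generator expression is accepted directly; no intermediate list is needed.
combined = pd.concat(df for df in frames)
print(combined.index.tolist())  # the original row labels are kept

# ignore_index=True gives the result a fresh 0..n-1 index.
renumbered = pd.concat((df for df in frames), ignore_index=True)
print(renumbered.index.tolist())
```

Whether to renumber depends on whether the per-file row labels carry any meaning; if they do not, a fresh index usually makes later lookups less surprising.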

Update 2: a dict comprehension is perhaps more 'pythonic' (this version uses the common alias pd, i.e. import pandas as pd):

df_dict = {
    sheet: pd.concat(pd.read_excel(f, sheet_name=sheet) for f in files)
    for sheet in sheets
}

The trick is to use the concat expression from the previous snippet as the value in a dictionary comprehension, with the sheet name as the key.
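
One possible variation, not part of the snippets above: pandas.read_excel accepts sheet_name=None, in which case it returns a dictionary mapping each sheet name to its dataframe, so each file only needs one read_excel call. The sketch below assumes every file has the same tabs as the first one; the helper name combine_by_sheet is made up for illustration, and in-memory frames stand in for the files:

```python
import pandas as pd


def combine_by_sheet(per_file_sheets):
    """Merge a list of {sheet_name: DataFrame} dicts into one dict of combined frames.

    Assumes every dict has the same sheet names as the first one.
    """
    sheets = list(per_file_sheets[0])
    return {
        sheet: pd.concat((d[sheet] for d in per_file_sheets), ignore_index=True)
        for sheet in sheets
    }


# With real files, each dict would come from one read per file:
#     per_file_sheets = [pd.read_excel(f, sheet_name=None) for f in files]
# Here, small in-memory frames stand in for two files with two tabs each.
per_file_sheets = [
    {"alpha": pd.DataFrame({"x": [1]}), "beta": pd.DataFrame({"x": [10]})},
    {"alpha": pd.DataFrame({"x": [2]}), "beta": pd.DataFrame({"x": [20]})},
]
df_dict = combine_by_sheet(per_file_sheets)
print(df_dict["alpha"])
```

If some files might lack a tab, the lookup d[sheet] would raise a KeyError, so that case would need explicit handling.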

Answered By: SultanOrazbayev