Create dataframes with names from a list
Question:
I have Excel files with many tabs. I want to concatenate all of them, one tab at a time.
I am doing:
import os
import pandas
mypath = "mypath"
files = os.listdir(mypath)
# keep only the .xlsx files, with their full paths
files = [os.path.join(mypath, f) for f in files if f.endswith('.xlsx')]
sheets = pandas.ExcelFile(files[0]).sheet_names
Now, say my tabs are alpha, beta, gamma, etc. I want to create a list of dataframes df_alpha, df_beta, etc., where df_alpha is the union of the alpha tabs of all the files in my directory, df_beta the union of the beta tabs, and so on.
By doing:
for sheet in sheets:
    df = pandas.DataFrame()
    for f in files:
        df = pandas.concat([df, pandas.read_excel(f, sheet_name=sheet)])
This builds what I want for each tab, but df is overwritten on every pass of the outer loop, so I end up with only one dataframe: the union of the last tab across the files. How can I change the code so that I have a list of dfs, each named df_alpha, df_beta, etc.?
Answers:
If you can make do with a dictionary of dataframes, the following might help:
df_dict = {}
for sheet in sheets:
    df = pandas.DataFrame()
    for f in files:
        df = pandas.concat([df, pandas.read_excel(f, sheet_name=sheet)])
    df_dict[sheet] = df
Later you can retrieve the relevant df from the dictionary using its key, e.g. df_dict['alpha'].
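As a small illustration of working with such a dictionary, here is a sketch that uses made-up in-memory frames in place of the Excel data (the sheet names and the column x are assumptions for the demo only). It shows a lookup by key and a loop over all sheets, which covers most of what separately named variables like df_alpha would be used for.

```python
import pandas as pd

# Hypothetical stand-ins for the combined sheets; in the answer above these
# frames would be built from pandas.read_excel calls.
df_dict = {
    "alpha": pd.DataFrame({"x": [1, 2]}),
    "beta": pd.DataFrame({"x": [3, 4, 5]}),
}

# Look up one combined frame by its sheet name.
alpha = df_dict["alpha"]

# Loop over every sheet name and its frame, e.g. to collect row counts.
row_counts = {sheet: len(df) for sheet, df in df_dict.items()}
```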
Update: as noted in the comments by @ALollz, the snippet above is inefficient because it concatenates onto the same dataframe over and over, copying the accumulated data each time. A more efficient approach is to concatenate once per sheet:
df_dict = {}
for sheet in sheets:
    df_dict[sheet] = pandas.concat(pandas.read_excel(f, sheet_name=sheet) for f in files)
Note that in this case it's OK not to write an explicit list comprehension inside pandas.concat: without the square brackets the inner expression is a generator expression, which pandas.concat accepts.
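To check the point about generators, here is a minimal sketch with made-up in-memory frames (the list frames and the column x are assumptions for the demo). It also shows ignore_index=True, which may be worth adding if the repeated row labels from each file are not wanted.

```python
import pandas as pd

# Hypothetical stand-ins for the frames read from each file.
frames = [pd.DataFrame({"x": [1, 2]}), pd.DataFrame({"x": [3]})]

from_list = pd.concat([df for df in frames])
from_generator = pd.concat(df for df in frames)

# Both calls build the same frame. Each input keeps its own row labels,
# so the combined index repeats values.
same = from_list.equals(from_generator)

# ignore_index=True renumbers the rows from 0 instead. The generator needs
# its own parentheses once a second argument is passed.
renumbered = pd.concat((df for df in frames), ignore_index=True)
```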
Update 2: perhaps a dict comprehension is more ‘pythonic’ (using the more common alias pd instead of pandas, i.e. import pandas as pd):
df_dict = {
    sheet: pd.concat(pd.read_excel(f, sheet_name=sheet) for f in files)
    for sheet in sheets
}
Here the trick is to reuse the snippets above as the key: value pair of a dictionary comprehension.
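A possible variation, not part of the answer above: pd.read_excel with sheet_name=None returns a dict of all sheets in a workbook, so each file could be read once instead of once per sheet. The sketch below assumes that approach. The helper name combine_by_sheet and the demo frames are hypothetical, and the real-file call is left as a comment so the example runs without any Excel files.

```python
from collections import defaultdict

import pandas as pd


def combine_by_sheet(workbooks):
    """Combine an iterable of {sheet_name: DataFrame} dicts, one dict per file."""
    grouped = defaultdict(list)
    for workbook in workbooks:
        for sheet, df in workbook.items():
            grouped[sheet].append(df)
    # One concat per sheet, with rows renumbered from 0.
    return {
        sheet: pd.concat(dfs, ignore_index=True)
        for sheet, dfs in grouped.items()
    }


# With real files (assuming `files` is the list of paths from the question):
# df_dict = combine_by_sheet(pd.read_excel(f, sheet_name=None) for f in files)

# In-memory stand-ins for two workbooks; the second one has no "beta" tab.
demo = combine_by_sheet([
    {"alpha": pd.DataFrame({"x": [1]}), "beta": pd.DataFrame({"x": [2]})},
    {"alpha": pd.DataFrame({"x": [3]})},
])
```

Unlike taking the sheet names from the first file only, this version collects whatever tabs each file has, so a file that lacks a tab is simply skipped for that tab.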