Reading multiple tabs from Excel into different DataFrames

Question:

I am trying to read multiple tabs of a spreadsheet into separate DataFrames; once all tabs with data have been read, the program should stop.

For the first part, I am looking to do something like:

import pandas as pd

xls = pd.ExcelFile('Unique.xlsx')
df_Sector = {}  # one DataFrame per tab, keyed by tab number
for i in range(1, n):  # n should be number of tabs with data
    try:
        df_Sector[i] = xls.parse('Sheet' + str(i))  # df_Sector[i] has to be a DataFrame
    except:
        pass

I want the program to stop once all tabs with data have been read.

Asked By: abhi_phoenix

Answers:

This will read all sheets and make a dictionary of dataframes:

xl = pd.read_excel('Unique.xlsx', sheet_name=None)
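Since the question asks only for tabs that contain data, the returned dictionary can be filtered afterwards. A minimal sketch, assuming "has data" means "has at least one row"; the helper name drop_empty_sheets is hypothetical, and the in-memory dictionary stands in for the result of pd.read_excel so the example runs without a file:

```python
import pandas as pd


def drop_empty_sheets(sheets):
    """Keep only the sheets whose DataFrame has at least one row."""
    return {name: df for name, df in sheets.items() if not df.empty}


# In real use: sheets = pd.read_excel('Unique.xlsx', sheet_name=None)
# Here a hand-built dict stands in for that result (illustrative data only).
sheets = {
    'Sheet1': pd.DataFrame({'col1': [11, 21], 'col2': [12, 22]}),
    'Sheet2': pd.DataFrame(),  # a tab with no data
}

dfs = drop_empty_sheets(sheets)
print(list(dfs))
```

Because the dictionary already covers every sheet in the workbook, there is no need to know the number of tabs in advance or to catch exceptions to detect the end.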

To get specific sheets, you could do:

xl_dict = {}
sheetname_list = ['blah1', 'blah2', 'blah3']
for sheet in sheetname_list:
    xl_dict[sheet] = pd.read_excel('Unique.xlsx', sheet_name=sheet)

or:

xl = pd.read_excel('Unique.xlsx', sheet_name=sheetname_list)
Answered By: b2002

Demo:

File name:

In [94]: fn = r'D:temp.datatest.xlsx'

Create a pandas.ExcelFile object:

In [95]: xl = pd.ExcelFile(fn)

It has a sheet_names attribute:

In [96]: xl.sheet_names
Out[96]: ['Sheet1', 'aaa']

We can use it to loop through the sheets:

In [98]: for sh in xl.sheet_names:
    ...:     df = xl.parse(sh)
    ...:     print('Processing: [{}] ...'.format(sh))
    ...:     print(df.head())
    ...:
Processing: [Sheet1] ...
   col1  col2  col3
0    11    12    13
1    21    22    23
2    31    32    33
Processing: [aaa] ...
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9
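If only the consecutively numbered tabs from the question ('Sheet1', 'Sheet2', ...) are wanted, the sheet_names list can also tell the loop when to stop. A rough sketch, assuming the tabs follow that naming pattern; the helper name numbered_sheets and the sample list are hypothetical:

```python
def numbered_sheets(sheet_names, prefix='Sheet'):
    """Return prefix1, prefix2, ... in order, stopping at the first missing number."""
    available = set(sheet_names)
    found = []
    i = 1
    while f'{prefix}{i}' in available:
        found.append(f'{prefix}{i}')
        i += 1
    return found


# In real use, pass xl.sheet_names from the ExcelFile object.
names = numbered_sheets(['Sheet1', 'Sheet2', 'aaa', 'Sheet4'])
print(names)
```

Each returned name can then be passed to xl.parse(), so no try/except is needed to find the last tab.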

A slightly more elegant way is to build a dictionary of DataFrames:

In [100]: dfs = {sh:xl.parse(sh) for sh in xl.sheet_names}

In [101]: dfs.keys()
Out[101]: dict_keys(['Sheet1', 'aaa'])

In [102]: dfs['Sheet1']
Out[102]:
   col1  col2  col3
0    11    12    13
1    21    22    23
2    31    32    33

In [103]: dfs['aaa']
Out[103]:
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

Use sheet_name=None to parse all sheets!

Passing sheet_name=None to pd.read_excel, as in the answer above, was faster in my benchmark than creating an ExcelFile first.

Benchmark:

%%timeit
dfs = pd.read_excel(path, sheet_name=None)
# 279 ms ± 20.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
xl = pd.ExcelFile(path)
dfs = {sh:xl.parse(sh) for sh in xl.sheet_names}
# 335 ms ± 40.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
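The %%timeit cells above only work in IPython or Jupyter. For a plain script, a similar comparison can be sketched with the standard timeit module; the helper name best_time_ms is hypothetical, and the workload timed here is a stand-in, since the real calls need an actual workbook at path:

```python
import timeit


def best_time_ms(func, repeat=7, number=1):
    """Run func `number` times per round for `repeat` rounds; return the best per-call time in ms."""
    times = timeit.repeat(func, repeat=repeat, number=number)
    return min(times) / number * 1000


# Stand-in workload so the sketch runs anywhere. In real use, time e.g.
#   lambda: pd.read_excel(path, sheet_name=None)
elapsed = best_time_ms(lambda: sum(range(10_000)))
print(f'{elapsed:.3f} ms')
```

Note that this reports the fastest round rather than the mean and standard deviation that %%timeit prints, so the numbers are not directly comparable with the ones above.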
Answered By: Muhammad Yasirroni