How to read only visible sheets from Excel using Pandas?
Question:
I have to get some random Excel sheets where I want to read only visible sheets from those files.
Consider one file at a time, let’s say I have Mapping_Doc.xls
which contains 2-visible sheets and 2-hidden sheets.
As the sheets are less here, I can parse them with names like this:
Code :
xls = pd.ExcelFile('D:\ExcelRead\Mapping_Doc.xls')
print xls.sheet_names
df1 = xls.parse('Sheet1') #visible sheet
df2 = xls.parse('Sheet2') #visible sheet
Output:
[u'sheet1',u'sheet2',u'sheet3',u'sheet4']
How can I get only the visible sheets?
Answers:
Pandas uses the xlrd
library internally (have a look at the excel.py source code if you’re interested).
You can determine the visibility status by accessing each sheet’s visibility
attribute. According to the comments in the xlrd source code, these are the possible values:
- 0 = visible
- 1 = hidden (can be unhidden by user — Format -> Sheet -> Unhide)
- 2 = “very hidden” (can be unhidden only by VBA macro).
Here’s an example that reads an Excel file with 2 worksheets, the first one visible and the second one hidden:
import pandas as pd
xls = pd.ExcelFile('test.xlsx')
sheets = xls.book.sheets()
for sheet in sheets:
print(sheet.name, sheet.visibility)
Output:
Sheet1 0
Sheet2 1
Answer by @ƘɌỈSƬƠƑ, helped me too. I want to add few points.
For macro files (.xlsm), the visibility is unpredictable. May be because VBA code is behind the visibility setting. Even if a sheet is visible when the file is opened in Excel application, the visibility is not always 0 when read by xlrd.
Check the screenshots below:
This is what I see in Excel.
Visibility values fetched using xlrd
I did a lot of R&D on Pandas, but couldn’t find any solution. An alternate way is to use the xlrd
library.
You need to install version 1.2.0
to get the support for xlsx
excel format.
pip install xlrd==1.2.0
Read the excel and loop through the sheets. Inside the loop, you will get the sheet name and using the sheet name use the sheet_by_name
method to get the sheet and on top of it use the visibility
to find out if the sheet is visible or not.
import xlrd as xl
workbook = xl.open_workbook('File.xlsx')
for sheet in workbook.sheets():
isVisible = True if workbook.sheet_by_name(sheet.name).visibility == 0 else False
if(isVisible == True):
print(str(isVisible) + " : " + sheet.name)
else:
print(str(isVisible) + " : " + sheet.name)
Update for pandas 1.2.3
import pandas as pd
xls = pd.ExcelFile(filename)
sheets = xls.book.worksheets
for sheet in sheets:
print(sheet.title, sheet.sheet_state)
With Pandas 1.3.5, the response is different for the different engines:
- Excel ≤2003 files are parsed with the
xlrd
engine.
- Excel > 2003 files are parsed with the
openpyxl
engine.
This code works with both:
import pandas as pd
file = r'D:pathtoFile_to_Parse.xls'
excel = pd.ExcelFile(file)
if excel.engine == 'openpyxl':
sheets = excel.book.worksheets
for sheet in sheets:
print(sheet.title, sheet.sheet_state) # .sheet_state returns a string
elif excel.engine == 'xlrd':
sheets = excel.book.sheets()
for sheet in sheets:
print(sheet.name, sheet.visibility) # .visibility returns an integer: visible == 0, hidden > 0
else:
print (f'{file} encoded with {excel.engine} engine. Unknown parsing.')
import pandas as pd
import re
xls = pd.ExcelFile('Test.xlsx')
sheets =xls.book.sheets()
for sheet in sheets:
sheetnames = (sheet.name) + '-'+ str(sheet.visibility)
if re.search(r'.*-[0]',sheetnames):
targetsheet = sheetnames
targetsheet = re.search(r'(.*)(-[0])',targetsheet)
targetsheet = targetsheet.group(1)
print(targetsheet)
I have to get some random Excel sheets where I want to read only visible sheets from those files.
Consider one file at a time, let’s say I have Mapping_Doc.xls
which contains 2-visible sheets and 2-hidden sheets.
As the sheets are less here, I can parse them with names like this:
Code :
xls = pd.ExcelFile('D:\ExcelRead\Mapping_Doc.xls')
print xls.sheet_names
df1 = xls.parse('Sheet1') #visible sheet
df2 = xls.parse('Sheet2') #visible sheet
Output:
[u'sheet1',u'sheet2',u'sheet3',u'sheet4']
How can I get only the visible sheets?
Pandas uses the xlrd
library internally (have a look at the excel.py source code if you’re interested).
You can determine the visibility status by accessing each sheet’s visibility
attribute. According to the comments in the xlrd source code, these are the possible values:
- 0 = visible
- 1 = hidden (can be unhidden by user — Format -> Sheet -> Unhide)
- 2 = “very hidden” (can be unhidden only by VBA macro).
Here’s an example that reads an Excel file with 2 worksheets, the first one visible and the second one hidden:
import pandas as pd
xls = pd.ExcelFile('test.xlsx')
sheets = xls.book.sheets()
for sheet in sheets:
print(sheet.name, sheet.visibility)
Output:
Sheet1 0
Sheet2 1
Answer by @ƘɌỈSƬƠƑ, helped me too. I want to add few points.
For macro files (.xlsm), the visibility is unpredictable. May be because VBA code is behind the visibility setting. Even if a sheet is visible when the file is opened in Excel application, the visibility is not always 0 when read by xlrd.
Check the screenshots below:
This is what I see in Excel.
Visibility values fetched using xlrd
I did a lot of R&D on Pandas, but couldn’t find any solution. An alternate way is to use the xlrd
library.
You need to install version 1.2.0
to get the support for xlsx
excel format.
pip install xlrd==1.2.0
Read the excel and loop through the sheets. Inside the loop, you will get the sheet name and using the sheet name use the sheet_by_name
method to get the sheet and on top of it use the visibility
to find out if the sheet is visible or not.
import xlrd as xl
workbook = xl.open_workbook('File.xlsx')
for sheet in workbook.sheets():
isVisible = True if workbook.sheet_by_name(sheet.name).visibility == 0 else False
if(isVisible == True):
print(str(isVisible) + " : " + sheet.name)
else:
print(str(isVisible) + " : " + sheet.name)
Update for pandas 1.2.3
import pandas as pd
xls = pd.ExcelFile(filename)
sheets = xls.book.worksheets
for sheet in sheets:
print(sheet.title, sheet.sheet_state)
With Pandas 1.3.5, the response is different for the different engines:
- Excel ≤2003 files are parsed with the
xlrd
engine. - Excel > 2003 files are parsed with the
openpyxl
engine.
This code works with both:
import pandas as pd
file = r'D:pathtoFile_to_Parse.xls'
excel = pd.ExcelFile(file)
if excel.engine == 'openpyxl':
sheets = excel.book.worksheets
for sheet in sheets:
print(sheet.title, sheet.sheet_state) # .sheet_state returns a string
elif excel.engine == 'xlrd':
sheets = excel.book.sheets()
for sheet in sheets:
print(sheet.name, sheet.visibility) # .visibility returns an integer: visible == 0, hidden > 0
else:
print (f'{file} encoded with {excel.engine} engine. Unknown parsing.')
import pandas as pd
import re
xls = pd.ExcelFile('Test.xlsx')
sheets =xls.book.sheets()
for sheet in sheets:
sheetnames = (sheet.name) + '-'+ str(sheet.visibility)
if re.search(r'.*-[0]',sheetnames):
targetsheet = sheetnames
targetsheet = re.search(r'(.*)(-[0])',targetsheet)
targetsheet = targetsheet.group(1)
print(targetsheet)