How to read only visible sheets from Excel using Pandas?

Question:

I receive arbitrary Excel files, and from each one I want to read only the visible sheets.

Taking one file at a time, say I have Mapping_Doc.xls, which contains two visible sheets and two hidden sheets.

Because there are only a few sheets here, I can parse them by name like this:

Code:

import pandas as pd

xls = pd.ExcelFile(r'D:\ExcelRead\Mapping_Doc.xls')  # raw string keeps the backslashes literal
print(xls.sheet_names)
df1 = xls.parse('sheet1')  # visible sheet
df2 = xls.parse('sheet2')  # visible sheet

Output:

['sheet1', 'sheet2', 'sheet3', 'sheet4']

How can I get only the visible sheets?

Asked By: Shivkumar kondi


Answers:

Older versions of pandas use the xlrd library internally to read Excel files (have a look at the excel.py source code if you’re interested).

You can determine the visibility status by accessing each sheet’s visibility attribute. According to the comments in the xlrd source code, these are the possible values:

  • 0 = visible
  • 1 = hidden (can be unhidden by user — Format -> Sheet -> Unhide)
  • 2 = “very hidden” (can be unhidden only by VBA macro).
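Because the codes are small integers, they can be translated into readable labels with a lookup table. The sketch below is a hypothetical helper (XLRD_VISIBILITY and visibility_label are not part of xlrd); the labels reuse the strings openpyxl stores in a worksheet's sheet_state ('visible', 'hidden', 'veryHidden'), so code handling either library can compare the same values.

```python
# Hypothetical helper: maps xlrd's integer Sheet.visibility codes to the
# string values openpyxl uses for Worksheet.sheet_state.
XLRD_VISIBILITY = {
    0: "visible",     # shown normally
    1: "hidden",      # the user can unhide it from the Excel UI
    2: "veryHidden",  # can only be unhidden from VBA
}


def visibility_label(code: int) -> str:
    """Return the openpyxl-style state string for an xlrd visibility code."""
    try:
        return XLRD_VISIBILITY[code]
    except KeyError:
        raise ValueError(f"unknown xlrd visibility code: {code!r}") from None


if __name__ == "__main__":
    for code in (0, 1, 2):
        print(code, visibility_label(code))
```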

Here’s an example that reads an Excel file with 2 worksheets, the first one visible and the second one hidden:

import pandas as pd

xls = pd.ExcelFile('test.xlsx')

sheets = xls.book.sheets()  # xls.book is an xlrd Book here, so this needs the xlrd engine

for sheet in sheets:
    print(sheet.name, sheet.visibility)

Output:

Sheet1 0
Sheet2 1
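To answer the original question directly, the visibility check can decide which sheets get parsed. This is a minimal sketch, assuming pandas opens the file with the xlrd engine (so xls.book is an xlrd Book); the names visible_names and read_visible_xls are hypothetical, and pandas is imported inside the reader so the filter can be exercised on its own.

```python
from types import SimpleNamespace


def visible_names(sheets):
    """Return the names of xlrd-style sheets whose visibility is 0 (visible)."""
    return [sheet.name for sheet in sheets if sheet.visibility == 0]


def read_visible_xls(path):
    """Parse only the visible sheets of `path` into a {name: DataFrame} dict.

    Assumes pandas reads the file with the xlrd engine, so ExcelFile.book
    is an xlrd Book exposing sheets().
    """
    import pandas as pd  # imported here so visible_names() does not need pandas

    with pd.ExcelFile(path) as xls:
        return {name: xls.parse(name) for name in visible_names(xls.book.sheets())}


if __name__ == "__main__":
    # Stand-ins with the same .name/.visibility attributes as xlrd sheets.
    demo = [
        SimpleNamespace(name="Sheet1", visibility=0),
        SimpleNamespace(name="Sheet2", visibility=1),
        SimpleNamespace(name="Sheet3", visibility=2),
    ]
    print(visible_names(demo))  # ['Sheet1']
```

With the question's file, read_visible_xls(r'D:\ExcelRead\Mapping_Doc.xls') should return a dict keyed only by the two visible sheet names.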
Answered By: DocZerø

The answer by @ƘɌỈSƬƠƑ helped me too. I want to add a few points.

For macro-enabled files (.xlsm), the visibility value is unpredictable, perhaps because VBA code controls the visibility setting. Even if a sheet is visible when the file is opened in the Excel application, its visibility is not always 0 when read by xlrd.

Check the screenshots below:

[Screenshot: this is what I see in Excel]

[Screenshot: visibility values fetched using xlrd]

Answered By: ron

I researched Pandas extensively but couldn’t find a solution. An alternative is to use the xlrd library directly.

You need xlrd version 1.2.0 to read the .xlsx format, because xlrd 2.0 dropped support for every format except .xls.

pip install xlrd==1.2.0

Open the workbook and loop through its sheets. Each sheet object has a visibility attribute; a value of 0 means the sheet is visible.

import xlrd as xl

workbook = xl.open_workbook('File.xlsx')
for sheet in workbook.sheets():
    # visibility: 0 = visible, 1 = hidden, 2 = very hidden
    is_visible = sheet.visibility == 0
    print(f"{is_visible} : {sheet.name}")
Answered By: Sarath Subramanian

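Update for pandas 1.2.3, where .xlsx files are read with openpyxl when it is installed, so xls.book is an openpyxl Workbook: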

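import pandas as pd

filename = 'test.xlsx'  # placeholder path to your workbook
xls = pd.ExcelFile(filename)

sheets = xls.book.worksheets

for sheet in sheets:
    print(sheet.title, sheet.sheet_state)  # sheet_state: 'visible', 'hidden' or 'veryHidden'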
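Because sheet_state is a string, the same loop can also sort the sheets into visible, hidden and very hidden groups. A minimal sketch; sheets_by_state is a hypothetical helper, and the demo uses stand-in objects that only mimic the title and sheet_state attributes of openpyxl worksheets.

```python
from collections import defaultdict
from types import SimpleNamespace


def sheets_by_state(worksheets):
    """Group openpyxl-style worksheet titles by their sheet_state string.

    openpyxl uses 'visible', 'hidden' and 'veryHidden' for sheet_state.
    """
    groups = defaultdict(list)
    for ws in worksheets:
        groups[ws.sheet_state].append(ws.title)
    return dict(groups)


if __name__ == "__main__":
    # Stand-ins with the same .title/.sheet_state attributes as openpyxl worksheets.
    demo = [
        SimpleNamespace(title="Data", sheet_state="visible"),
        SimpleNamespace(title="Lookup", sheet_state="hidden"),
        SimpleNamespace(title="Code", sheet_state="veryHidden"),
        SimpleNamespace(title="Summary", sheet_state="visible"),
    ]
    print(sheets_by_state(demo))
    # {'visible': ['Data', 'Summary'], 'hidden': ['Lookup'], 'veryHidden': ['Code']}
```

On a real workbook, sheets_by_state(xls.book.worksheets).get('visible', []) gives a list of names that can be passed as sheet_name to pd.read_excel.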

Answered By: Andrey Mazur

With pandas 1.3.5, the approach depends on which engine pandas uses to read the file:

  • Legacy .xls files (Excel 2003 and earlier) are parsed with the xlrd engine.
  • Newer .xlsx/.xlsm files (Excel 2007 and later) are parsed with the openpyxl engine.

This code works with both:

import pandas as pd
file = r'D:\path\to\File_to_Parse.xls'
excel = pd.ExcelFile(file)
if excel.engine == 'openpyxl':
    sheets = excel.book.worksheets
    for sheet in sheets:
        print(sheet.title, sheet.sheet_state) # .sheet_state returns a string
elif excel.engine == 'xlrd':
    sheets = excel.book.sheets()
    for sheet in sheets:
        print(sheet.name, sheet.visibility) # .visibility returns an integer: visible == 0, hidden > 0
else:
    print(f'{file} was read with the {excel.engine} engine; sheet visibility is not handled here.')
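The two branches above can be folded into one function that returns the visible names instead of printing them, which makes it easy to read only those sheets. This is a sketch under the same engine assumptions as the answer; visible_sheet_names and read_visible_sheets are hypothetical names, and engines other than xlrd and openpyxl are rejected rather than guessed at.

```python
from types import SimpleNamespace


def visible_sheet_names(excel):
    """Return the names of visible sheets in a pandas ExcelFile-like object.

    openpyxl: worksheets carry a string sheet_state ('visible' when shown).
    xlrd: sheets carry an integer visibility (0 when shown).
    """
    if excel.engine == "openpyxl":
        return [ws.title for ws in excel.book.worksheets if ws.sheet_state == "visible"]
    if excel.engine == "xlrd":
        return [sh.name for sh in excel.book.sheets() if sh.visibility == 0]
    raise NotImplementedError(f"visibility lookup not implemented for engine {excel.engine!r}")


def read_visible_sheets(path):
    """Read only the visible sheets of `path` into a {name: DataFrame} dict."""
    import pandas as pd  # imported here so visible_sheet_names() can run without pandas

    with pd.ExcelFile(path) as excel:
        # A list for sheet_name makes read_excel return a dict of DataFrames.
        return pd.read_excel(excel, sheet_name=visible_sheet_names(excel))


if __name__ == "__main__":
    # Stand-in for an ExcelFile read with openpyxl; only the attributes used above.
    fake = SimpleNamespace(
        engine="openpyxl",
        book=SimpleNamespace(worksheets=[
            SimpleNamespace(title="Data", sheet_state="visible"),
            SimpleNamespace(title="Lookup", sheet_state="hidden"),
        ]),
    )
    print(visible_sheet_names(fake))  # ['Data']
```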
Answered By: cpilko
import pandas as pd
import re

xls = pd.ExcelFile('Test.xlsx')  # requires the xlrd engine, where xls.book has sheets()
sheets = xls.book.sheets()

for sheet in sheets:
    # Combine name and visibility, e.g. "Sheet1-0"
    sheetnames = sheet.name + '-' + str(sheet.visibility)
    # fullmatch anchors the pattern, so only strings ending in "-0" (visible) match
    match = re.fullmatch(r'(.*)-0', sheetnames)
    if match:
        targetsheet = match.group(1)
        print(targetsheet)
Answered By: Nikita