How to read only visible sheets from Excel using Pandas?

Question

I receive arbitrary Excel files and want to read only the visible sheets from them.

Consider one file at a time. Say I have Mapping_Doc.xls, which contains 2 visible sheets and 2 hidden sheets.

Since there are only a few sheets here, I can parse them by name like this:

Code:

import pandas as pd

xls = pd.ExcelFile(r'D:\ExcelRead\Mapping_Doc.xls')  # raw string keeps the backslashes literal
print(xls.sheet_names)
df1 = xls.parse('Sheet1')  # visible sheet
df2 = xls.parse('Sheet2')  # visible sheet

Output:

['Sheet1', 'Sheet2', 'Sheet3', 'Sheet4']

How can I get only the visible sheets?

Asked By: Shivkumar kondi

Answer 1

Pandas uses the xlrd library internally (have a look at the excel.py source code if you’re interested).

You can determine the visibility status by accessing each sheet’s visibility attribute. According to the comments in the xlrd source code, these are the possible values:

0 = visible
1 = hidden (can be unhidden by user — Format -> Sheet -> Unhide)
2 = “very hidden” (can be unhidden only by VBA macro).

Here’s an example that reads an Excel file with 2 worksheets, the first one visible and the second one hidden:

import pandas as pd

xls = pd.ExcelFile('test.xlsx')

sheets = xls.book.sheets()

for sheet in sheets:
    print(sheet.name, sheet.visibility)

Output:

Sheet1 0
Sheet2 1
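
Building on the visibility codes listed above, here is a minimal sketch of narrowing the sheet list down to the visible ones. The SheetVisibility enum and the visible_sheet_names helper are illustrative names, not part of xlrd or pandas, and the stand-in objects only mimic the name and visibility attributes that xlrd sheets expose, so the sketch runs without a workbook.

```python
from enum import IntEnum
from types import SimpleNamespace


class SheetVisibility(IntEnum):
    """Hypothetical helper naming the visibility codes documented by xlrd."""
    VISIBLE = 0
    HIDDEN = 1
    VERY_HIDDEN = 2


def visible_sheet_names(sheets):
    """Return the names of sheets whose visibility code is 0 (visible)."""
    return [s.name for s in sheets if s.visibility == SheetVisibility.VISIBLE]


# Stand-ins for xlrd sheet objects (assumption: only .name and .visibility are needed).
demo_sheets = [
    SimpleNamespace(name="Sheet1", visibility=0),
    SimpleNamespace(name="Sheet2", visibility=1),
]
print(visible_sheet_names(demo_sheets))
```

With a workbook opened through the xlrd engine, the same helper could presumably be called as visible_sheet_names(xls.book.sheets()), and each returned name passed to xls.parse(name) to get a DataFrame.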

Answered By: DocZerø

Answer 2

The answer by @ƘɌỈSƬƠƑ helped me too. I want to add a few points.

For macro-enabled files (.xlsm), the visibility is unpredictable, maybe because VBA code is behind the visibility setting. Even if a sheet is visible when the file is opened in the Excel application, its visibility is not always 0 when read by xlrd.

[Screenshots not preserved: the sheets as shown in Excel, and the visibility values fetched using xlrd.]

Answered By: ron

Answer 3

I spent a lot of time looking for a way to do this in Pandas itself, but couldn’t find one. An alternative is to use the xlrd library directly.

You need to install version 1.2.0 to get support for the .xlsx format, since xlrd 2.0 and later read only .xls files.

pip install xlrd==1.2.0

Open the workbook and loop through its sheets. Each sheet object exposes its name and a visibility attribute, which is 0 when the sheet is visible.

import xlrd as xl

workbook = xl.open_workbook('File.xlsx')
for sheet in workbook.sheets():
    # visibility: 0 = visible, 1 = hidden, 2 = very hidden
    is_visible = sheet.visibility == 0
    print(str(is_visible) + " : " + sheet.name)

Answered By: Sarath Subramanian

Answer 4

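Update for pandas 1.2.3, where the workbook is opened with the openpyxl engine:

import pandas as pd

xls = pd.ExcelFile(filename)  # filename: path to an .xlsx workbook

sheets = xls.book.worksheets  # openpyxl worksheet objects

for sheet in sheets:
    # sheet_state is 'visible', 'hidden' or 'veryHidden'
    print(sheet.title, sheet.sheet_state)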

Answered By: Andrey Mazur

Answer 5

With Pandas 1.3.5, the response is different for the different engines:

Excel ≤2003 files are parsed with the xlrd engine.
Excel > 2003 files are parsed with the openpyxl engine.

This code works with both:

import pandas as pd
file = r'D:\path\to\File_to_Parse.xls'
excel = pd.ExcelFile(file)
if excel.engine == 'openpyxl':
    sheets = excel.book.worksheets
    for sheet in sheets:
        print(sheet.title, sheet.sheet_state) # .sheet_state returns a string
elif excel.engine == 'xlrd':
    sheets = excel.book.sheets()
    for sheet in sheets:
        print(sheet.name, sheet.visibility) # .visibility returns an integer: visible == 0, hidden > 0
else:
    print(f'{file} was opened with the {excel.engine} engine. Unknown parsing.')
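
To go from printing the states to actually reading only the visible sheets, the engine check above can be wrapped in a small helper. This is a sketch under assumptions: visible_names and read_visible_sheets are hypothetical names, the helpers rely only on the engine, book and parse attributes of the object they are given, and the demonstration uses stand-in objects rather than a real workbook so that it runs without any Excel file.

```python
from types import SimpleNamespace


def visible_names(excel):
    """Names of visible sheets for an object with .engine and .book (e.g. pd.ExcelFile)."""
    if excel.engine == "openpyxl":
        # openpyxl reports 'visible', 'hidden' or 'veryHidden'
        return [ws.title for ws in excel.book.worksheets
                if ws.sheet_state == "visible"]
    if excel.engine == "xlrd":
        # xlrd reports 0 (visible), 1 (hidden) or 2 (very hidden)
        return [sh.name for sh in excel.book.sheets() if sh.visibility == 0]
    raise ValueError(f"Unsupported engine: {excel.engine!r}")


def read_visible_sheets(excel):
    """Parse each visible sheet; returns {sheet name: result of excel.parse(name)}."""
    return {name: excel.parse(name) for name in visible_names(excel)}


# Stand-in for a pd.ExcelFile opened with openpyxl (assumption: not a real workbook).
fake_book = SimpleNamespace(worksheets=[
    SimpleNamespace(title="Sheet1", sheet_state="visible"),
    SimpleNamespace(title="Sheet2", sheet_state="hidden"),
    SimpleNamespace(title="Sheet3", sheet_state="veryHidden"),
])
fake_excel = SimpleNamespace(
    engine="openpyxl",
    book=fake_book,
    parse=lambda name: f"<DataFrame for {name}>",
)
print(read_visible_sheets(fake_excel))
```

With pandas installed, the real call would look something like frames = read_visible_sheets(pd.ExcelFile(file)), which should give a dict of DataFrames keyed by sheet name. Raising on an unknown engine, rather than printing, is a design choice here so that callers do not silently get an empty result.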

Answered By: cpilko

Answer 6

import pandas as pd

# sheets() and visibility are xlrd APIs, so the file must be opened with the xlrd engine
xls = pd.ExcelFile('Test.xlsx')
sheets = xls.book.sheets()

for sheet in sheets:
    # visibility == 0 means the sheet is visible; comparing the integer directly
    # avoids false matches on sheet names that themselves contain "-0"
    if sheet.visibility == 0:
        print(sheet.name)

Answered By: Nikita
