pd.read_excel() ignores filters

Question

I’m working with some xlsx files and need to import them into program. I’ve written a script that handles everything I need it to do.

However, I need to apply filters to the table in excel before importing them in.

When I apply filters and prep the table to import into python, python reads the entire table ignoring all the filters.

My work around has been filtering what I need then copying that to a new sheet. Then when reading into excel I specify the new sheet of filtered data that I’m looking for.

Is there a way to read the filtered table into excel directly?

Or Should I just import the entire table and apply those same filters using pandas in my script instead?

Asked By: user21126867

||

Source

Answer 1

IIUC, you can’t read only visible rows and/or columns of an Excel spreadsheet with pandas.

To do that, you need some help from openpyxl (!pip install openpyxl) :

from openpyxl import openpyxl
import pandas as pd

wb = load_workbook("file.xlsx")
ws = wb.active # or wb["SheetName"] # <- change the name here

rows = [[c.value for c in r
        if c.value and not ws.row_dimensions[r[0].row].hidden]
        for r in ws.iter_rows()]

df = pd.DataFrame(data= rows[1:], columns=rows[0]).dropna()

Output :

print(df)

   col  col2
0  foo   1.0
2  baz   3.0

Input used (spreadsheet) :

Answered By: Timeless

pd.read_excel() ignores filters

Question:

Answers: