pd.read_excel() ignores filters

Question:

I’m working with some xlsx files and need to import them into program. I’ve written a script that handles everything I need it to do.

However, I need to apply filters to the table in excel before importing them in.

When I apply filters and prep the table to import into python, python reads the entire table ignoring all the filters.

My work around has been filtering what I need then copying that to a new sheet. Then when reading into excel I specify the new sheet of filtered data that I’m looking for.

Is there a way to read the filtered table into excel directly?

Or Should I just import the entire table and apply those same filters using pandas in my script instead?

Asked By: user21126867

||

Answers:

IIUC, you can’t read only visible rows and/or columns of an Excel spreadsheet with .

To do that, you need some help from (!pip install openpyxl) :

from openpyxl import openpyxl
import pandas as pd
​
wb = load_workbook("file.xlsx")
ws = wb.active # or wb["SheetName"] # <- change the name here
​
rows = [[c.value for c in r
        if c.value and not ws.row_dimensions[r[0].row].hidden]
        for r in ws.iter_rows()]
​
df = pd.DataFrame(data= rows[1:], columns=rows[0]).dropna()


Output :

print(df)

   col  col2
0  foo   1.0
2  baz   3.0

Input used (spreadsheet) :

enter image description here

Answered By: Timeless
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.