how to parse excel file where columns are not properly mentioned in python

Question:

I am looking to find the values under a specific cell attached below. I have many excel files like this and find the filename_pattern cell value (in the FEED row). Here in this ex: its BBB_ALLOC.gz*

Can someone guide me on how to find it using python or any other libraries?

I have tried the following code using openpyxl, but I am unsure how to get the values once in iterate until FEED.

from openpyxl import load_workbook

file = "mysheet.xlsx"

wb_obj = load_workbook(filename=file)
wsheet = wb_obj['sheet1']
print('hello')
dataDict = {}

for key, *values in wsheet.iter_rows():
    dataDict[key.value] = [v.value for v in values]
print(dataDict)
for k, v in dataDict.items():
    if k == 'FEED':
        print(v)

enter image description here

Asked By: BigD

||

Answers:

You are on right path with your code. You’ve got values of first column as keys in the dictionary dataDict. Once you find row with key == "FEED", iterate that row to find cell with value "filename_pattern". The cell, in same column but next row, is your desired filename value.

But you’ve collected the rows in a dictionary, which means you loose order of the rows and can’t get next row. You should iterate it in a different manner.

from openpyxl import load_workbook

file = "admin_recomm_comparison.xlsx"

wb_obj = load_workbook(filename=file)
wsheet = wb_obj['sheet1']

filename = None
feed_row = None
for row in wsheet.iter_rows():
    if row[0].value == "FEED":
        feed_row = row
        break

filename_pattern_cell = None
if feed_row:
    for cell in feed_row:
        if cell.value == "filename_pattern":
            filename_pattern_cell = cell
            break

if filename_pattern_cell:
    cell_name = str(filename_pattern_cell.column_letter) + str(int(filename_pattern_cell.row) + 1)
    print(wsheet[cell_name].value)
Answered By: Azhar Khan

I think xlwings could help here. I assume you have Excel on a computer. So we could use Excel API to find the cell of interest like this:

import xlwings as xw

sh = xw.Book('admin_recomm_comparison.xlsx').sheets['Sheet1']

feed_cell = sh.used_range.api.Find(
    What='FEED',
    LookAt=xw.constants.LookAt.xlWhole,
    LookIn=xw.constants.FindLookIn.xlFormulas,
    MatchCase=False
)

if feed_cell:
    print(f'{feed_cell.Address=}')
else:
    print("FEED wasn't found")
    
sh.book.close()

p.s. For the detailes of Range.Find see here.

Answered By: Vitalizzare
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.