Read merged cells in Excel with Python

Question:

I am trying to read merged cells of Excel with Python using xlrd.

My Excel: (note that the first column is merged across the three rows)

    A   B   C
  +---+---+----+
1 | 2 | 0 | 30 |
  +   +---+----+
2 |   | 1 | 20 |
  +   +---+----+
3 |   | 5 | 52 |
  +---+---+----+

I would like to read the third line of the first column as equal to 2 in this example, but it returns ''. Do you have any idea how to get to the value of the merged cell?

My code:

all_data = [[]]
excel = xlrd.open_workbook(excel_dir+ excel_file)
sheet_0 = excel.sheet_by_index(0) # Open the first tab

for row_index in range(sheet_0.nrows):
    row= ""
    for col_index in range(sheet_0.ncols):
        value = sheet_0.cell(rowx=row_index,colx=col_index).value             
        row += "{0} ".format(value)
        split_row = row.split()   
    all_data.append(split_row)

What I get:

'2', '0', '30'
'1', '20'
'5', '52'

What I would like to get:

'2', '0', '30'
'2', '1', '20'
'2', '5', '52'
Asked By: Antoine

||

Answers:

I just tried this and it seems to work for your sample data:

all_data = []
excel = xlrd.open_workbook(excel_dir+ excel_file)
sheet_0 = excel.sheet_by_index(0) # Open the first tab

prev_row = [None for i in range(sheet_0.ncols)]
for row_index in range(sheet_0.nrows):
    row= []
    for col_index in range(sheet_0.ncols):
        value = sheet_0.cell(rowx=row_index,colx=col_index).value
        if len(value) == 0:
            value = prev_row[col_index]
        row.append(value)
    prev_row = row
    all_data.append(row)

returning

[['2', '0', '30'], ['2', '1', '20'], ['2', '5', '52']]

It keeps track of the values from the previous row and uses them if the corresponding value from the current row is empty.

Note that the above code does not check if a given cell is actually part of a merged set of cells, so it could possibly duplicate previous values in cases where the cell should really be empty. Still, it might be of some help.

Additional information:

I subsequently found a documentation page that talks about a merged_cells attribute that one can use to determine the cells that are included in various ranges of merged cells. The documentation says that it is “New in version 0.6.1”, but when i tried to use it with xlrd-0.9.3 as installed by pip I got the error

NotImplementedError: formatting_info=True not yet implemented

I’m not particularly inclined to start chasing down different versions of xlrd to test the merged_cells feature, but perhaps you might be interested in doing so if the above code is insufficient for your needs and you encounter the same error that I did with formatting_info=True.

Answered By: Gord Thompson
openpyxl.worksheet.merged_cell_ranges

This function you can get a array like ['A1:M1', 'B22:B27'], which tell you the cells to be merged.

openpyxl.worksheet.merged_cells

This function shows you whether a cell has been merged or not

Answered By: user1731186

You can also try using fillna method available in pandas
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html

df = pd.read_excel(dir+filename,header=1)
df[ColName] = df[ColName].fillna(method='ffill')

This should replace the cell’s value with the previous value

Answered By: pprasad009

For those who are looking for handling merged cell, the way OP has asked, while not overwriting non merged empty cells.

Based on OP’s code and additional information given by @gordthompson’s answers and @stavinsky’s comment, The following code will work for excel files (xls, xlsx), it will read excel file’s first sheet as a dataframe. For each merged cell, it will replicate that merged cell content over all the cells this merged cell represent, as asked by original poster.Note that merged_cell feature of xlrd for ‘xls’ file will only work if ‘formatting_info’ parameter is passed while opening workbook.

import pandas as pd
filepath = excel_dir+ excel_file
if excel_file.endswith('xlsx'):
    excel = pd.ExcelFile(xlrd.open_workbook(filepath), engine='xlrd')
elif excel_file.endswith('xls'):
    excel = pd.ExcelFile(xlrd.open_workbook(filepath, formatting_info=True), engine='xlrd')
else:
    print("don't yet know how to handle other excel file formats")
sheet_0 = excel.sheet_by_index(0) # Open the first tab
df = xls.parse(0, header=None) #read the first tab as a datframe

for e in sheet_0.merged_cells:
    rl,rh,cl,ch = e
    print e
    base_value = sheet1.cell_value(rl, cl)
    print base_value
    df.iloc[rl:rh,cl:ch] = base_value
Answered By: Alok Nayak

I was trying the previous solutions without having existo, nevertheless the following worked for me:

sheet = book.sheet_by_index(0)
all_data = []

for row_index in range(sheet.nrows):
    row = []
    for col_index in range(sheet.ncols):
        valor = sheet.cell(row_index,col_index).value
        if valor == '':
            for crange in sheet.merged_cells:
                rlo, rhi, clo, chi = crange
                if rlo <= row_index and row_index < rhi and clo <= col_index and col_index < chi:
                    valor = sheet.cell(rlo, clo).value
                    break
        row.append(valor)
    all_data.append(row)

print(all_data)

I hope it serves someone in the future

Answered By: Manuel Temple

Using XLRDs merged cells

ExcelFile = pd.read_excel("Excel_File.xlsx")
xl = xlrd.open_workbook("Excel_File.xlsx")
FirstSheet = xl.sheet_by_index(0)
for crange in FirstSheet.merged_cells:
    rlo, rhi,clo, chi = crange
    for rowx in range(rlo,rhi):
        for colx in range(clo,chi):
            value = FirstSheet.cell(rowx,colx).value
        if len(value) == 0:
            ExcelFile.iloc[rowx-1,colx] = FirstSheet.cell(rlo,clo).value
Answered By: Praful

Using xlwings

I have written a function that gives you the outcome of the merge cell. You will need to provide the cell that you are dealing with and the sheet name.

So first you need to give the name of the sheet as follow

# Import libraries like VBA
import xlwings as xw
# Read content of workbook
wb = xw.Book(Your_file with xlsx)
# Read first Sheet of workbook
Sheet1=wb.sheets[0]

Then past the following function

def VMC(Sheet1, x):
    # Get index of a cell 
    row=Sheet1[x].row
    column=Sheet1[x].column
    # Get first non-empty row
    if Sheet1[x].merge_cells:
        i=1
        while Sheet1[row-i,column-1:column].merge_cells:
            i+=1
        t=Sheet1[row-i+1,column-1:column].value
    else:
        t=Sheet1[row-i+1,column-1:column].value
    return t

Then you can have an example like so

a=VMC(Sheet1, 'D'+str(row_index_first))
a

Where the outcome is the merged cell itself. You can use this function when looping over many cells and creating a list of merge cells, then combining them with the rows required.

Or you do not have to use my function, but you can get inspired by it and write what is convenient for your case.

Answered By: rsc05
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.