I have Excel files with several sheets in each, and I'm working on a script that gathers data from selected sheets, if they exist in the file, and combines it into one big sheet. It generally works: it iterates through the files and, if the desired sheet exists, finds the range of cells with data and appends it to a DataFrame. What I need to do now is add a header row (column names) to the DataFrame, but in the sheet those are multi-line headers.
To make it look the same in the DataFrame, I need to unmerge the cells in the top header row and copy the value from the first cell to the rest of the cells in the range that was previously merged.
I'm using OpenPyXL to access the Excel sheets. My function receives the sheet to work on as its only parameter. It looks like this:
def checkForMergedCells(sheet):
    merged = ws.merged_cell_ranges
    for mergedCell in merged:
        mc_start, mc_stop = str(mergedCell).split(':')
        cp_value = sheet[mc_start]
        sheet.unmerge_cells(mergedCell)
        cell_range = sheet[mergedCell]
        for cell in cell_range:
            cell.value = cp_value
The problem is that cell_range returns a tuple, which results in this error message:
AttributeError: 'tuple' object has no attribute 'value'
Below you can see a screencap taken during debugging, which shows the values held in each variable.
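Accessing by index will generally return a tuple of tuples, except when you ask for an individual cell or row. For programmatic access you should use ws.iter_rows(), as the code below does.
You might also like to spend some time looking at the openpyxl.utils module, which provides range_boundaries.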
from openpyxl.utils import range_boundaries

for group in ws.merged_cell_ranges:
    min_col, min_row, max_col, max_row = range_boundaries(group)
    top_left_cell_value = ws.cell(row=min_row, column=min_col).value
    for row in ws.iter_rows(min_col=min_col, min_row=min_row, max_col=max_col, max_row=max_row):
        for cell in row:
            cell.value = top_left_cell_value
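To make the "tuple of tuples" point concrete, here is a minimal sketch on an in-memory workbook. The cell contents and variable names are made up for illustration. It only shows what indexing returns and does not touch merged cells.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws["A1"] = "Date"
ws["B1"] = "Time"

single = ws["A1"]    # indexing with one coordinate gives a single cell
block = ws["A1:B2"]  # indexing with a range gives a tuple of row tuples

# Looping over `block` yields row tuples, so `.value` has to be read
# one level further down, on the cells inside each row.
values = [[cell.value for cell in row] for row in block]
print(values)
```

This is why the loop in the question fails: each item it gets from cell_range is a row tuple, not a cell.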
I was getting errors and a deprecation warning until I did this:
from openpyxl.utils import range_boundaries

for group in sheet.merged_cells.ranges:  # merged_cell_ranges deprecated
    display(range_boundaries(group._get_range_string()))  # expects a string instead of an object
    min_col, min_row, max_col, max_row = range_boundaries(group._get_range_string())
    top_left_cell_value = sheet.cell(row=min_row, column=min_col).value
    for row in sheet.iter_rows(min_col=min_col, min_row=min_row, max_col=max_col, max_row=max_row):
        for cell in row:
            cell.value = top_left_cell_value
None of the previous answers worked for me.
So I worked out this one, tested it, and it worked.
from openpyxl import load_workbook

wb = load_workbook('Example.xlsx')
sheets = wb.sheetnames  # ['Sheet1', 'Sheet2']
for i, sheet in enumerate(sheets):
    ws = wb[sheets[i]]

    # you need a separate list to iterate on (see explanation #2 below)
    mergedcells = []
    for group in ws.merged_cells.ranges:
        mergedcells.append(group)

    for group in mergedcells:
        min_col, min_row, max_col, max_row = group.bounds
        top_left_cell_value = ws.cell(row=min_row, column=min_col).value
        ws.unmerge_cells(str(group))  # you need to unmerge before writing (see explanation #1 below)
        for irow in range(min_row, max_row + 1):
            for jcol in range(min_col, max_col + 1):
                ws.cell(row=irow, column=jcol, value=top_left_cell_value)
@Дмитро Олександрович is almost right, but I had to change a few things to fix his answer:
1. You'll get an "AttributeError: 'MergedCell' object attribute 'value' is read-only" error, because you need to unmerge merged cells before changing their value (see here: https://foss.heptapod.net/openpyxl/openpyxl/-/issues/1228).
2. You cannot iterate directly over ws.merged_cells.ranges, because iterating through a 'ranges' list object in Python and changing it at the same time (with an unmerge_cells function or a pop function, for example) will result in changing only half of the objects (see here: https://foss.heptapod.net/openpyxl/openpyxl/-/issues/1085). You need to create a different list and iterate on it.
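The "only half of the objects" effect can be reproduced without openpyxl. The sketch below uses a plain Python list of made-up range strings as a stand-in for the merged ranges. It illustrates the general pitfall and is not openpyxl's internal code.

```python
ranges = ["A1:B1", "C1:D1", "E1:F1", "G1:H1"]

# Removing items from the list being iterated shifts the remaining items
# left, so the loop skips every other one.
live = list(ranges)
visited_live = []
for r in live:
    visited_live.append(r)
    live.remove(r)

# Iterating over a separate copy visits every item and empties the list.
safe = list(ranges)
visited_safe = []
for r in list(safe):
    visited_safe.append(r)
    safe.remove(r)

print(visited_live)  # → ['A1:B1', 'E1:F1']
print(visited_safe)  # → ['A1:B1', 'C1:D1', 'E1:F1', 'G1:H1']
```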
The code below, from http://thequickblog.com/merge-unmerge-cells-openpyxl-in-python/, worked for me.
import openpyxl
from openpyxl.utils import range_boundaries

wbook = openpyxl.load_workbook("openpyxl_merge_unmerge.xlsx")
sheet = wbook["unmerge_sample"]
for cell_group in sheet.merged_cells.ranges:
    min_col, min_row, max_col, max_row = range_boundaries(str(cell_group))
    top_left_cell_value = sheet.cell(row=min_row, column=min_col).value
    sheet.unmerge_cells(str(cell_group))
    for row in sheet.iter_rows(min_col=min_col, min_row=min_row, max_col=max_col, max_row=max_row):
        for cell in row:
            cell.value = top_left_cell_value
wbook.save("openpyxl_merge_unmerge.xlsx")
exit()
Regarding @Charlie Clark's selected answer and the other answers which use the code from http://thequickblog.com/merge-unmerge-cells-openpyxl-in-python, you can unmerge the cells more easily without dealing with range_boundaries and those conversions.
I was also getting issues with the selected answer where some merged cells would unmerge but others would not, and some unmerged cells would fill with the data I wanted but others would not.
The issue is that worksheet.merged_cells.ranges is the worksheet's live collection of merged ranges rather than a copy, so when worksheet.unmerge_cells() is called inside the loop, worksheet.merged_cells is mutated and some of the remaining ranges get skipped as the iteration continues.
In my case, I wanted to unmerge cells like this, copying the border, font, and alignment information as well:
+-------+------+       +-------+------+
| Date  | Time |       | Date  | Time |
+=======+======+       +=======+======+
|       | 1:00 |       | Aug 6 | 1:00 |
| Aug 6 | 3:00 |  ->   +-------+------+
|       | 6:00 |       | Aug 6 | 3:00 |
+-------+------+       +-------+------+
                       | Aug 6 | 6:00 |
                       +-------+------+
For the current latest version, openpyxl==3.0.9, I found that the following works best for me:
from copy import copy

from openpyxl import load_workbook, Workbook
from openpyxl.cell import Cell
from openpyxl.worksheet.cell_range import CellRange
from openpyxl.worksheet.worksheet import Worksheet


def unmerge_and_fill_cells(worksheet: Worksheet) -> None:
    """
    Unmerges all merged cells in the given ``worksheet`` and copies the content
    and styling of the original cell to the newly unmerged cells.

    :param worksheet: The Excel worksheet containing the merged cells.
    """

    # Must copy the merged cell ranges into a separate list before looping
    # over them - this prevents unintended side-effects of certain cell ranges
    # from being skipped since `worksheet.unmerge_cells()` is destructive.
    all_merged_cell_ranges: list[CellRange] = list(
        worksheet.merged_cells.ranges
    )

    for merged_cell_range in all_merged_cell_ranges:
        merged_cell: Cell = merged_cell_range.start_cell
        worksheet.unmerge_cells(range_string=merged_cell_range.coord)

        # Don't need to convert iterator to list here since `merged_cell_range`
        # is cached
        for row_index, col_index in merged_cell_range.cells:
            cell: Cell = worksheet.cell(row=row_index, column=col_index)
            cell.value = merged_cell.value

            # (Optional) If you want to also copy the original cell styling to
            # the newly unmerged cells, you must use shallow `copy()` since
            # cell style properties are proxy objects which are not hashable.
            #
            # See <https://openpyxl.rtfd.io/en/stable/styles.html#copying-styles>
            cell.alignment = copy(merged_cell.alignment)
            cell.border = copy(merged_cell.border)
            cell.font = copy(merged_cell.font)


# Sample usage
if __name__ == "__main__":
    workbook: Workbook = load_workbook(
        filename="workbook_with_merged_cells.xlsx"
    )
    worksheet: Worksheet = workbook["My Sheet"]

    unmerge_and_fill_cells(worksheet=worksheet)

    workbook.save(filename="workbook_with_unmerged_cells.xlsx")
Here is a shorter version without comments and not copying styles:
from openpyxl.worksheet.worksheet import Worksheet


def unmerge_and_fill_cells(worksheet: Worksheet) -> None:
    for merged_cell_range in list(worksheet.merged_cells.ranges):
        # Keep a reference to the top-left cell, which holds the value
        merged_cell = merged_cell_range.start_cell
        worksheet.unmerge_cells(range_string=merged_cell_range.coord)
        for row_col_indices in merged_cell_range.cells:
            worksheet.cell(*row_col_indices).value = merged_cell.value
All the previous solutions gave me some kind of error, probably because of differences between openpyxl versions. With the current version (3.0.10), I found that this solution works for me:
for m_range in list(ws.merged_cells.ranges):
    merged_cell = m_range.start_cell
    ws.unmerge_cells(range_string=str(m_range))
    for row_col_indices in m_range.cells:
        ws.cell(*row_col_indices).value = merged_cell.value
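For anyone who wants to check this approach without an .xlsx file on disk, here is a small sketch that builds a two-column merged header in memory and applies the same unmerge-then-fill loop. The header texts and cell positions are made up for illustration. It reads the top-left value through ws.cell() and passes the range's coord string, which is an assumption chosen to avoid depending on version-specific attributes.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws["A1"] = "Date"
ws["C1"] = "Time"
ws.merge_cells("A1:B1")
ws.merge_cells("C1:D1")

# Iterate over a copy, because unmerge_cells() changes the live collection.
for m_range in list(ws.merged_cells.ranges):
    top_left_value = ws.cell(row=m_range.min_row, column=m_range.min_col).value
    ws.unmerge_cells(range_string=m_range.coord)
    for row, col in m_range.cells:
        ws.cell(row=row, column=col).value = top_left_value

header = [cell.value for cell in ws[1]]
print(header)
```

After the loop, every cell in the first row carries its group's header text, so the row can be used directly as column names for the DataFrame.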