Merge rows based on value (pandas to excel – xlsxwriter)

Question:

I’m trying to output a Pandas dataframe into an excel file using xlsxwriter. However I’m trying to apply some rule-based formatting; specifically trying to merge cells that have the same value, but having trouble coming up with how to write the loop. (New to Python here!)

See below for output vs output expected:

enter image description here

(As you can see based off the image above I’m trying to merge cells under the Name column when they have the same values).

Here is what I have thus far:

#This is the logic you use to merge cells in xlsxwriter (just an example)
worksheet.merge_range('A3:A4','value you want in merged cells', merge_format)

#Merge Car type Loop thought process...
#1.Loop through data frame where row n Name = row n -1 Name
#2.Get the length of the rows that have the same Name
#3.Based off the length run the merge_range function from xlsxwriter, worksheet.merge_range('range_found_from_loop','Name', merge_format)


for row_index in range(1,len(car_report)):
     if car_report.loc[row_index, 'Name'] == car_report.loc[row_index-1, 'Name'] 
     #find starting point based off index, then get range by adding number of rows to starting point. for example lets say rows 0-2 are similar I would get 'A0:A2' which I can then put in the code below
     #from there apply worksheet.merge_range('A0:A2','[input value]', merge_format)

Any help is greatly appreciated!

Thank you!

Asked By: mikelowry

||

Answers:

Your logic is almost correct, however i approached your problem through a slightly different approach:

1) Sort the column, make sure that all the values are grouped together.

2) Reset the index (using reset_index() and maybe pass the arg drop=True).

3) Then we have to capture the rows where the value is new. For that purpose create a list and add the first row 1 because we will start for sure from there.

4) Then start iterating over the rows of that list and check some conditions:

4a) If we only have one row with a value the merge_range method will give an error because it can not merge one cell. In that case we need to replace the merge_range with the write method.

4b) With this algorithm you ‘ll get an index error when trying to write the last value of the list (because it is comparing it with the value in the next index postion, and because it is the last value of the list there is not a next index position). So we need to specifically mention that if we get an index error (which means we are checking the last value) we want to merge or write until the last row of the dataframe.

4c) Finally i did not take into consideration if the column contains blank or null cells. In that case code needs to be adjusted.

Lastly code might look a bit confusing, you have to take in mind that the 1st row for pandas is 0 indexed (headers are separate) while for xlsxwriter headers are 0 indexed and the first row is indexed 1.

Here is a working example to achieve exactly what you want to do:

import pandas as pd

# Create a test df
df = pd.DataFrame({'Name': ['Tesla','Tesla','Toyota','Ford','Ford','Ford'],
                   'Type': ['Model X','Model Y','Corolla','Bronco','Fiesta','Mustang']})

# Create the list where we 'll capture the cells that appear for 1st time,
# add the 1st row and we start checking from 2nd row until end of df
startCells = [1]
for row in range(2,len(df)+1):
    if (df.loc[row-1,'Name'] != df.loc[row-2,'Name']):
        startCells.append(row)


writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
merge_format = workbook.add_format({'align': 'center', 'valign': 'vcenter', 'border': 2})


lastRow = len(df)

for row in startCells:
    try:
        endRow = startCells[startCells.index(row)+1]-1
        if row == endRow:
            worksheet.write(row, 0, df.loc[row-1,'Name'], merge_format)
        else:
            worksheet.merge_range(row, 0, endRow, 0, df.loc[row-1,'Name'], merge_format)
    except IndexError:
        if row == lastRow:
            worksheet.write(row, 0, df.loc[row-1,'Name'], merge_format)
        else:
            worksheet.merge_range(row, 0, lastRow, 0, df.loc[row-1,'Name'], merge_format)


writer.save()

Output:

enter image description here

Answered By: Dimitris Thomas

Alternate Approach:
One can use the unique() function to find the index assigned to each unique value (car name in this example). Using the above test data,

import pandas as pd

# Create a test df
df = pd.DataFrame({'Name': ['Tesla','Tesla','Toyota','Ford','Ford','Ford'],
                   'Type': ['Model X','Model Y','Corolla','Bronco','Fiesta','Mustang']})

writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
merge_format = workbook.add_format({'align': 'center', 'valign': 'vcenter', 'border': 2})

for car in df['Name'].unique():
    # find indices and add one to account for header
    u=df.loc[df['Name']==car].index.values + 1

    if len(u) <2: 
        pass # do not merge cells if there is only one car name
    else:
        # merge cells using the first and last indices
        worksheet.merge_range(u[0], 0, u[-1], 0, df.loc[u[0],'Name'], merge_format)
writer.save()
Answered By: mjm

Alternate Approach : Other than xlsxwriter you can also use a pivot table.

dataframe=pd.pivot_table(df,index=[column name...])
df.to_excel(dataframe)
Answered By: Lenate John

I think this is a better answer to your problem

df = pd.DataFrame({'Name': ['Tesla','Tesla','Toyota','Ford','Ford','Ford'],
                   'Type': ['Model X','Model Y','Corolla','Bronco','Fiesta','Mustang']})

# Use the groupby() function to group the rows by 'Name'
grouped = df.groupby('Name')

# Use the first() function to find the first row of each group
first_rows = grouped.first()

# Create a new column 'start_row' that contains the index of the first row of each group
first_rows['start_row'] = first_rows.index.map(lambda x: (df['Name'] == x).idxmax())

# Create a new column 'end_row' that contains the index of the last row of each group
first_rows['end_row'] = grouped.last().index.map(lambda x: (df['Name'] == x).idxmax())

# Create an empty list to store the merge ranges
merge_ranges = []

# Iterate over the first_rows dataframe and add the merge ranges to the list
for index, row in first_rows.iterrows():
    merge_ranges.append((row['start_row'], 0, row['end_row'], 0))

# Write the dataframe to an excel file and apply the merge ranges
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
worksheet = writer.sheets['Sheet1']
for merge_range in merge_ranges:
    worksheet.merge_range(*merge_range, "", worksheet.get_default_format())
writer.save()
Answered By: Sudeep George

Should "just work" with set_index() and to_excel()

my_index_cols = ['Name'] # this can also be a list of multiple columns
df.set_index(my_index_cols).to_excel('filename.xlsx', index=True, header=None)

see also: https://stackoverflow.com/a/68208815/2098573

Answered By: rysqui
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.