How to cut, copy, paste, delete data from excel sheet without hampering cell formatting

Question:

Magicians out there….

I need your help with the best approaches for the below use case.

I have an Excel sheet with lakhs of rows of data. I need to filter it based on some criteria and create multiple new files.
I am in no mood to do it manually, so I started working on a Python script that will do the job for me. However, I am facing a few challenges in achieving the end goal: the color formatting and the comments added to the cells.

Let’s recreate the scenario. I have attached a sample excel sheet for your reference here.
It includes "Indian Cars" data with 4 headers (Brand, Model, Fuel Type & Transmission Type). I need to filter the data based on "Brand" and create a new Excel file (workbook) for each brand, with the brand name as the file name.


Approach 1:-
First I loaded the Excel sheet into a data frame with Pandas, filtered the data and exported it. That was quite fast and easy and I loved it. However, I lose the cell colors and the notes added to the header cells (Model & Fuel Type).

Note: I tried styling the output with pandas, but for some reason it is not working for me.

Approach 2:-
I thought of using Openpyxl & Xlsxwriter. However, the issue is that I am unable to filter the data and keep the comments added to the header.

Approach 3:-
Logically, I can create a copy of my existing sheet and delete the unwanted rows from it and save it with desired name, that should do the job for me. Unfortunately, I am unable to figure out how to achieve it in python.

Kindly share your thoughts on this and help me with right approach… and If I can get a sample code or full code… that would just make my day… 😀

Asked By: Krishna

||

Answers:

This should do the trick. You can change the colors of the headers.

Code for custom styling of the Excel output is included.

import pandas as pd
import xlsxwriter.utility  # provides xl_range, used for the border range below

# function to style the dataframe with some conditions (a simple condition as an example; you can change it or add conditions across multiple rows)
def style_df(row):
    values = pd.Series(data=False, index=row.index)
    if not pd.isna(row['Transmission Type']):
        if row['Transmission Type'].strip() == 'Manual':
            return ['background-color : gray; color: red' for _ in values]
        elif row['Transmission Type'].strip() == 'Manual, Automatic':
            return ['background-color : lightblue; color: green' for _ in values]
    
    return ['' for _ in values]

page = pd.read_excel("Cars_in_india.xlsx", 'Cars in India')

# creating an excel file for each brand
for brand in page.Brand.unique():
    writer = pd.ExcelWriter(brand+".xlsx", engine = 'xlsxwriter')
    workbook = writer.book
    border_fmt = workbook.add_format({'bottom':1, 'top':1, 'left':1, 'right':1})

    dataframe = page[page.Brand == brand].copy()

    dataframe = dataframe.style.apply(style_df, axis=1)
    dataframe.to_excel(writer, index=False, sheet_name=brand)
    
    # dynamic columns sizes
    for column in page[page.Brand == brand]:
        column_width = max(page[page.Brand == brand][column].astype(str).map(len).max(), len(column))
        col_idx = page[page.Brand == brand].columns.get_loc(column)
        writer.sheets[brand].set_column(col_idx, col_idx, column_width)
        
    worksheet = writer.sheets[brand]
    
    #applying style to the header columns
    worksheet.write(0, 1, "Model", workbook.add_format({'fg_color': '#00FF00'}))
    worksheet.write(0, 2, "Fuel Type", workbook.add_format({'fg_color': '#FFA500'}))
    
    # applying borders to the table
    worksheet.conditional_format(xlsxwriter.utility.xl_range(0, 0, len(page[page.Brand == brand]), len(page[page.Brand == brand].columns)-1), {'type': 'no_errors', 'format': border_fmt})

    # ExcelWriter.save() was removed in pandas 2.0; close() writes the file
    writer.close()

You can use openpyxl to read the comments from the original file and then write them back when creating each new Excel file. However, the comments in your file seem to be of a type that openpyxl cannot read (you will see the same error in the google cloud editor). In that case, the only options are to change the type of the comments in the source file or to rewrite them in the Python code.

Example code:

from openpyxl import load_workbook

wb = load_workbook("Cars_in_india.xlsx")
ws = wb["Cars in India"]

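# first row = header cells; B1 (Model) and C1 (Fuel Type) carry the notes
_, model_cell, fuel_cell, _ = list(ws.rows)[0]

# then after this code:
# worksheet.write(0, 1, "Model", workbook.add_format({'fg_color': '#00FF00'}))
# worksheet.write(0, 2, "Fuel Type", workbook.add_format({'fg_color': '#FFA500'}))
# you can add (cell.comment is None when openpyxl found no note on the cell):
if model_cell.comment is not None:
    worksheet.write_comment('B1', model_cell.comment.text)
if fuel_cell.comment is not None:
    worksheet.write_comment('C1', fuel_cell.comment.text)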
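
Before copying notes across, it may help to check which header cells openpyxl actually sees a note on. The following is only a sketch: `header_comments` is a hypothetical helper name (not part of openpyxl), and it assumes the headers sit in the first row of the sheet.

```python
from openpyxl import load_workbook


def header_comments(path, sheet_name):
    """Return {cell coordinate: note text} for first-row cells that carry a note."""
    ws = load_workbook(path)[sheet_name]
    # ws[1] is the first row; cell.comment is None when openpyxl found no note
    return {
        cell.coordinate: cell.comment.text
        for cell in ws[1]
        if cell.comment is not None
    }


# Example call with the file and sheet names used in this answer:
# header_comments("Cars_in_india.xlsx", "Cars in India")
```

If the returned dictionary is empty, openpyxl did not load any notes from the header row, and the note text would have to be typed into the script instead.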
Answered By: Nacho R.
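
Approach 3 from the question (copy the sheet, delete the unwanted rows, save under a new name) can also be sketched with openpyxl alone. This is a minimal sketch under some assumptions, not a tested solution for the real file: `save_brand_copy` is a hypothetical helper name, the header is assumed to be in row 1, and "Brand" is assumed to be the first column. Because the original workbook is reloaded and saved again, cell fills and any notes openpyxl can read stay attached to their cells.

```python
from openpyxl import load_workbook


def save_brand_copy(src_path, sheet_name, brand, out_path, brand_col=1):
    """Reload the source workbook, drop rows of other brands, save a copy."""
    wb = load_workbook(src_path)
    ws = wb[sheet_name]
    # Walk bottom-up so deleting a row never shifts rows still to be checked.
    for row_idx in range(ws.max_row, 1, -1):  # stops before row 1 (the header)
        if ws.cell(row=row_idx, column=brand_col).value != brand:
            ws.delete_rows(row_idx)
    wb.save(out_path)


# Example call with the file and sheet names from the question,
# for some brand value taken from the Brand column:
# save_brand_copy("Cars_in_india.xlsx", "Cars in India", brand, brand + ".xlsx")
```

Two caveats: deleting rows one at a time is likely to be slow on a sheet with lakhs of rows, and openpyxl does not adjust dependent objects such as formulae, tables or charts when rows are deleted, so this fits best when the sheet holds plain formatted data.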