How can we write new data to an existing Excel spreadsheet?

Question:

I have a process that runs each week and creates a dataframe of almost 1,000 rows. I would like to append it to an existing sheet without having to re-read the spreadsheet, because that will take a long time as the file grows. I saw this answer here: Append existing excel sheet with new dataframe using python pandas. Unfortunately, it doesn't seem to be working correctly for me. Below is some dummy code that tries to append to the existing file. It currently has two problems. First, it does not append but overwrites the data. Second, when I open the file, even after the program has finished, it only opens in read-only mode. I have confirmed that I am using pandas 1.4.

import pandas as pd


data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}
df = pd.DataFrame(data)


filename = "Testing Append Process.xlsx"
writer = pd.ExcelWriter(filename, engine="openpyxl", mode="a", if_sheet_exists="overlay")

df.to_excel(writer, index=False)
writer.save()
Asked By: Shawn Schreier


Answers:

Here is a way to limit how much of the spreadsheet is read, using openpyxl instead of pandas.

Getting set up: see the openpyxl documentation.

max_rows = max((c.row for c in active_ws['A'] if c.value is not None)) + 1

This finds the last row in column A that holds a value and adds one to it, which gives you the first free row: the starting point for placing the data you need to add.
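As a rough illustration of why one might scan a column instead of relying on the worksheet's own max_row attribute, the sketch below builds a small in-memory workbook. The sheet contents and the name next_free_row are made up for the example, and it assumes openpyxl counts a cell as part of the sheet once it has been created, even if it holds no value.

```python
from openpyxl import Workbook

wb = Workbook()
active_ws = wb.active

# Three rows of real data in columns A and B.
for row in [("Name", "Age"), ("Tom", 20), ("Joseph", 21)]:
    active_ws.append(row)

# Touching a cell far below the data creates it with value None.
# That empty cell is enough to move the worksheet's max_row.
active_ws.cell(row=10, column=3)

sheet_max_row = active_ws.max_row

# Column scan: last row in column A that actually holds a value, plus one.
# default=0 keeps max() from failing on a completely empty column.
next_free_row = max(
    (c.row for c in active_ws["A"] if c.value is not None),
    default=0,
) + 1

print("ws.max_row:", sheet_max_row)
print("first free row by column scan:", next_free_row)
```

In this sketch the column scan ignores the stray empty cell, whereas max_row does not.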

Here is an example of placing that data with openpyxl:

data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}

# max_rows is the first free row computed above; write one record per row.
for offset, (name, age) in enumerate(zip(data['Name'], data['Age'])):
    active_ws.cell(row=max_rows + offset, column=1).value = name  # column A: name
    active_ws.cell(row=max_rows + offset, column=2).value = age   # column B: age

Edit: The max_rows expression can be pointed at any column you want checked (this one checks column "A"). The column you check must have a value in the last row of your data; if the final rows are empty in that column, the expression returns a row number that is too small and the new data will overwrite existing rows.
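Putting the pieces together, a self-contained sketch of this approach could look like the following. The helper name append_rows, the throwaway file demo_append.xlsx, and the choice of the active sheet are assumptions made for the example, not part of the original setup. Note that load_workbook still opens the existing file before the new rows are written.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def append_rows(path, rows, column="A"):
    """Write rows below the last non-empty cell of `column` in the active sheet.

    Hypothetical helper for illustration; returns the first row written to.
    """
    wb = load_workbook(path)
    ws = wb.active
    start = max((c.row for c in ws[column] if c.value is not None), default=0) + 1
    for offset, row in enumerate(rows):
        for col_idx, value in enumerate(row, start=1):
            ws.cell(row=start + offset, column=col_idx).value = value
    wb.save(path)
    wb.close()
    return start


# Demo on a temporary file that starts with only a header row.
path = os.path.join(tempfile.mkdtemp(), "demo_append.xlsx")
wb = Workbook()
wb.active.append(["Name", "Age"])
wb.save(path)

data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}
rows = list(zip(data['Name'], data['Age']))

first_start = append_rows(path, rows)   # first weekly run
second_start = append_rows(path, rows)  # second weekly run lands below the first
```

Each call starts writing directly below the previous batch, so repeated runs accumulate rows rather than overwrite them.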

Answered By: BMarshyyy

Please note that the Testing Append Process.xlsx file must already exist before you run this code.

from openpyxl import load_workbook
import pandas as pd

data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}
df = pd.DataFrame(data)

filename = "Testing Append Process.xlsx"
workbook = load_workbook(filename)
writer = pd.ExcelWriter(filename, engine='openpyxl')
writer.book = workbook
writer.sheets = {ws.title: ws for ws in workbook.worksheets}

df.to_excel(writer, startrow=writer.sheets['Sheet1'].max_row, index=False, header=False)

writer.close()

Running the code twice produces the following:

[Image: screenshot of the resulting spreadsheet]

Answered By: gremur

The alternative solution above now gives a FutureWarning for the writer.book attribute, so here is an updated solution that uses openpyxl directly.

from openpyxl import load_workbook

data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}

filename = "Testing Append Process.xlsx"
workbook = load_workbook(filename)
worksheet = workbook.active

# Transpose the column lists into row tuples; append() adds each one below the last used row.
for record in zip(*data.values()):
    worksheet.append(record)
    
workbook.save(filename)

Running the code twice produces the following:

[Image: screenshot of the resulting spreadsheet]

Answered By: gremur
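For readers who want to stay with pandas, a minimal sketch that keeps the question's mode="a" and if_sheet_exists="overlay" arguments is shown below. It assumes pandas 1.4 or later with openpyxl installed, a sheet named Sheet1, and a hypothetical helper append_df; the temporary file is only there to make the example runnable. Passing startrow places the new rows after the existing ones, since without it to_excel starts writing at the top of the sheet. The with block closes the writer when it exits, and an unclosed writer may be what left the asker's file read-only.

```python
import os
import tempfile

import pandas as pd

data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}
df = pd.DataFrame(data)

# Create the initial file (header plus four rows) in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo_append.xlsx")
df.to_excel(path, sheet_name="Sheet1", index=False)


def append_df(path, df, sheet_name="Sheet1"):
    """Append df below the existing rows of sheet_name (hypothetical helper)."""
    with pd.ExcelWriter(
        path, engine="openpyxl", mode="a", if_sheet_exists="overlay"
    ) as writer:
        # max_row is 1-based and startrow is 0-based, so this is the first free row.
        startrow = writer.sheets[sheet_name].max_row
        df.to_excel(
            writer,
            sheet_name=sheet_name,
            startrow=startrow,
            index=False,
            header=False,  # the header row is already in the sheet
        )


append_df(path, df)
append_df(path, df)

result = pd.read_excel(path, sheet_name="Sheet1")
print(len(result))
```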