How can we write new data to existing Excel spreadsheet?
Question:
I have a process that creates a dataframe of almost 1,000 rows that runs each week. I would like to be able to append to an existing sheet without having to re-read the spreadsheet because that will take a long time as the file grows. I saw this answer here: Append existing excel sheet with new dataframe using python pandas. Unfortunately, it doesn’t seem to be working correctly for me. Here is some dummy code that I am trying to append to that existing file. It causes two issues at present – first, it does not append, but rather overwrites the data. Secondly, when I go to open the file, even after the program runs, it only allows me to open it in read-only mode. I have confirmed I am using pandas 1.4 as well.
import pandas as pd
data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}
df = pd.DataFrame(data)
filename = "Testing Append Process.xlsx"
writer = pd.ExcelWriter(filename, engine="openpyxl", mode="a", if_sheet_exists="overlay")
df.to_excel(writer, index=False)
writer.save()
Answers:
Here is a way to limit how much of the spreadsheet is read, but with openpyxl.
Getting setup: openpyxl documentation
max_rows = max((c.row for c in active_ws['A'] if c.value is not None)) + 1
This will return the max amount of rows within the document… add one to it, and you will have your starting point to start placing the data you need to add.
Here is an example of placing that data with openpyxl:
data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}
count = 0
for x in range(4):
name = data['Name'][x]
age = data['Age'][x]
active_ws.cell(row=max_row+count, column=1).value = name
active_ws.cell(row=max_row+count, column=2).value = name
count += 1
Edit: The max rows var can be adjusted to any column you want checked. (This one checks column "A")… The column cannot have an empty cells until your data is done otherwise, it will give you an incorrect "max rows".
Please notes that Testing Append Process.xlsx
file has to be created before running this code.
from openpyxl import load_workbook
import pandas as pd
data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}
df = pd.DataFrame(data)
filename = "Testing Append Process.xlsx"
workbook = load_workbook(filename)
writer = pd.ExcelWriter(filename, engine='openpyxl')
writer.book = workbook
writer.sheets = {ws.title: ws for ws in workbook.worksheets}
df.to_excel(writer, startrow=writer.sheets['Sheet1'].max_row, index = False, header= False)
writer.close()
Returns the following if you will run the code twice.
The alternative solution above now gives a FutureWarning message for writer.book attribute. So here is the new solution.
from openpyxl import load_workbook
data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}
filename = "Testing Append Process.xlsx"
workbook = load_workbook(filename)
worksheet = workbook.active
for record in list(zip(*data.values())):
worksheet.append(record)
workbook.save(filename)
Returns the following if you will run the code twice.
I have a process that creates a dataframe of almost 1,000 rows that runs each week. I would like to be able to append to an existing sheet without having to re-read the spreadsheet because that will take a long time as the file grows. I saw this answer here: Append existing excel sheet with new dataframe using python pandas. Unfortunately, it doesn’t seem to be working correctly for me. Here is some dummy code that I am trying to append to that existing file. It causes two issues at present – first, it does not append, but rather overwrites the data. Secondly, when I go to open the file, even after the program runs, it only allows me to open it in read-only mode. I have confirmed I am using pandas 1.4 as well.
import pandas as pd
data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}
df = pd.DataFrame(data)
filename = "Testing Append Process.xlsx"
writer = pd.ExcelWriter(filename, engine="openpyxl", mode="a", if_sheet_exists="overlay")
df.to_excel(writer, index=False)
writer.save()
Here is a way to limit how much of the spreadsheet is read, but with openpyxl.
Getting setup: openpyxl documentation
max_rows = max((c.row for c in active_ws['A'] if c.value is not None)) + 1
This will return the max amount of rows within the document… add one to it, and you will have your starting point to start placing the data you need to add.
Here is an example of placing that data with openpyxl:
data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}
count = 0
for x in range(4):
name = data['Name'][x]
age = data['Age'][x]
active_ws.cell(row=max_row+count, column=1).value = name
active_ws.cell(row=max_row+count, column=2).value = name
count += 1
Edit: The max rows var can be adjusted to any column you want checked. (This one checks column "A")… The column cannot have an empty cells until your data is done otherwise, it will give you an incorrect "max rows".
Please notes that Testing Append Process.xlsx
file has to be created before running this code.
from openpyxl import load_workbook
import pandas as pd
data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}
df = pd.DataFrame(data)
filename = "Testing Append Process.xlsx"
workbook = load_workbook(filename)
writer = pd.ExcelWriter(filename, engine='openpyxl')
writer.book = workbook
writer.sheets = {ws.title: ws for ws in workbook.worksheets}
df.to_excel(writer, startrow=writer.sheets['Sheet1'].max_row, index = False, header= False)
writer.close()
Returns the following if you will run the code twice.
The alternative solution above now gives a FutureWarning message for writer.book attribute. So here is the new solution.
from openpyxl import load_workbook
data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}
filename = "Testing Append Process.xlsx"
workbook = load_workbook(filename)
worksheet = workbook.active
for record in list(zip(*data.values())):
worksheet.append(record)
workbook.save(filename)
Returns the following if you will run the code twice.