Overwriting Excel columns while keeping format using pandas
Question:
I’m working with an xlsx-file which looks like this:
My previous task was to modify the columns named ‘Entry 1’ and ‘Entry 2’. I have stored those columns in a separate slice of the original dataframe for a better overview. Here is a quick glimpse of how this slice looks:
>>> slice = df.loc[:, 'Entry 1':'Entry 2']
# code to modify the values
>>> slice
Entry 1 Entry 2
1 Modified 1 Value 1
2 Modified 2 Value 2
3 Modified 3 Value 3
I now want to overwrite those columns in the original dataframe with the named slice. I already achieved this by using the following:
df.loc[:, 'Entry1':'Entry2'] = slice
Question
As you can see, the header of the columns has a special format. How do I overwrite the values in ‘Entry1’ and ‘Entry2’, excluding the header, to keep the format?
Answers:
Full disclosure: I’m the author of the suggested library
Unfortunately there is no out-of-the-box way in pandas to achieve that, as it does not load the styling data. You can use StyleFrame (which wraps pandas and openpyxl, which I assume you already have installed); it can read xlsx files while keeping most of the styling elements.
Using it in this case may look like the following:
from StyleFrame import StyleFrame
sf = StyleFrame.read_excel('test.xlsx', read_style=True)
# currently you have to specify each value manually,
# using slices will revert to the default style used by StyleFrame
sf.loc[0, 'Entry 1'].value = 'Modified 1'
sf.loc[1, 'Entry 1'].value = 'Modified 2'
sf.loc[2, 'Entry 1'].value = 'Modified 3'
sf.to_excel('test.xlsx').save()
Another alternative using a loop:
sf = StyleFrame.read_excel('test.xlsx', read_style=True, use_openpyxl_styles=False)
new_values = ['Modified 1', 'Modified 2', 'Modified 3']
for cell, new_value in zip(sf['Entry 1'], new_values):
    cell.value = new_value
sf.to_excel('test.xlsx').save()
[Screenshots: content of test.xlsx before and after execution]
Final answer
To give props to a far more extensive solution that will suit many visitors dropping by, check this.
But for me, this easy way was enough to fit my needs. All you need to do is write back to the original file, starting at “row 1” (since the first row counts as “row 0”) and leaving out the header and the index. In my case, this is achieved by the following:
# It is also possible to write the dataframe without the header and index.
df4.to_excel(writer, sheet_name='Sheet1',
             startrow=1, startcol=2, header=False, index=False)
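To make the zero-based offsets concrete, here is a small sketch that converts a (startrow, startcol) pair into the Excel address where the first value lands. The helper name first_data_cell is made up for illustration and is not part of pandas.

```python
def first_data_cell(startrow: int, startcol: int) -> str:
    """Return the A1-style address of the zero-based (startrow, startcol) cell."""
    letters = ""
    n = startcol + 1  # switch to 1-based column numbering
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return f"{letters}{startrow + 1}"


# startrow=1 skips the header row, startcol=2 skips columns A and B
print(first_data_cell(1, 2))
```

So with startrow=1 and startcol=2 the header in row 1 is never touched, and the first value is written to the third column of the second row.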
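You can do this using df.to_clipboard(index=False) and pasting into the open workbook through Excel itself (this needs Windows with Excel and pywin32 installed):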
from win32com.client import Dispatch
import pandas as pd
xlApp = Dispatch("Excel.Application")
xlApp.Visible = 1
xlApp.Workbooks.Open(r'c:\Chadee\test.xlsx')
xlApp.ActiveSheet.Cells(1,1).Select()
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
df.to_clipboard(index=False)
xlApp.ActiveWorkbook.ActiveSheet.PasteSpecial()
Output:
Note that the cell colors are still the same
Hope that helps! 🙂
I know this is more than you need, but in case others are looking for a way to keep the formatting: as of pandas 1.4, pd.ExcelWriter accepts if_sheet_exists='overlay', which writes into the existing sheet instead of replacing it.
Original Spreadsheet:
import pandas as pd
df = pd.DataFrame({'Entry1': ['Modified 1', 'Modified 2', 'Modified 3'],
                   'Entry2': ['Value 1', 'Value 2', 'Value 3']})
# mode='a' opens the existing file; 'overlay' writes into the existing sheet
with pd.ExcelWriter('Original_File.xlsx', engine='openpyxl',
                    mode='a', if_sheet_exists='overlay') as writer:
    df.to_excel(writer, sheet_name='SheetName', startrow=1,
                startcol=2, header=False, index=False)
And one can see that this also works if there is formatting in the cell.
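For anyone who wants to try this without touching a real file, here is a self-contained sketch, assuming pandas 1.4 or later with openpyxl installed. The file name demo.xlsx, the sheet contents, and the bold header are invented stand-ins, not the spreadsheet from the question.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font

# Build a small workbook with a bold header row (hypothetical stand-in data).
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "SheetName"
ws.append(["ID", "Name", "Entry1", "Entry2"])
for cell in ws[1]:
    cell.font = Font(bold=True)
ws.append([1, "a", "old", "old"])
ws.append([2, "b", "old", "old"])
wb.save(path)

new_values = pd.DataFrame({"Entry1": ["Modified 1", "Modified 2"],
                           "Entry2": ["Value 1", "Value 2"]})

# Append mode plus 'overlay' writes into the existing sheet (pandas >= 1.4).
with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                    if_sheet_exists="overlay") as writer:
    new_values.to_excel(writer, sheet_name="SheetName", startrow=1,
                        startcol=2, header=False, index=False)

# Reload the file and inspect the header style and the overwritten cell.
check = load_workbook(path)["SheetName"]
header_still_bold = check["C1"].font.bold
first_new_value = check["C2"].value
```

Because header=False and startrow=1 leave row 1 alone, the header cells are never rewritten, so their font survives the round trip while the data cells below get the new values.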
So, I want to answer this with my workaround pre-pandas 1.4 because I found this page when trying to solve this problem.
I’m working in Pandas 1.3.4.
This is not the most elegant or fast solution, but it got the job done for me.
import openpyxl
import pandas as pd
# filePath is the path to the workbook; "sheetName" is a placeholder sheet name
with open(filePath, 'rb') as fid:
    DataFrame = pd.read_excel(fid, "sheetName")
dataWorkbook = openpyxl.load_workbook(filePath)
dataSheet = dataWorkbook["sheetName"]
# --> Logic for editing data here
# Iterate over the dataframe and write each value into the openpyxl sheet
for col, header in enumerate(DataFrame):
    for row in range(len(DataFrame)):
        # row+2: row 1 holds the header and Excel rows start at 1; col+1: Excel columns start at 1
        cellRef = dataSheet.cell(row=row+2, column=col+1)
        cellRef.value = DataFrame.loc[row, header]
dataWorkbook.save(filePath)
Disclaimer: I began learning Python in late August of this year.
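If only some columns need to change, a variation on the same openpyxl idea is to look up each column by its header text and write just the cells below it. This is a rough sketch, not the code from the answer above: overwrite_column is a made-up helper, the demo workbook and its yellow header fill are invented, and it assumes an openpyxl version where cell.column is an integer (2.6 or later).

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook
from openpyxl.styles import PatternFill


def overwrite_column(sheet, header, values, header_row=1):
    """Write values below the cell in header_row whose text equals header."""
    for cell in sheet[header_row]:
        if cell.value == header:
            for offset, value in enumerate(values, start=1):
                sheet.cell(row=header_row + offset, column=cell.column).value = value
            return
    raise KeyError(header)


# Hypothetical demo workbook with a filled header cell.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
wb = Workbook()
ws = wb.active
ws.append(["Name", "Entry 1"])
ws["B1"].fill = PatternFill(fill_type="solid", start_color="FFFF00")
ws.append(["a", "old"])
ws.append(["b", "old"])
wb.save(path)

# Reopen, overwrite only the 'Entry 1' column, and save again.
wb = load_workbook(path)
overwrite_column(wb.active, "Entry 1", ["Modified 1", "Modified 2"])
wb.save(path)

result = load_workbook(path).active
```

Only the value of each target cell is assigned, so the header cell and every other column keep whatever styling they had.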