Python pandas write resulted dataframe to xlsm without losing the macro

Question:

I have a lot of excel files that I need to compile into a single excel file, and then copy the compiled one into an existing excel file (with macro / .xlsm) in a certain sheet.

I solved the first problem (compiling multiple excel files into a single excel file). The resulted dataframe is saved in .csv format. The resulted file looks like this.
the resulted dataframe

Until here there is no issue. The next step that I am struggling to find out how to do it.

From the resulted dataframe I want to “copy-and-paste” the dataframe to the existing excel file with macro (.xlsm) in the sheet “Source” at the corresponding headers. The existing excel file looks like this.
target excel file

As you may see from the picture above, I want to skip writing any data in the column A since the cells within this column is full of formula. I want to write the resulted dataframe in the column B to column Q within the existing excel file. However, before writing the data, I want delete all the existing data in all cells (except in the cells within column A).

So basically I want to do the following:

  1. delete all the values in the cells in the column B to column Q in
    the existing xlsm file (in the sheet “Source”)
  2. write the new values from the resulted dataframe to column B until column Q
  3. save the excel file back with the same name without losing the macro

Any feedback will be much appreciated! Thanks!

Regards,

Arnold

Asked By: arnold

||

Answers:

Since your csv import into spreadsheet can be handled with an Excel VBA macro using QueryTables consider having Python replicate VBA with a COM interface to the Excel Object Library. All previous macro code remains intact as nothing is overwritten but cell data. Note: Below assumes you are using Excel for Windows.

Using the win32com library, Python can replicate almost anything VBA does. In fact, you will come to know VBA is an add-on reference in Office applications and never a native, built-in object and does the same COM interfacing! See first selected item in ToolsReferences in your IDE.

import pandas as pd
import win32com.client as win32

# ...same pandas code...    
macrofile = "C:\Path\To\Macro\Workbook.xlsm"
strfile = "C:\Path\To\CSV\Output.csv"
df.to_csv(strfile)

try:
    xl = win32.gencache.EnsureDispatch('Excel.Application')
    wb = xl.Workbooks.Open(macrofile)

    # DELETE PREVIOUS DATA
    wb.Sheets("Source").Range("B:Q").EntireColumn.Delete()

    # ADD QUERYTABLE (SPECIFYING DESTINATION CELL START)
    qt = wb.Sheets("Source").QueryTables.Add(Connection="TEXT;" + strfile, 
                                             Destination=wb.Sheets(1).Cells(2, 2))
    qt.TextFileParseType = 1
    qt.TextFileConsecutiveDelimiter = False
    qt.TextFileTabDelimiter = False
    qt.TextFileSemicolonDelimiter = False
    qt.TextFileCommaDelimiter = True
    qt.TextFileSpaceDelimiter = False
    qt.Refresh(BackgroundQuery=False)

    # REMOVE QUERYTABLE
    for qt in wb.Sheets("Source").QueryTables:
        qt.Delete()

    # CLOSES WORKBOOK AND SAVES CHANGES
    wb.Close(True)

except Exception as e:
    print(e)

finally:    
    qt = None
    wb = None
    xl = None

Alternatively, create a new macro in VBA (placed in a standalone module) and have Python call it, passing the csv file path as parameter:

VBA

Public Sub ImportCSV(strfile As String)
    Dim qt As QueryTable

    ThisWorkbook.Sheets("Source").Range("B:Q").EntireColumn.Delete

    ' ADD QUERYTABLE
    With ThisWorkbook.Sheets("Source").QueryTables.Add(Connection:="TEXT;" & strfile, _
        Destination:=ThisWorkbook.Sheets(1).Cells(2, 2))
            .TextFileParseType = xlDelimited
            .TextFileConsecutiveDelimiter = False
            .TextFileTabDelimiter = False
            .TextFileSemicolonDelimiter = False
            .TextFileCommaDelimiter = True
            .TextFileSpaceDelimiter = False

            .Refresh BackgroundQuery:=False
    End With

    ' REMOVE QUERYTABLE
    For Each qt In ThisWorkbook.Sheets(1).QueryTables
        qt.Delete
    Next qt

    Set qt = Nothing
End Sub

Python

import pandas as pd
import win32com.client as win32

# ...same pandas code...    
macrofile = "C:\Path\To\Macro\Workbook.xlsm"
strfile = "C:\Path\To\CSV\Output.csv"
df.to_csv(strfile)

try:
    xl = win32.gencache.EnsureDispatch('Excel.Application')

    wb = xl.Workbooks.Open(macrofile)
    xl.Application.Run('ImportCSV', strfile)

    wb.Close(True)
    xl.Quit

except Exception as e:
    print(e)

finally:    
    wb = None
    xl = None
Answered By: Parfait

Sorry a bit late to come back updating my question. Finally I have solved my problem with openpyxl package.

So here is my final code:

import openpyxl
import os
import string
import pandas as pd
import numpy as np

path = #folder directory
target_file = #excel filename
sheetname = #working sheet that you wish to work on with

filename = os.path.join(path, target_file)

wb = openpyxl.load_workbook(filename, keep_vba=True)
sheet = wb.get_sheet_by_name(sheetname)

# To Erase All Values within Selected Columns
d = dict()
for x, y in zip(range(1, 27), string.ascii_lowercase):
    d[x] = y.upper()

max_row = sheet.max_row
max_col = sheet.max_column

for row in range(max_row):
    row += 1
    if row == 1: continue
    for col in range(max_col):
        col += 1
        if col == 1: continue
        sheet['{}{}'.format(d[col], row)] = None

# To Write Values to the Blank Worksheet
path_dataframe = # folder directory to the csv file
target_compiled = # csv filename
filename_compiled = os.path.join(path_compiled, target_compiled)

compiled = pd.read_csv(filename_compiled, low_memory=False, encoding = "ISO-8859-1")

for row in range(len(compiled.index)):
    row += 1
    if row == 1: continue # I do not want to change the value in row 1 in excel file because they are headers
    for col in range(max_col): 
        col += 1
        if col == 1: continue # I do not want to change the values in column 1 in excel file since they contain formula
        value = compiled.iloc[row-2][col-2]
        if type(value) is str: value = value
        elif type(value) is np.float64: value = float(value)
        elif type(value) is np.int64: value = int(value)
        sheet['{}{}'.format(d[col], row)] = value

wb.save(filename)
Answered By: arnold

I found the follwing solution based on openpyxl. What I learned is that xlsxwriter is not able to open existing excel files. Therefore, my approach is based on openpyxl.

import pandas as pd 
import openpyxl # one excel reader/writer compatible with pandas

book = openpyxl.load_workbook('input.xlsm', keep_vba = True) # Load existing .xlsm file

with pd.ExcelWriter('output.xlsm', engine='openpyxl') as writer: # open a writer instance with the filename of the 
    
    writer.book = book # Hand over input workbook
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets) # Hand over worksheets
    writer.vba_archive = book.vba_archive # Hand over VBA information 


    df_write.to_excel(writer, sheet_name = 'Sheet1', columns = ['A'],
                  header = False, index = False,
                  startrow = 1, startcol = 0)
    # Writes a column A of the Dataframe into the excel sheet 'Sheet1', which can 
    # already be present in input.xlsm, to row 1, col 0

    
    writer.save()
Answered By: janosch

As an alternative you can use xlwings

import pandas as pd
import xlwings

df = pd.DataFrame({
  "A": [420, 380, 390],
  "B": [50, 40, 45],
  "C": [60, 50, 20]
})


with xlwings.App(visible=False) as app:
    wb = xlwings.Book('Test.xlsm')    
    ws = wb.sheets('MySheet')
    ws.range('B1:D50').delete(shift='up') #if you want to delete the values, if not they will be overwritten
    ws.range('B1:D1').options(header=False, index=False).value = df[['A','B','C' ]]
    wb.save()
    wb.close()
Answered By: beltalowda
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.