Python creates corrupt Excel files when I run my code. How can I save the files without corrupting them?

Question:

I’m trying to write Python code that compares two Excel files across four specific worksheets, which have the same names in both files. Where the values differ, the code replaces them; otherwise it does nothing and moves on. The next step is to run a macro stored in the Excel file. Once the loop is done, a new Excel file is created, and from there I must replace two worksheets with those from a new file called "reconciliation."

It sounds like simple Python coding, but I’m stuck at the first part: my code creates a file that is completely corrupted and cannot be opened. Excel shows this error message:
[image: error message displayed by Excel]

Here is the code I’m working with.

import openpyxl as xl
import tkinter as tk
from tkinter import filedialog
import os

root = tk.Tk()
root.withdraw()

# open file dialogs to select the excel files
file_path1 = filedialog.askopenfilename(title='Select the first Excel file',
                                        filetypes=[('Excel files', '*.xlsm')])
file_path2 = filedialog.askopenfilename(title='Select the second Excel file',
                                        filetypes=[('Excel files', '*.xlsm')])

wb1 = xl.load_workbook(file_path1, keep_vba=True)
wb2 = xl.load_workbook(file_path2, keep_vba=True)

for sheet_name in wb1.sheetnames:
    sheet1 = wb1[sheet_name]
    sheet2 = wb2[sheet_name]
    # get the number of rows and columns in the worksheet
    max_row = max(sheet1.max_row, sheet2.max_row)
    max_col = max(sheet1.max_column, sheet2.max_column)
    # loop through each row in the worksheet
    for row in range(1, max_row+1):
        # loop through each cell in the row
        for col in range(1, max_col+1):
            # get the values of the corresponding cells in the two worksheets
            cell1 = sheet1.cell(row=row, column=col).value
            cell2 = sheet2.cell(row=row, column=col).value
            if cell1 != cell2:
                # if the values are different, ask the user if they want to replace the sheet
                answer = input(f"Do you want to replace the sheet '{sheet_name}' in the second file with the sheet from the first file? (Y/N) ")
                if answer.lower() == 'y':
                    new_sheet = wb2.create_sheet(title=sheet_name)
                    for row in sheet1.iter_rows(values_only=True):
                        new_sheet.append(row)
                    new_sheet.title = sheet_name
                    del wb2[sheet_name]
                # break out of the loops, since a difference has already been found
                break
        else:
            # continue to the next row
            continue
        # break out of the loops, since we already found a difference
        break

file_name, file_ext = os.path.splitext(file_path1)
new_file_path = file_name + "_updated.xlsm"

new_workbook = xl.Workbook()

for sheet_name in wb1.sheetnames:
    sheet = wb1[sheet_name]
    new_sheet = new_workbook.create_sheet(sheet_name)
    for row in sheet.iter_rows():
        new_row = []
        for cell in row:
            new_row.append(cell.value)
        new_sheet.append(new_row)

new_workbook.save(new_file_path)

# Run the Macro2 in the updated Excel file
# xl = win32.Dispatch("Excel.Application")
# xl.Visible = True
# xl.Workbooks.Open(new_file_path)
# xl.Application.Run("Macro2")
# xl.ActiveWorkbook.Close(SaveChanges=True)
# xl.Quit()

#if input == "y":
#    print("'{sheet_name}' values replaced and new file saved as {}".format(sheet_name, new_file_path))
#else:
#    print("No changes were made")

I’ve tried numerous workarounds, such as not creating a new file and having the code overwrite the original files instead. However, it seems that the compare-and-replace part of my code is what damages the Excel files, because it corrupts the original files as well.
I would appreciate any pointers. I’ve searched online and on Stack Overflow for similar issues, but the solutions I found do not work in my case.

Asked By: Carey

Answers:

Don’t you want to delete the old sheet before you create the new one?

Instead of this:

                new_sheet = wb2.create_sheet(title=sheet_name)
                for row in sheet1.iter_rows(values_only=True):
                    new_sheet.append(row)
                new_sheet.title = sheet_name
                del wb2[sheet_name]

Shouldn’t you have this:

                del wb2[sheet_name]                    
                new_sheet = wb2.create_sheet(title=sheet_name)
                for row in sheet1.iter_rows(values_only=True):
                    new_sheet.append(row)
                new_sheet.title = sheet_name

Assuming that doesn’t fix your problem, have you tried writing the new sheet to an EXTRA sheet, for testing purposes?

                # Python joins strings with +, not & (that is VBA syntax)
                new_sheet = wb2.create_sheet(title="TEST " + sheet_name)
                for row in sheet1.iter_rows(values_only=True):
                    new_sheet.append(row)
                new_sheet.title = "TEST " + sheet_name

Why do you even need to set new_sheet.title if you already specified the sheet name when you created it?
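
As a rough sketch of the delete-first order suggested above, something like the following should leave exactly one sheet with the original name. The helper name replace_sheet_values is made up for illustration, and the two workbooks are built in memory here instead of being loaded from your files. It copies cell values only, so formatting is not carried over.

```python
from openpyxl import Workbook


def replace_sheet_values(src_wb, dst_wb, sheet_name):
    """Replace dst_wb[sheet_name] with the cell values of src_wb[sheet_name].

    Hypothetical helper, not part of openpyxl.
    """
    src_ws = src_wb[sheet_name]
    # Remember where the old sheet sat so the replacement keeps its position.
    position = dst_wb.sheetnames.index(sheet_name)
    # Delete first: the name is then free, so create_sheet can use it unchanged.
    del dst_wb[sheet_name]
    new_ws = dst_wb.create_sheet(title=sheet_name, index=position)
    # Copy values only; styles, column widths and merged cells are not copied.
    for row in src_ws.iter_rows(values_only=True):
        new_ws.append(row)
    return new_ws


# Small in-memory demonstration (stand-ins for wb1 and wb2 in the question).
src = Workbook()
src.active.title = "Data"
src["Data"].append([1, "a"])
src["Data"].append([2, "b"])

dst = Workbook()
dst.active.title = "Data"
dst["Data"].append([9, "z"])
dst.create_sheet("Other")

replace_sheet_values(src, dst, "Data")
print(dst.sheetnames)
```

Doing the delete before the create means there is never a moment when two sheets compete for the same name, so there is also no need to set new_sheet.title afterwards.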

Answered By: Eureka

Instead of the original block of code to save a new file, this alternative seems to fix the issue:

# create and save an updated file 
new_file_path = file_path1 + "_updated.xlsm"
shutil.copyfile(file_path1, new_file_path)

Make sure to import shutil first (import shutil at the top of the script).
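
For completeness, here is a minimal sketch of the same copy step wrapped in a function. The name make_updated_copy is hypothetical. It uses os.path.splitext, as the code in the question does, so the result is named book_updated.xlsm instead of book.xlsm_updated.xlsm; that naming choice is my assumption and not part of the fix itself. Because the file is copied byte for byte, the macros in it are left untouched.

```python
import os
import shutil


def make_updated_copy(src_path, suffix="_updated"):
    """Copy src_path next to itself as <name><suffix><ext> and return the new path.

    Hypothetical helper: the file is duplicated byte for byte with
    shutil.copyfile and is never parsed or rewritten.
    """
    base, ext = os.path.splitext(src_path)
    new_path = base + suffix + ext
    shutil.copyfile(src_path, new_path)
    return new_path


# Usage with the path chosen in the file dialog:
# new_file_path = make_updated_copy(file_path1)
```

If the copy still needs edits afterwards, it can presumably be opened with xl.load_workbook(new_file_path, keep_vba=True) and saved back to the same .xlsm path, though I have not verified that against this particular workbook.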

Answered By: Carey