openpyxl load_workbook() on a legit .xlsx file leads to a zipfile.BadZipFile error

Question:

What I was trying to do is appening dataframe data to an existing legit excel file.
I used load_workbook() function from openpyxl, but it systematically returns an error.
Here is some code that crashes on my machine:

from openpyxl import load_workbook

report_path = root_folder + '\log_report.xlsx'
writer = pd.ExcelWriter(report_path, engine='openpyxl')
writer.book = load_workbook(report_path)
writer.close()

Here, log_report.xlsx already exists and have been created through pandas .to_excel().
Before opening with openpyxl load_workbook(), it is possible to open it, to edit it and do anything MS Excel allows.
I got the following error returned:

Traceback (most recent call last):
  File "D:/failsafe_counter/main.py", line 419, in <module>
    writer.book = load_workbook(report_path)
  File "D:failsafe_countervenvlibsite-packagesopenpyxlreaderexcel.py", line 315, in load_workbook
    reader = ExcelReader(filename, read_only, keep_vba,
  File "D:failsafe_countervenvlibsite-packagesopenpyxlreaderexcel.py", line 124, in __init__
    self.archive = _validate_archive(fn)
  File "D:failsafe_countervenvlibsite-packagesopenpyxlreaderexcel.py", line 96, in _validate_archive
    archive = ZipFile(filename, 'r')
  File "C:UsersXXXXAppDataLocalProgramsPythonPython38-32libzipfile.py", line 1269, in __init__
    self._RealGetContents()
  File "C:UsersXXXXAppDataLocalProgramsPythonPython38-32libzipfile.py", line 1336, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

Important aspect of that is this operation left the initial excel file corrupted, and impossible to open again.

Initial file is a legit zipfile (verified through renaming the xlsx into a .zip). Once the little code above returns the error, it turns the Excel file into an empty archive (verified through the same process).

I employed such functions with success on my previous laptop (under windows 7) but since I migrated under windows 10 I’m not able to use them anymore. Both of them were running python 3.8.

Is there any known issue about openpyxl load_workbook() on some configs?
Do you have any idea how to fix this, or any workaround?

Asked By: EGI

||

Answers:

The exception is quite clear: openpyxl cannot read the file because it is not a zipfile.

pd.ExcelWriter(report_path, engine='openpyxl') creates a new file but as this is a completely empty file, openpyxl cannot load it. If you want to work with a file in both Pandas an openpyxl, you have to create a "book" object.

wb = load_workbook(report_path)
writer = pd.ExcelWriter(report_path, engine='openpyxl')
writer.book = wb
Answered By: Charlie Clark

I was able to replicate the problem. It is pandas related. Everything works just fine up to pandas 1.1.5
In pandas 1.2.0 they did some changes

At the time when you instantiate pd.ExcelWriter with

writer = pd.ExcelWriter(report_path, engine='openpyxl')`

it creates empty file with size 0 bytes and overwrites the existing file and then you get error when try to load it. It is not openpyxl related, because with latest version of openpyxl it works fine with pandas 1.1.5.

The solution – specify mode='a', change the above line to

writer = pd.ExcelWriter(report_path, engine='openpyxl', mode='a')

Alternatively – look at @CharlieClark solution or this solution where they loads the file before instantiating the pd.ExcelWriter..

Answered By: buran

In my case that problem occour when file is internal in company domain (not public).

Answered By: user_mpr

Just check, if the file you are using in the code is opened in excel.
If it is opened, close the file and run the code again.

Answered By: Hemanth M
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.