"BadZipFile: File is not a zip file" – Error popped up all of a sudden
Question:
My script worked for multiple days in a row, and then all of a sudden I got this error:
  File "<ipython-input-196-abdb28a77366>", line 1, in <module>
    runfile('F:/-/-/-/cleaner_games_appstore_babil.py', wdir='F:/-/-/-')
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "F:/-/-/-/cleaner_games_appstore_babil.py", line 112, in <module>
    append_df_to_excel("stillfront.xlsx", dff, sheet_name='Apple_Babil', startrow=None, truncate_sheet=False, engine='openpyxl', header = False)
  File "F:/-/-/-/cleaner_games_appstore_babil.py", line 84, in append_df_to_excel
    writer.book = load_workbook(filename)
  File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 311, in load_workbook
    data_only, keep_links)
  File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 126, in __init__
    self.archive = _validate_archive(fn)
  File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 98, in _validate_archive
    archive = ZipFile(filename, 'r')
  File "C:\ProgramData\Anaconda3\lib\zipfile.py", line 1222, in __init__
    self._RealGetContents()
  File "C:\ProgramData\Anaconda3\lib\zipfile.py", line 1289, in _RealGetContents
    raise BadZipFile("File is not a zip file")
BadZipFile: File is not a zip file
To clarify, I do not use any zip files. I found the code here on StackOverflow, and nobody there mentioned the code not working or this error occurring.
The script is supposed to write my pandas DataFrame to an Excel sheet.
Here’s the part of the code that creates the error:
def append_df_to_excel(filename, df, sheet_name='Apple_Babil', startrow=None,
                       truncate_sheet=False,
                       **to_excel_kwargs):
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')
    writer = pd.ExcelWriter(filename, engine='openpyxl')
    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)
        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row
        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)
        # copy existing sheets
        writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass
    if startrow is None:
        startrow = 0
    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)
    # save the workbook
    writer.save()

append_df_to_excel("stillfront.xlsx", dff, sheet_name='Apple_Babil', startrow=None, truncate_sheet=False, engine='openpyxl', header = False)
The code was not edited or anything, it just stopped working.
Answers:
This is a very common issue that many people are trying to solve. It is related to the Excel file and openpyxl. As @Barmar said in his comments, xlsx, xlsm, etc. are indeed zip files. It was working fine until Python 2.7.
Try reading and writing to a CSV instead; it won't be a problem.
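To see why openpyxl goes through ZipFile at all, it can help to confirm that a given .xlsx really is a zip container. The following is a minimal sketch using only the standard library; the helper name looks_like_xlsx is made up for illustration, and the check for a "[Content_Types].xml" member assumes the usual Office Open XML package layout.

```python
import zipfile


def looks_like_xlsx(path):
    """Return True if `path` is a zip archive that carries the OOXML manifest.

    Hypothetical helper: it only inspects the container, it does not
    validate the workbook contents.
    """
    # An empty or truncated file fails this first test, which is the same
    # condition that makes openpyxl raise BadZipFile.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # Office Open XML packages list their parts in this manifest.
        return "[Content_Types].xml" in archive.namelist()
```

Calling looks_like_xlsx("stillfront.xlsx") before load_workbook would tell you whether the file on disk is still a usable archive.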
Excel XLSX files are zipped, XLS files are not.
I believe this bug is related to a combination of:
- XLS is not zipped, and
- Since python-3.9, the openpyxl module must be used with XLSX files.
This problem is easy to solve by checking which type of Excel file is uploaded and using the appropriate engine to read into Pandas.
By file extension
from pathlib import Path
import pandas as pd

# `filename` is the name of the uploaded file, `file` is the open file object
file_path = Path(filename)
file_extension = file_path.suffix.lower()[1:]

# pass the file object itself; pandas reads from it directly
if file_extension == 'xlsx':
    df = pd.read_excel(file, engine='openpyxl')
elif file_extension == 'xls':
    df = pd.read_excel(file)
elif file_extension == 'csv':
    df = pd.read_csv(file)
else:
    raise ValueError("File not supported")
By file mimetype
If you happen to have access to the file mimetype, you can perform the following test:
import pandas as pd

# `file` is the uploaded file object; pass it to pandas directly
if file.content_type == 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet':
    df = pd.read_excel(file, engine='openpyxl')  # XLSX
elif file.content_type == 'application/vnd.ms-excel':
    df = pd.read_excel(file)  # XLS
elif file.content_type == 'text/csv':
    df = pd.read_csv(file)  # CSV
else:
    raise ValueError("File not supported")
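Both the extension and the mimetype are supplied by whoever uploads the file, so they can be wrong. A third option, not part of the answer above, is to look at the first bytes of the content. This is only a sketch: sniff_spreadsheet_format is a hypothetical helper, and it relies on the well-known signatures of zip archives (PK\x03\x04) and of the OLE2 compound files used by legacy XLS.

```python
ZIP_MAGIC = b"PK\x03\x04"                         # zip local file header (XLSX, XLSM, ...)
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 compound file (legacy XLS)


def sniff_spreadsheet_format(data):
    """Guess the container format from the leading bytes of a file.

    Hypothetical helper: returns 'xlsx', 'xls', 'empty' or 'unknown'.
    A zip signature only shows that the file is a zip archive, not that
    it is a valid workbook.
    """
    if not data:
        return "empty"
    if data.startswith(ZIP_MAGIC):
        return "xlsx"
    if data.startswith(OLE2_MAGIC):
        return "xls"
    return "unknown"
```

With an uploaded file object you would call sniff_spreadsheet_format(file.read(8)) and then file.seek(0) before handing the object to pandas. An 'empty' result points at the situation described in the question: the file exists but holds no archive.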
As others have already pointed out, a corrupted file is the likely culprit.
Perform these quick sanity checks:
- Open the Excel file. Is the data appearing correctly?
- Are you able to see the file size in the file's details in Windows Explorer?
In my case, I manually checked the Excel file content and it turned out to be empty because I was not storing the file correctly. Once I fixed this, the "File is not a zip file" error was resolved.
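The sanity checks above can also be scripted. The sketch below uses only the standard library; check_workbook_file is a name invented here, and the status strings are arbitrary. It reports whether the file is missing, empty, not a zip archive, or contains a member that fails its CRC check.

```python
import os
import zipfile


def check_workbook_file(path):
    """Return a short status string describing the file at `path`.

    Hypothetical helper for diagnosing 'File is not a zip file'.
    """
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        # A zero-byte file is what a failed or interrupted save can leave behind.
        return "empty"
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() reads every member and returns the first bad name, or None.
            bad_member = archive.testzip()
    except zipfile.BadZipFile:
        return "not a zip archive"
    if bad_member is not None:
        return "corrupt member: " + bad_member
    return "ok"
```

Running check_workbook_file("stillfront.xlsx") right before the append step shows which of these cases you are in without opening Excel.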
I got the same error. It turned out that the file was open in another program, which caused the error. Closing the other program solved it.
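One rough way to detect that situation from the script is to try opening the file for update and see whether the operating system refuses. This is a sketch under the assumption that the other program holds an exclusive lock, which is typical on Windows but not guaranteed elsewhere; is_locked_for_writing is a hypothetical name.

```python
def is_locked_for_writing(path):
    """Return True if the file at `path` cannot currently be opened for update.

    Hypothetical helper. Opening in 'r+b' mode does not truncate or modify
    the file. A missing file raises FileNotFoundError, which is left to
    the caller.
    """
    try:
        with open(path, "r+b"):
            return False
    except PermissionError:
        # Typically raised on Windows while another program holds the file open.
        return True
```

If this returns True for the workbook, close it in the other program before running the script again.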