Problem with ZipFile. Creates an additional subfolder?

Question:

I am using this example to extract an archived folder:

import zipfile
with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
    zip_ref.extractall()

The problem is that the files do not appear in the folder where file.zip is located. Instead, additional subfolders are created and the files are unpacked there.

I am using Visual Studio Code.

Asked By: Ryotaro

||

Answers:

with zipfile.ZipFile(path_to_zip_file, 'r')

Mode "r" only reads the archive, and that is all extraction needs: extractall() creates files and folders on disk regardless of the mode the archive was opened with. Mode "x" is for exclusively creating a new archive and raises FileExistsError if the file already exists, so switching to it will not change where the extracted files end up.

Also, see https://docs.python.org/3/library/zipfile.html#zipfile-objects for more information.

Answered By: Emanuel Schärer

You want to flatten your ZIP file upon extraction (no directories, just the leaf files).

By default, zipfile.ZipFile.extractall writes the contents to the current working directory (unless you pass it a path argument) and keeps the directory structure that was stored in the archive.
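
If the only goal were to have the files land next to the archive rather than in the current working directory, the path argument of extractall would be enough. A minimal sketch, assuming a hypothetical helper named extract_next_to_archive (not part of zipfile); note that the folders stored inside the archive are still recreated:

```python
from pathlib import Path
from zipfile import ZipFile


def extract_next_to_archive(path_to_zip_file):
    """Extract into the folder that contains the archive, not the CWD.

    Hypothetical helper for illustration only.
    """
    archive = Path(path_to_zip_file).resolve()
    target_dir = archive.parent
    with ZipFile(archive) as zip_ref:
        # path= overrides the default of the current working directory;
        # the archive's internal directory structure is preserved.
        zip_ref.extractall(path=target_dir)
    return target_dir
```

This keeps any subfolders the archive contains, so it does not flatten anything.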

Instead, what we need to do here is examine the ZipFile itself and:

  1. Find files that are actually files and not directories
  2. Modify their metadata so they don’t have full paths if they were originally stored under directories
  3. Use ZipFile.extract to write files out to a different directory

Let’s create a zip file that contains some nested directories with files inside them. My working directory is called ziptest.

pwd
# /Users/wkl/Downloads/ziptest
mkdir -p Data/t1/t2/t3
touch Data/t1/file1.txt
touch Data/t1/t2/file.txt
touch Data/t1/t2/t3/file3.txt

cd Data
zip files.zip -r t1

#  adding: t1/ (stored 0%)
#  adding: t1/file1.txt (stored 0%)
#  adding: t1/t2/ (stored 0%)
#  adding: t1/t2/file.txt (stored 0%)
#  adding: t1/t2/t3/ (stored 0%)
#  adding: t1/t2/t3/file3.txt (stored 0%)

rm -rf t1

After this, I just have a files.zip archive that contains the entire t1 tree. Now, from my working directory, I have this script called ziptest.py which just extracts things:

#!/usr/bin/env python3

from pathlib import Path, PurePath
from zipfile import ZipFile

if __name__ == "__main__":
    data_dir = Path.cwd() / "Data"
    archive = data_dir / "files.zip"
    with ZipFile(archive) as zfile:
        # this flattens files by changing what
        # their output file would be
        for info in zfile.infolist():
            # skip any directories
            if not info.is_dir():
                # manipulate the zip info so that
                # we just have the filename, and no directories
                info.filename = PurePath(info.filename).name

                print(f"Extracting {info.filename} to {data_dir}")

                # extract lets you specify a path
                # to write the file to
                zfile.extract(info, data_dir)

If I run this script, I get this output:

> python3 ziptest.py
Extracting file1.txt to /Users/wkl/Downloads/ziptest/Data
Extracting file.txt to /Users/wkl/Downloads/ziptest/Data
Extracting file3.txt to /Users/wkl/Downloads/ziptest/Data

> ls Data
file.txt  file1.txt  file3.txt  files.zip

Note that this does not account for multiple files with the same name in the zip file: once flattened, they map to the same output path, so a later file overwrites an earlier one.
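
One possible way to cope with clashing names is to pick a fresh name whenever the target already exists. This is only a sketch under assumptions: the function name extract_flat and the _1, _2, ... suffix scheme are made up for illustration, and other policies (skipping, raising an error) may suit you better.

```python
from pathlib import Path, PurePosixPath
from zipfile import ZipFile


def extract_flat(archive, dest_dir):
    """Flatten an archive into dest_dir, renaming files whose names clash.

    Hypothetical helper; returns the list of file names actually written.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with ZipFile(archive) as zfile:
        for info in zfile.infolist():
            # skip directory entries, only leaf files are extracted
            if info.is_dir():
                continue
            # member names inside a zip always use forward slashes
            leaf = PurePosixPath(info.filename)
            candidate, counter = leaf.name, 1
            # append _1, _2, ... until the name is free in dest_dir
            while (dest_dir / candidate).exists():
                candidate = f"{leaf.stem}_{counter}{leaf.suffix}"
                counter += 1
            # same trick as above: rewrite the output name, then extract
            info.filename = candidate
            zfile.extract(info, dest_dir)
            written.append(candidate)
    return written
```

Because the check is against what already exists in dest_dir, files that were there before extraction are not overwritten either.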

Answered By: wkl