Unzipping directory structure with Python

Question:

I have a zip file which contains the following directory structure:

dir1\dir2\dir3a
dir1\dir2\dir3b

I’m trying to unzip it and maintain the directory structure, but I get this error:

IOError: [Errno 2] No such file or directory: 'C:\projects\testFolder\subdir\unzip.exe'

where testFolder is dir1 above and subdir is dir2.

Is there a quick way of unzipping the file and maintaining the directory structure?

Asked By: Flyer1

Answers:

There’s a very easy way if you’re using Python 2.6: the extractall method.

Since the zipfile module is implemented entirely in Python without any C extensions, you can probably copy it out of a 2.6 installation and use it with an older version of Python, which may be easier than reimplementing the functionality yourself. That said, the function itself is quite short:

def extractall(self, path=None, members=None, pwd=None):
    """Extract all members from the archive to the current working
       directory. `path' specifies a different directory to extract to.
       `members' is optional and must be a subset of the list returned
       by namelist().
    """
    if members is None:
        members = self.namelist()

    for zipinfo in members:
        self.extract(zipinfo, path, pwd)
Answered By: Eli Courtwright

It sounds like you are trying to run unzip to extract the zip.

It would be better to use Python’s zipfile module and do the extraction in Python.

import zipfile

def extract(zipfilepath, extractiondir):
    zf = zipfile.ZipFile(zipfilepath)  # named zf to avoid shadowing the built-in zip()
    zf.extractall(path=extractiondir)
    zf.close()
Answered By: Douglas Leeder

The extract and extractall methods are great if you’re on Python 2.6. I have to use Python 2.5 for now, so I just need to create the directories if they don’t exist. You can get a listing of directories with the namelist() method. The directories will always end with a forward slash (even on Windows) e.g.,

import os, zipfile

z = zipfile.ZipFile('myfile.zip')
for f in z.namelist():
    if f.endswith('/'):
        os.makedirs(f)

You probably don’t want to do it exactly like that (i.e., you’d probably want to extract the contents of the zip file as you iterate over the namelist), but you get the idea.

Answered By: Jeff

I tried this out, and can reproduce it. The extractall method, as suggested by other answers, does not solve the problem. This seems like a bug in the zipfile module to me (perhaps Windows-only?), unless I’m misunderstanding how zipfiles are structured.

testa
testa\testb
testa\testb\test.log
> test.zip

>>> from zipfile import ZipFile
>>> zipTest = ZipFile("C:\...\test.zip")
>>> zipTest.extractall("C:\...\")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "...zipfile.py", line 940, in extractall
  File "...zipfile.py", line 928, in extract
  File "...zipfile.py", line 965, in _extract_member
IOError: [Errno 2] No such file or directory: 'C:\...\testa\testb\test.log'

If I do a printdir(), I get this (first column):

>>> zipTest.printdir()
File Name
testa/testb/
testa/testb/test.log

If I try to extract just the first entry, like this:

>>> zipTest.extract("testa/testb/")
'C:\...\testa\testb'

On disk, this results in the creation of a folder testa, with a file testb inside. This is apparently the reason why the subsequent attempt to extract test.log fails; testa\testb is a file, not a folder.

Edit #1: If you extract just the file, then it works:

>>> zipTest.extract("testa/testb/test.log")
'C:\...\testa\testb\test.log'

Edit #2: Jeff’s code is the way to go; iterate through namelist; if it’s a directory, create the directory. Otherwise, extract the file.

Answered By: DNS

Don’t trust extract() or extractall().

These methods blindly extract files to the paths given in their filenames. But ZIP filenames can be anything at all, including dangerous strings like “x/../../../etc/passwd”. Extract such files and you could have just compromised your entire server.

Maybe this should be considered a reportable security hole in Python’s zipfile module, but any number of zip-dearchivers have exhibited the exact same behaviour in the past. To unarchive a ZIP file with folder structure safely you need in-depth checking of each file path.
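
As a rough illustration of what such a check could look like, here is a minimal sketch that flags member names which would resolve to a location outside the destination directory. The function name find_unsafe_members is hypothetical and not part of the zipfile module. It only looks at path traversal; other concerns, such as symlink entries or very large archives, are out of scope.

```python
import os
import zipfile


def find_unsafe_members(zip_path, dest_dir):
    """Return the member names that would land outside dest_dir.

    Hypothetical helper for illustration; not part of the zipfile module.
    """
    dest_root = os.path.realpath(dest_dir)
    unsafe = []
    z = zipfile.ZipFile(zip_path)
    try:
        for name in z.namelist():
            # Resolve '..' components and absolute names against dest_root.
            target = os.path.realpath(os.path.join(dest_root, name))
            inside = (target == dest_root or
                      target.startswith(dest_root + os.sep))
            if not inside:
                unsafe.append(name)
    finally:
        z.close()
    return unsafe
```

A caller might refuse to extract anything at all when the returned list is non-empty, rather than skipping only the flagged entries.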

Answered By: bobince

Note that zip files can have entries for directories as well as files. When creating archives with the zip command, pass the -D option to disable adding directory entries explicitly to the archive. When Python 2.6’s ZipFile.extractall method runs across a directory entry, it seems to create a file in its place. Since archive entries aren’t necessarily in order, this causes ZipFile.extractall to fail quite often, as it tries to create a file in a subdirectory of a file. If you’ve got an archive that you want to use with the Python module, simply extract it and re-zip it with the -D option. Here’s a little snippet I’ve been using for a while to do exactly that:

P=`pwd` && 
Z=`mktemp -d -t zip` && 
pushd $Z && 
unzip $P/<busted>.zip && 
zip -r -D $P/<new>.zip . && 
popd && 
rm -rf $Z

Replace <busted>.zip and <new>.zip with real filenames relative to the current directory. Then just copy the whole thing and paste it into a command shell, and it will create a new archive that’s ready to rock with Python 2.6. There is a zip command that will remove these directory entries without unzipping but IIRC it behaved oddly in different shell environments or zip configurations.
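
If the zip and unzip commands are not available, roughly the same clean-up can be sketched with the zipfile module itself: copy every entry into a new archive and skip the names that end with a slash. This is an alternative to the shell snippet, not a translation of it, and the function name strip_directory_entries is made up for the example.

```python
import zipfile


def strip_directory_entries(src_zip, dst_zip):
    """Copy src_zip to dst_zip, omitting explicit directory entries.

    Hypothetical helper for illustration; not part of the zipfile module.
    """
    zin = zipfile.ZipFile(src_zip, 'r')
    zout = zipfile.ZipFile(dst_zip, 'w')
    try:
        for info in zin.infolist():
            if info.filename.endswith('/'):
                continue  # directory entry: leave it out of the new archive
            # Passing the ZipInfo keeps the entry's name, timestamp and
            # compression type.
            zout.writestr(info, zin.read(info.filename))
    finally:
        zout.close()
        zin.close()
```

The new archive then contains file entries only, so extraction never meets a directory entry.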

Answered By: xdissent

I know it may be a little late to say this but Jeff is right.
It’s as simple as:

import os
from zipfile import ZipFile

def extractAll(zipName):
    z = ZipFile(zipName)
    for f in z.namelist():
        if f.endswith('/'):
            # Directory entry: create it unless it already exists.
            if not os.path.isdir(f):
                os.makedirs(f)
        else:
            z.extract(f)
    z.close()

if __name__ == '__main__':
    zipList = ['one.zip', 'two.zip', 'three.zip']
    for zipName in zipList:
        extractAll(zipName)
Answered By: ki113d

Filter namelist to exclude the folders

All you have to do is filter out the namelist() entries ending with / and the problem is resolved:

  z.extractall(dest, filter(lambda f: not f.endswith('/'), z.namelist()))

nJoy!

Answered By: nickl-

If, like me, you have to extract a complete zip archive with an older Python release (in my case, 2.4), here’s what I came up with (based on Jeff’s answer):

import zipfile
import os

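def unzip(source_file_path, destination_dir):
    destination_dir += '/'
    z = zipfile.ZipFile(source_file_path, 'r')
    for name in z.namelist():
        outfile_path = destination_dir + name
        if name.endswith('/'):
            # Directory entry: create it unless it already exists.
            if not os.path.isdir(outfile_path):
                os.makedirs(outfile_path)
        else:
            # Some archives have no entry for a file's parent directory.
            parent = os.path.dirname(outfile_path)
            if not os.path.isdir(parent):
                os.makedirs(parent)
            outfile = open(outfile_path, 'wb')
            outfile.write(z.read(name))
            outfile.close()
    z.close()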
Answered By: Apteryx