Compressing directory using shutil.make_archive() while preserving directory structure

Question:

I’m trying to zip a directory called test_dicoms to a zip file named test_dicoms.zip using the following code:

shutil.make_archive('/home/code/test_dicoms', 'zip', '/home/code/test_dicoms')

The problem is that when I unzip it, all of the files that were in /test_dicoms/ are extracted to /home/code/ instead of the folder /test_dicoms/ and all of it’s contained files being extracted to /home/code/. So /test_dicoms/ has a file called foo.txt and after I zip and unzip foo.txt‘s path is /home/code/foo.txt as opposed to /home/code/test_dicoms/foo.txt. How do I fix this? Also, some of the directories I’m working with are very large. Will I need to add anything to my code to make it ZIP64 or is the function smart enough to do that automatically?

Here’s what’s currently in the archive created:

[gwarner@jazz gwarner]$ unzip -l test_dicoms.zip
Archive: test_dicoms.zip
Length    Date       Time  Name
--------- ---------- ----- ----
    93324 09-17-2015 16:05 AAscout_b_000070
    93332 09-17-2015 16:05 AAscout_b_000125
    93332 09-17-2015 16:05 AAscout_b_000248
Asked By: G Warner

||

Answers:

Using the terms in the documentation, you have specified a root_dir, but not a base_dir. Try specifying the base_dir like so:

shutil.make_archive('/home/code/test_dicoms',
                    'zip',
                    '/home/code/',
                    'test_dicoms')

To answer your second question, it depends upon the version of Python you are using. Starting from Python 3.4, ZIP64 extensions will be availble by default. Prior to Python 3.4, make_archive will not automatically create a file with ZIP64 extensions. If you are using an older version of Python and want ZIP64, you can invoke the underlying zipfile.ZipFile() directly.

If you choose to use zipfile.ZipFile() directly, bypassing shutil.make_archive(), here is an example:

import zipfile
import os

d = '/home/code/test_dicoms'

os.chdir(os.path.dirname(d))
with zipfile.ZipFile(d + '.zip',
                     "w",
                     zipfile.ZIP_DEFLATED,
                     allowZip64=True) as zf:
    for root, _, filenames in os.walk(os.path.basename(d)):
        for name in filenames:
            name = os.path.join(root, name)
            name = os.path.normpath(name)
            zf.write(name, name)

Reference:

Answered By: Robᵩ

I have written a wrapper function myself because shutil.make_archive is too confusing to use.

Here it is http://www.seanbehan.com/how-to-use-python-shutil-make_archive-to-zip-up-a-directory-recursively-including-the-root-folder/

And just the code..

import os, shutil
def make_archive(source, destination):
        base = os.path.basename(destination)
        name = base.split('.')[0]
        format = base.split('.')[1]
        archive_from = os.path.dirname(source)
        archive_to = os.path.basename(source.strip(os.sep))
        shutil.make_archive(name, format, archive_from, archive_to)
        shutil.move('%s.%s'%(name,format), destination)

make_archive('/path/to/folder', '/path/to/folder.zip')
Answered By: seanbehan

I think, I am able to improve seanbehan’s answer by removing the file moving:

def make_archive(source, destination):
    base_name = '.'.join(destination.split('.')[:-1])
    format = destination.split('.')[-1]
    root_dir = os.path.dirname(source)
    base_dir = os.path.basename(source.strip(os.sep))
    shutil.make_archive(base_name, format, root_dir, base_dir)
Answered By: Make42

There are basically 2 approaches to using shutil: you may try to understand the logic behind it or you may just use an example. I couldn’t find an example here so I tried to create my own.

;TLDR. Run shutil.make_archive('dir1_arc', 'zip', root_dir='dir1') or shutil.make_archive('dir1_arc', 'zip', base_dir='dir1') or just shutil.make_archive('dir1_arc', 'zip', 'dir1') from temp.

Suppose you have ~/temp/dir1:

temp $ tree dir1
dir1
├── dir11
│   ├── file11
│   ├── file12
│   └── file13
├── dir1_arc.zip
├── file1
├── file2
└── file3

How can you create an archive of dir1? Set base_name='dir1_arc', format='zip'. Well you have a lot of options:

  • cd into dir1 and run shutil.make_archive(base_name=base_name, format=format); it will create an archive dir1_arc.zip inside dir1; the only problem you’ll get a strange behavior: inside your archive you’ll find file dir1_arc.zip;
  • from temp run shutil.make_archive(base_name=base_name, format=format, base_dir='dir1'); you’ll get dir1_arc.zip inside temp that you can unzip into dir1; root_dir defaults to temp;
  • from ~ run shutil.make_archive(base_name=base_name, format=format, root_dir='temp', base_dir='dir1'); you’ll again get your file but this time inside ~ directory;
  • create another directory temp2 in ~ and run inside it: shutil.make_archive(base_name=base_name, format=format, root_dir='../temp', base_dir='dir1'); you’ll get your archive in this temp2 folder;

Can you run shutil without specifying arguments? You can. Run from temp shutil.make_archive('dir1_arc', 'zip', 'dir1'). This is the same as run shutil.make_archive('dir1_arc', 'zip', root_dir='dir1'). What can we say about base_dir in this case? From documentation not so much. From source code we may see that:

if root_dir is not None:
  os.chdir(root_dir)

if base_dir is None:
        base_dir = os.curdir 

So in our case base_dir is dir1. And we can keep asking questions.

Answered By: irudyak

I was having issues with path split on some paths with ‘.’ periods in them and i found having an optional format which defaults to ‘zip’ is handy and still allows you to override for other formats and is less error prone.

import os
import shutil
from shutil import make_archive

def make_archive(source, destination, format='zip'):
    import os
    import shutil
    from shutil import make_archive
    base, name = os.path.split(destination)
    archive_from = os.path.dirname(source)
    archive_to = os.path.basename(source.strip(os.sep))
    print(f'Source: {source}nDestination: {destination}nArchive From: {archive_from}nArchive To: {archive_to}n')
    shutil.make_archive(name, format, archive_from, archive_to)
    shutil.move('%s.%s' % (name, format), destination)

make_archive('/path/to/folder', '/path/to/folder.zip')

Special thanks to seanbehan’s original answer or i would have been lost in the sauce much longer.

Answered By: Mike R

This solution builds off the responses from irudyak and seanbehan and uses Pathlib. You need to pass source and destination as Path objects.

from pathlib import Path
import shutil

def make_archive(source, destination):
    base_name = destination.parent / destination.stem
    format = (destination.suffix).replace(".", "")
    root_dir = source.parent
    base_dir = source.name
    shutil.make_archive(base_name, format, root_dir, base_dir)
Answered By: Nick

This is a variation on @nick’s answer that uses pathlib, type hinting and avoids shadowing builtins:

from pathlib import Path
import shutil

def make_archive(source: Path, destination: Path) -> None:
    base_name = destination.parent / destination.stem
    fmt = destination.suffix.replace(".", "")
    root_dir = source.parent
    base_dir = source.name
    shutil.make_archive(str(base_name), fmt, root_dir, base_dir)

Usage:

make_archive(Path("/path/to/dir/"), Path("/path/to/output.zip"))
Answered By: phoenix

You can use Pathlib and shutil:

from pathlib import Path
import shutil
shutil.make_archive(
   *dest_path.split('.'), 
   root_dir=Path(src_path).parent, 
   base_dir=Path(src_path).name)
)
  • src_path is the path of the source directory.
  • dest_path is the path of the destination archive to be created.
Answered By: Georgios Douzas
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.