Compressing directory using shutil.make_archive() while preserving directory structure
Question:
I’m trying to zip a directory called test_dicoms
to a zip file named test_dicoms.zip
using the following code:
shutil.make_archive('/home/code/test_dicoms', 'zip', '/home/code/test_dicoms')
The problem is that when I unzip it, all of the files that were in /test_dicoms/
are extracted to /home/code/
instead of the folder /test_dicoms/
and all of it’s contained files being extracted to /home/code/
. So /test_dicoms/
has a file called foo.txt
and after I zip and unzip foo.txt
‘s path is /home/code/foo.txt
as opposed to /home/code/test_dicoms/foo.txt
. How do I fix this? Also, some of the directories I’m working with are very large. Will I need to add anything to my code to make it ZIP64 or is the function smart enough to do that automatically?
Here’s what’s currently in the archive created:
[gwarner@jazz gwarner]$ unzip -l test_dicoms.zip
Archive: test_dicoms.zip
Length Date Time Name
--------- ---------- ----- ----
93324 09-17-2015 16:05 AAscout_b_000070
93332 09-17-2015 16:05 AAscout_b_000125
93332 09-17-2015 16:05 AAscout_b_000248
Answers:
Using the terms in the documentation, you have specified a root_dir, but not a base_dir. Try specifying the base_dir like so:
shutil.make_archive('/home/code/test_dicoms',
'zip',
'/home/code/',
'test_dicoms')
To answer your second question, it depends upon the version of Python you are using. Starting from Python 3.4, ZIP64 extensions will be availble by default. Prior to Python 3.4, make_archive
will not automatically create a file with ZIP64 extensions. If you are using an older version of Python and want ZIP64, you can invoke the underlying zipfile.ZipFile()
directly.
If you choose to use zipfile.ZipFile()
directly, bypassing shutil.make_archive()
, here is an example:
import zipfile
import os
d = '/home/code/test_dicoms'
os.chdir(os.path.dirname(d))
with zipfile.ZipFile(d + '.zip',
"w",
zipfile.ZIP_DEFLATED,
allowZip64=True) as zf:
for root, _, filenames in os.walk(os.path.basename(d)):
for name in filenames:
name = os.path.join(root, name)
name = os.path.normpath(name)
zf.write(name, name)
Reference:
I have written a wrapper function myself because shutil.make_archive
is too confusing to use.
And just the code..
import os, shutil
def make_archive(source, destination):
base = os.path.basename(destination)
name = base.split('.')[0]
format = base.split('.')[1]
archive_from = os.path.dirname(source)
archive_to = os.path.basename(source.strip(os.sep))
shutil.make_archive(name, format, archive_from, archive_to)
shutil.move('%s.%s'%(name,format), destination)
make_archive('/path/to/folder', '/path/to/folder.zip')
I think, I am able to improve seanbehan’s answer by removing the file moving:
def make_archive(source, destination):
base_name = '.'.join(destination.split('.')[:-1])
format = destination.split('.')[-1]
root_dir = os.path.dirname(source)
base_dir = os.path.basename(source.strip(os.sep))
shutil.make_archive(base_name, format, root_dir, base_dir)
There are basically 2 approaches to using shutil
: you may try to understand the logic behind it or you may just use an example. I couldn’t find an example here so I tried to create my own.
;TLDR. Run shutil.make_archive('dir1_arc', 'zip', root_dir='dir1')
or shutil.make_archive('dir1_arc', 'zip', base_dir='dir1')
or just shutil.make_archive('dir1_arc', 'zip', 'dir1')
from temp
.
Suppose you have ~/temp/dir1
:
temp $ tree dir1
dir1
├── dir11
│ ├── file11
│ ├── file12
│ └── file13
├── dir1_arc.zip
├── file1
├── file2
└── file3
How can you create an archive of dir1
? Set base_name='dir1_arc'
, format='zip'
. Well you have a lot of options:
cd
into dir1
and run shutil.make_archive(base_name=base_name, format=format)
; it will create an archive dir1_arc.zip
inside dir1
; the only problem you’ll get a strange behavior: inside your archive you’ll find file dir1_arc.zip
;
- from
temp
run shutil.make_archive(base_name=base_name, format=format, base_dir='dir1')
; you’ll get dir1_arc.zip
inside temp
that you can unzip into dir1
; root_dir
defaults to temp
;
- from
~
run shutil.make_archive(base_name=base_name, format=format, root_dir='temp', base_dir='dir1')
; you’ll again get your file but this time inside ~
directory;
- create another directory
temp2
in ~
and run inside it: shutil.make_archive(base_name=base_name, format=format, root_dir='../temp', base_dir='dir1')
; you’ll get your archive in this temp2
folder;
Can you run shutil
without specifying arguments? You can. Run from temp
shutil.make_archive('dir1_arc', 'zip', 'dir1')
. This is the same as run shutil.make_archive('dir1_arc', 'zip', root_dir='dir1')
. What can we say about base_dir
in this case? From documentation not so much. From source code we may see that:
if root_dir is not None:
os.chdir(root_dir)
if base_dir is None:
base_dir = os.curdir
So in our case base_dir
is dir1
. And we can keep asking questions.
I was having issues with path split on some paths with ‘.’ periods in them and i found having an optional format which defaults to ‘zip’ is handy and still allows you to override for other formats and is less error prone.
import os
import shutil
from shutil import make_archive
def make_archive(source, destination, format='zip'):
import os
import shutil
from shutil import make_archive
base, name = os.path.split(destination)
archive_from = os.path.dirname(source)
archive_to = os.path.basename(source.strip(os.sep))
print(f'Source: {source}nDestination: {destination}nArchive From: {archive_from}nArchive To: {archive_to}n')
shutil.make_archive(name, format, archive_from, archive_to)
shutil.move('%s.%s' % (name, format), destination)
make_archive('/path/to/folder', '/path/to/folder.zip')
Special thanks to seanbehan’s original answer or i would have been lost in the sauce much longer.
This solution builds off the responses from irudyak and seanbehan and uses Pathlib
. You need to pass source
and destination
as Path objects.
from pathlib import Path
import shutil
def make_archive(source, destination):
base_name = destination.parent / destination.stem
format = (destination.suffix).replace(".", "")
root_dir = source.parent
base_dir = source.name
shutil.make_archive(base_name, format, root_dir, base_dir)
This is a variation on @nick’s answer that uses pathlib
, type hinting and avoids shadowing builtins:
from pathlib import Path
import shutil
def make_archive(source: Path, destination: Path) -> None:
base_name = destination.parent / destination.stem
fmt = destination.suffix.replace(".", "")
root_dir = source.parent
base_dir = source.name
shutil.make_archive(str(base_name), fmt, root_dir, base_dir)
Usage:
make_archive(Path("/path/to/dir/"), Path("/path/to/output.zip"))
You can use Pathlib
and shutil
:
from pathlib import Path
import shutil
shutil.make_archive(
*dest_path.split('.'),
root_dir=Path(src_path).parent,
base_dir=Path(src_path).name)
)
src_path
is the path of the source directory.
dest_path
is the path of the destination archive to be created.
I’m trying to zip a directory called test_dicoms
to a zip file named test_dicoms.zip
using the following code:
shutil.make_archive('/home/code/test_dicoms', 'zip', '/home/code/test_dicoms')
The problem is that when I unzip it, all of the files that were in /test_dicoms/
are extracted to /home/code/
instead of the folder /test_dicoms/
and all of it’s contained files being extracted to /home/code/
. So /test_dicoms/
has a file called foo.txt
and after I zip and unzip foo.txt
‘s path is /home/code/foo.txt
as opposed to /home/code/test_dicoms/foo.txt
. How do I fix this? Also, some of the directories I’m working with are very large. Will I need to add anything to my code to make it ZIP64 or is the function smart enough to do that automatically?
Here’s what’s currently in the archive created:
[gwarner@jazz gwarner]$ unzip -l test_dicoms.zip
Archive: test_dicoms.zip
Length Date Time Name
--------- ---------- ----- ----
93324 09-17-2015 16:05 AAscout_b_000070
93332 09-17-2015 16:05 AAscout_b_000125
93332 09-17-2015 16:05 AAscout_b_000248
Using the terms in the documentation, you have specified a root_dir, but not a base_dir. Try specifying the base_dir like so:
shutil.make_archive('/home/code/test_dicoms',
'zip',
'/home/code/',
'test_dicoms')
To answer your second question, it depends upon the version of Python you are using. Starting from Python 3.4, ZIP64 extensions will be availble by default. Prior to Python 3.4, make_archive
will not automatically create a file with ZIP64 extensions. If you are using an older version of Python and want ZIP64, you can invoke the underlying zipfile.ZipFile()
directly.
If you choose to use zipfile.ZipFile()
directly, bypassing shutil.make_archive()
, here is an example:
import zipfile
import os
d = '/home/code/test_dicoms'
os.chdir(os.path.dirname(d))
with zipfile.ZipFile(d + '.zip',
"w",
zipfile.ZIP_DEFLATED,
allowZip64=True) as zf:
for root, _, filenames in os.walk(os.path.basename(d)):
for name in filenames:
name = os.path.join(root, name)
name = os.path.normpath(name)
zf.write(name, name)
Reference:
I have written a wrapper function myself because shutil.make_archive
is too confusing to use.
And just the code..
import os, shutil
def make_archive(source, destination):
base = os.path.basename(destination)
name = base.split('.')[0]
format = base.split('.')[1]
archive_from = os.path.dirname(source)
archive_to = os.path.basename(source.strip(os.sep))
shutil.make_archive(name, format, archive_from, archive_to)
shutil.move('%s.%s'%(name,format), destination)
make_archive('/path/to/folder', '/path/to/folder.zip')
I think, I am able to improve seanbehan’s answer by removing the file moving:
def make_archive(source, destination):
base_name = '.'.join(destination.split('.')[:-1])
format = destination.split('.')[-1]
root_dir = os.path.dirname(source)
base_dir = os.path.basename(source.strip(os.sep))
shutil.make_archive(base_name, format, root_dir, base_dir)
There are basically 2 approaches to using shutil
: you may try to understand the logic behind it or you may just use an example. I couldn’t find an example here so I tried to create my own.
;TLDR. Run shutil.make_archive('dir1_arc', 'zip', root_dir='dir1')
or shutil.make_archive('dir1_arc', 'zip', base_dir='dir1')
or just shutil.make_archive('dir1_arc', 'zip', 'dir1')
from temp
.
Suppose you have ~/temp/dir1
:
temp $ tree dir1
dir1
├── dir11
│ ├── file11
│ ├── file12
│ └── file13
├── dir1_arc.zip
├── file1
├── file2
└── file3
How can you create an archive of dir1
? Set base_name='dir1_arc'
, format='zip'
. Well you have a lot of options:
cd
intodir1
and runshutil.make_archive(base_name=base_name, format=format)
; it will create an archivedir1_arc.zip
insidedir1
; the only problem you’ll get a strange behavior: inside your archive you’ll find filedir1_arc.zip
;- from
temp
runshutil.make_archive(base_name=base_name, format=format, base_dir='dir1')
; you’ll getdir1_arc.zip
insidetemp
that you can unzip intodir1
;root_dir
defaults totemp
; - from
~
runshutil.make_archive(base_name=base_name, format=format, root_dir='temp', base_dir='dir1')
; you’ll again get your file but this time inside~
directory; - create another directory
temp2
in~
and run inside it:shutil.make_archive(base_name=base_name, format=format, root_dir='../temp', base_dir='dir1')
; you’ll get your archive in thistemp2
folder;
Can you run shutil
without specifying arguments? You can. Run from temp
shutil.make_archive('dir1_arc', 'zip', 'dir1')
. This is the same as run shutil.make_archive('dir1_arc', 'zip', root_dir='dir1')
. What can we say about base_dir
in this case? From documentation not so much. From source code we may see that:
if root_dir is not None:
os.chdir(root_dir)
if base_dir is None:
base_dir = os.curdir
So in our case base_dir
is dir1
. And we can keep asking questions.
I was having issues with path split on some paths with ‘.’ periods in them and i found having an optional format which defaults to ‘zip’ is handy and still allows you to override for other formats and is less error prone.
import os
import shutil
from shutil import make_archive
def make_archive(source, destination, format='zip'):
import os
import shutil
from shutil import make_archive
base, name = os.path.split(destination)
archive_from = os.path.dirname(source)
archive_to = os.path.basename(source.strip(os.sep))
print(f'Source: {source}nDestination: {destination}nArchive From: {archive_from}nArchive To: {archive_to}n')
shutil.make_archive(name, format, archive_from, archive_to)
shutil.move('%s.%s' % (name, format), destination)
make_archive('/path/to/folder', '/path/to/folder.zip')
Special thanks to seanbehan’s original answer or i would have been lost in the sauce much longer.
This solution builds off the responses from irudyak and seanbehan and uses Pathlib
. You need to pass source
and destination
as Path objects.
from pathlib import Path
import shutil
def make_archive(source, destination):
base_name = destination.parent / destination.stem
format = (destination.suffix).replace(".", "")
root_dir = source.parent
base_dir = source.name
shutil.make_archive(base_name, format, root_dir, base_dir)
This is a variation on @nick’s answer that uses pathlib
, type hinting and avoids shadowing builtins:
from pathlib import Path
import shutil
def make_archive(source: Path, destination: Path) -> None:
base_name = destination.parent / destination.stem
fmt = destination.suffix.replace(".", "")
root_dir = source.parent
base_dir = source.name
shutil.make_archive(str(base_name), fmt, root_dir, base_dir)
Usage:
make_archive(Path("/path/to/dir/"), Path("/path/to/output.zip"))
You can use Pathlib
and shutil
:
from pathlib import Path
import shutil
shutil.make_archive(
*dest_path.split('.'),
root_dir=Path(src_path).parent,
base_dir=Path(src_path).name)
)
src_path
is the path of the source directory.dest_path
is the path of the destination archive to be created.