How can files be added to a tarfile with Python, without adding the directory hierarchy?
Question:
When I invoke add() on a tarfile object with a file path, the file is added to the tarball together with its directory hierarchy. In other words, if I extract the tarfile, the directories of the original hierarchy are reproduced.
Is there a way to add a plain file without the directory information, so that untarring the resulting tarball produces a flat list of files?
Answers:
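You can use tarfile.addfile(). In the TarInfo object, which is the first parameter, you can specify a name that’s different from the file you’re adding.
This piece of code adds /path/to/filename.txt to the TAR file, and it will be extracted as myfilename.txt. The TarInfo must carry the file's size, because addfile() copies exactly that many bytes and a bare TarInfo has size 0. The file must also be opened in binary mode. Here tar.gettarinfo() fills in the size and the other metadata:
with open("/path/to/filename.txt", "rb") as f:
    tar.addfile(tar.gettarinfo(arcname="myfilename.txt", fileobj=f), f)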
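For illustration, here is a minimal self-contained sketch of the same idea with the TarInfo filled in by hand. It sets size explicitly, since addfile() reads exactly tarinfo.size bytes from the file object. The helper name add_as and the temporary paths are assumptions made up for this example.

```python
import os
import tarfile
import tempfile


def add_as(tar, src_path, member_name):
    """Add src_path to an open TarFile under member_name (hypothetical helper)."""
    info = tarfile.TarInfo(member_name)
    # addfile() copies exactly info.size bytes; TarInfo defaults to size 0.
    info.size = os.path.getsize(src_path)
    with open(src_path, "rb") as f:  # binary mode: tar stores raw bytes
        tar.addfile(info, f)


# Demo in a throwaway directory with a nested source file.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "deep", "nested", "filename.txt")
os.makedirs(os.path.dirname(src))
with open(src, "w") as f:
    f.write("hello")

archive_path = os.path.join(workdir, "flat.tar")
with tarfile.open(archive_path, "w") as tar:
    add_as(tar, src, "myfilename.txt")

# Read the archive back to check the stored name and contents.
with tarfile.open(archive_path) as tar:
    names = tar.getnames()
    payload = tar.extractfile("myfilename.txt").read()
```

Fields that are not set here (modification time, owner, permissions) keep the TarInfo defaults rather than the values of the source file.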
You can use the “arcname” argument of TarFile.add(name, arcname). It takes an alternate name that the file will have inside the archive.
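A flat archive, as asked for in the question, can be sketched by passing each file's base name as arcname. The helper name add_flat and the sample file layout are hypothetical, and the sketch assumes the inputs are regular files.

```python
import os
import tarfile
import tempfile


def add_flat(tar, paths):
    """Add each file under its base name only (hypothetical helper)."""
    seen = set()
    for path in paths:
        name = os.path.basename(path)
        if name in seen:
            # Two inputs with the same base name would collide in a flat archive.
            raise ValueError("duplicate member name: " + name)
        seen.add(name)
        tar.add(path, arcname=name)


# Build two sample files in different subdirectories.
workdir = tempfile.mkdtemp()
paths = []
for parts in [("a", "b", "one.txt"), ("c", "two.txt")]:
    p = os.path.join(workdir, *parts)
    os.makedirs(os.path.dirname(p))
    with open(p, "w") as f:
        f.write(parts[-1])
    paths.append(p)

archive_path = os.path.join(workdir, "flat.tar.gz")
with tarfile.open(archive_path, "w:gz") as tar:
    add_flat(tar, paths)

# The archive lists bare file names, with no directory components.
with tarfile.open(archive_path, "r:gz") as tar:
    flat_names = tar.getnames()
```

The duplicate check is a design choice for this sketch: without it, two members with the same name would both be written and the later one would overwrite the earlier one on extraction.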
Using the arcname argument of the TarFile.add() method is a convenient alternative for controlling the paths stored in the archive.
Example: to archive the directory repo/a.git/ to a tar.gz file so that the tree root in the archive is a.git/ rather than repo/a.git/, you can do the following:
archive = tarfile.open("a.git.tar.gz", "w|gz")
archive.add("repo/a.git", arcname="a.git")
archive.close()
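To check what actually ends up in the archive, the member names can be listed after writing. The following is a sketch under assumed conditions: it builds a tiny stand-in for repo/a.git in a temporary directory (the HEAD file is only sample content) and uses "w:gz" with a context manager instead of an explicit close().

```python
import os
import tarfile
import tempfile

# Create a small stand-in directory tree: <workdir>/repo/a.git/HEAD
workdir = tempfile.mkdtemp()
repo_dir = os.path.join(workdir, "repo", "a.git")
os.makedirs(repo_dir)
with open(os.path.join(repo_dir, "HEAD"), "w") as f:
    f.write("ref: refs/heads/master\n")

archive_path = os.path.join(workdir, "a.git.tar.gz")
with tarfile.open(archive_path, "w:gz") as archive:
    # arcname replaces the leading part of the path for the whole subtree.
    archive.add(repo_dir, arcname="a.git")

with tarfile.open(archive_path, "r:gz") as archive:
    members = sorted(archive.getnames())
```

Every member name starts with a.git, so the repo/ prefix and the temporary directory path are not stored in the archive.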
Thanks to @diabloneo, here is a function that creates a selective tarball of a directory:
import os
import tarfile

def compress(output_file="archive.tar.gz", output_dir="", root_dir=".", items=()):
    """Compress selected dirs/items found under root_dir.

    KWArgs
    ------
    output_file : str, default = "archive.tar.gz"
    output_dir : str, default = ''
        absolute path to output
    root_dir : str, default = '.'
        absolute path to input root dir
    items : list
        list of dirs/items relative to root dir
    """
    with tarfile.open(os.path.join(output_dir, output_file), "w:gz") as tar:
        for item in items:
            # Read from root_dir without os.chdir(), which would change the
            # working directory of the whole process; store the relative path.
            tar.add(os.path.join(root_dir, item), arcname=item)

>>> root_dir = "/abs/pth/to/dir/"
>>> compress(output_file="archive.tar.gz", output_dir=root_dir,
...          root_dir=root_dir, items=["logs", "output"])
If you want to add the directory name but not its contents inside a tarfile, you can do the following:
(1) create an empty directory called empty
(2) tf.add("empty", arcname=path_you_want_to_add)
That creates an empty directory with the name path_you_want_to_add.
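A directory entry can also be written without creating anything on disk, by building a TarInfo of type DIRTYPE. This is a sketch of an alternative to the steps above, not the method from the answer. It writes to an in-memory buffer for the demo, and the mode 0o755 is an assumed choice.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    info = tarfile.TarInfo("path_you_want_to_add")
    info.type = tarfile.DIRTYPE  # mark the member as a directory
    info.mode = 0o755            # assumed permissions for the demo
    tf.addfile(info)             # no file object: a directory has no data

# Read the archive back and inspect the member.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tf:
    member = tf.getmember("path_you_want_to_add")
    is_dir = member.isdir()
```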