How can files be added to a tarfile with Python, without adding the directory hierarchy?

Question:

When I call add() on a tarfile object with a file path, the file is added to the tarball along with its directory hierarchy. In other words, when I extract the tarball, the original directory structure is reproduced.

Is there a way to add just the plain file, without directory information, so that untarring the resulting tarball produces a flat list of files?

Asked By: theactiveactor


Answers:

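You can use TarFile.addfile(). Its first parameter is a TarInfo object, in which you can specify a name that’s different from the path of the file you’re adding. Note that a TarInfo built from a name alone has a size of 0, so no file data would be stored; TarFile.gettarinfo() fills in the size and other metadata from the file on disk.

This piece of code adds /path/to/filename.txt to the TAR file, but it will be extracted as myfilename.txt:

info = tar.gettarinfo("/path/to/filename.txt", arcname="myfilename.txt")
tar.addfile(info, open("/path/to/filename.txt", "rb"))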
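
A minimal, self-contained sketch of the addfile() route, assuming the metadata is taken from the file on disk with TarFile.gettarinfo() so that the stored size matches the content. The helper name add_renamed and the temporary paths are hypothetical, chosen only for illustration:

```python
import os
import tarfile
import tempfile


def add_renamed(tar, src_path, name_in_archive):
    """Add src_path to an open TarFile under name_in_archive (hypothetical helper)."""
    # gettarinfo() reads size, mode and mtime from the file on disk;
    # arcname replaces the path that would otherwise be stored.
    info = tar.gettarinfo(src_path, arcname=name_in_archive)
    # addfile() copies info.size bytes from a binary file object.
    with open(src_path, "rb") as fileobj:
        tar.addfile(info, fileobj)


with tempfile.TemporaryDirectory() as tmp:
    # Build a nested source file: <tmp>/path/to/filename.txt
    nested = os.path.join(tmp, "path", "to")
    os.makedirs(nested)
    src = os.path.join(nested, "filename.txt")
    with open(src, "w") as f:
        f.write("hello")

    archive_path = os.path.join(tmp, "out.tar")
    with tarfile.open(archive_path, "w") as tar:
        add_renamed(tar, src, "myfilename.txt")

    # Read the archive back to inspect the stored name and size.
    with tarfile.open(archive_path) as tar:
        names = tar.getnames()
        size = tar.getmember("myfilename.txt").size

print(names, size)
```

Opening the source in binary mode and closing it with a with block avoids leaking the file handle, which the one-line form does not do.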
Answered By: Wim

You can use the “arcname” argument of TarFile.add(name, arcname=None). It sets an alternate name that the file will have inside the archive.
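
To get the flat list of files asked about in the question, one option is to pass the file's base name as arcname. The following is a sketch under that assumption; add_flat is a hypothetical helper, and the duplicate check is an added precaution because two files with the same base name would otherwise end up as two entries with the same name in the archive:

```python
import os
import tarfile
import tempfile


def add_flat(tar, paths):
    """Add each file under its base name only (hypothetical helper)."""
    seen = set()
    for path in paths:
        name = os.path.basename(path)
        if name in seen:
            raise ValueError(f"duplicate name in flat archive: {name}")
        seen.add(name)
        # arcname drops the directory part of the stored path.
        tar.add(path, arcname=name)


with tempfile.TemporaryDirectory() as tmp:
    # Create two files in different subdirectories.
    paths = []
    for sub, fname in (("a", "one.txt"), ("b/c", "two.txt")):
        folder = os.path.join(tmp, sub)
        os.makedirs(folder)
        path = os.path.join(folder, fname)
        with open(path, "w") as f:
            f.write(fname)
        paths.append(path)

    archive_path = os.path.join(tmp, "flat.tar")
    with tarfile.open(archive_path, "w") as tar:
        add_flat(tar, paths)

    # List what was stored; no directory components should remain.
    with tarfile.open(archive_path) as tar:
        flat_names = tar.getnames()

print(flat_names)
```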

Answered By: Lauro Moura

Using the arcname argument of the TarFile.add() method is a convenient alternative for controlling the path stored in the archive.

Example: you want to archive the directory repo/a.git/ into a tar.gz file, but you want the root of the tree in the archive to be a.git/ rather than repo/a.git/. You can do it as follows:

archive = tarfile.open("a.git.tar.gz", "w|gz")
archive.add("repo/a.git", arcname="a.git")
archive.close()
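
One way to confirm the effect of arcname on a directory is to list the stored member names afterwards. This is a small sketch that builds a stand-in repo/a.git directory in a temporary location (the HEAD file and its content are placeholders, not a real repository) and uses the "w:gz" mode with a with block instead of an explicit close():

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for repo/a.git with a single placeholder file.
    src_dir = os.path.join(tmp, "repo", "a.git")
    os.makedirs(src_dir)
    with open(os.path.join(src_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/master\n")

    archive_path = os.path.join(tmp, "a.git.tar.gz")
    with tarfile.open(archive_path, "w:gz") as archive:
        # The directory is added recursively; arcname replaces the
        # leading part of every stored path.
        archive.add(src_dir, arcname="a.git")

    with tarfile.open(archive_path, "r:gz") as archive:
        stored_names = archive.getnames()

# Every stored name should be rooted at a.git, not at repo/a.git.
print(stored_names)
```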
Answered By: diabloneo

Thanks to @diabloneo. Here is a function that creates a tarball from selected items of a directory:

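import os
import tarfile


def compress(output_file="archive.tar.gz", output_dir="", root_dir=".", items=None):
    """Compress selected dirs/items found under root_dir.

    KWArgs
    ------
    output_file : str, default = "archive.tar.gz"
    output_dir : str, default = ''
        absolute path to output
    root_dir : str, default = '.'
        absolute path to input root dir
    items : list
        list of dirs/items relative to root dir

    """
    # Join with root_dir instead of calling os.chdir(), so the caller's
    # working directory is left unchanged.
    with tarfile.open(os.path.join(output_dir, output_file), "w:gz") as tar:
        for item in items or []:
            # arcname keeps the stored path relative to root_dir
            tar.add(os.path.join(root_dir, item), arcname=item)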


>>> root_dir = "/abs/pth/to/dir/"
>>> compress(output_file="archive.tar.gz", output_dir=root_dir,
...          root_dir=root_dir, items=["logs", "output"])
Answered By: muon

If you want to add a directory name but not its contents to a tarfile, you can do the following:

(1) Create an empty directory called empty.
(2) Call tf.add("empty", arcname=path_you_want_to_add).

That creates an empty directory named path_you_want_to_add in the archive.
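
As a possible alternative that avoids creating a placeholder directory on disk, a directory entry can be described directly with a TarInfo whose type is tarfile.DIRTYPE. This is a sketch, not the approach from the answer above; the in-memory buffer and the 0o755 mode are assumptions made for the example:

```python
import io
import tarfile

# Write the archive into memory so the example touches no files.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo("path_you_want_to_add")
    info.type = tarfile.DIRTYPE  # mark the member as a directory
    info.mode = 0o755            # assumed permissions for the entry
    # A directory entry carries no data, so no file object is passed.
    tar.addfile(info)

# Read the archive back and check the member type.
buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    member = tar.getmember("path_you_want_to_add")
    is_dir = member.isdir()

print(member.name, is_dir)
```

If the directory already exists on disk and only its contents should be skipped, TarFile.add() also accepts recursive=False, which adds the directory entry without descending into it.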

Answered By: Steven R Brandt