How to add package data recursively in Python setup.py?

Question:

I have a new library that has to include a lot of subfolders of small data files, and I’m trying to add them as package data. Imagine my library is laid out like this:

 library
    - foo.py
    - bar.py
 data
   subfolderA
      subfolderA1
      subfolderA2
   subfolderB
      subfolderB1 
      ...

I want to add all of the data in all of the subfolders through setup.py, but it seems like I manually have to go into every single subfolder (there are 100 or so) and add an __init__.py file. Furthermore, will setup.py find these files recursively, or do I need to manually add all of these in setup.py like:

package_data={
    'mypackage.data.folderA': ['*'],
    'mypackage.data.folderA.subfolderA1': ['*'],
    'mypackage.data.folderA.subfolderA2': ['*']
},

I can do this with a script, but it seems like a super pain. How can I achieve this in setup.py?

PS, the hierarchy of these folders is important because this is a database of material files and we want the file tree to be preserved when we present them in a GUI to the user, so it would be to our advantage to keep this file structure intact.

Answers:

  1. Use Setuptools instead of distutils.
  2. Use data_files instead of package_data. Data files do not require __init__.py.
  3. Generate the lists of files and directories using standard Python code, instead of writing it literally:

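    import glob
    import os

    data_files = []
    # matches exactly two levels, e.g. data/subfolderA/subfolderA1/
    directories = glob.glob('data/subfolder?/subfolder??/')
    for directory in directories:
        # data_files entries must be regular files, so skip nested directories
        files = [f for f in glob.glob(directory + '*') if os.path.isfile(f)]
        data_files.append((directory, files))
    # then pass data_files to setup()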
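The patterns above stop at exactly two levels of nesting. A minimal sketch of a fully recursive variant, assuming Python 3.5+ (for glob's recursive=True); build_data_files is a hypothetical helper name, directories that contain no files are skipped, and hidden (dot) files are not matched by glob:

```python
import glob
import os


def build_data_files(root):
    """Collect (directory, [files]) pairs for root and every directory below it.

    root is assumed to be a path relative to setup.py, e.g. 'data'.
    """
    data_files = []
    # '**' with recursive=True matches root itself plus every nested directory;
    # the trailing separator restricts matches to directories
    pattern = os.path.join(root, '**', '')
    for directory in sorted(glob.glob(pattern, recursive=True)):
        files = [f for f in glob.glob(os.path.join(directory, '*'))
                 if os.path.isfile(f)]
        if files:
            # use the directory without its trailing separator as the target
            data_files.append((directory.rstrip(os.sep), files))
    return data_files
```

Passing build_data_files('data') as data_files=... keeps the folder hierarchy, because each entry's target directory mirrors its source directory.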
    
Answered By: Kevin

If you don’t mind making your setup.py code a bit dirty, use distutils.dir_util.copy_tree.
The main difficulty is excluding files from the copy.
Here’s the code:

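# Note: distutils was deprecated by PEP 632 and removed in Python 3.12
import os.path
from distutils import dir_util
from distutils import sysconfig
from distutils.core import setup

__packagename__ = 'x'
setup(
    name = __packagename__,
    packages = [__packagename__],
)

# site-packages directory of the running interpreter
destination_path = sysconfig.get_python_lib()
package_path = os.path.join(destination_path, __packagename__)

# recursively copy the whole package directory (data included) into site-packages
dir_util.copy_tree(__packagename__, package_path, update=1, preserve_mode=0)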

Some notes:

  • This code recursively copies the source code into the destination path.
  • You can keep the same setup(...) call and use copy_tree() to copy whichever directory you want into the installation path.
  • The default distutils installation paths are documented in its API.
  • More information about copy_tree() can be found in the distutils.dir_util documentation.
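Because distutils was removed in Python 3.12, dir_util.copy_tree is not available there. A rough standard-library substitute, assuming Python 3.8+ for dirs_exist_ok, is shutil.copytree. It is not an exact equivalent (it copies every file rather than only newer ones as update=1 does); copy_package_tree is a hypothetical name, and the ignore patterns are only an example of how exclusions could be expressed:

```python
import shutil


def copy_package_tree(src, dst):
    """Recursively copy src into dst, merging into dst if it already exists.

    Rough stand-in for dir_util.copy_tree(src, dst). As with copy_tree, the
    copied files are not tracked by pip and stay behind on uninstall.
    """
    return shutil.copytree(
        src,
        dst,
        # allow copying over an existing installation (Python 3.8+)
        dirs_exist_ok=True,
        # example exclusions; adjust to the files you want to skip
        ignore=shutil.ignore_patterns('*.pyc', '__pycache__'),
    )
```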

Answered By: Heartagramir

    The problem with the glob answer is that it only reaches a fixed depth, i.e. it’s not fully recursive. The problem with the copy_tree answer is that the copied files will be left behind on uninstall.

    The proper solution is a recursive one which will let you set the package_data parameter in the setup call.

    I’ve written this small method to do this:

    import os
    from setuptools import setup
    
    def package_files(directory):
        paths = []
        for (path, directories, filenames) in os.walk(directory):
            for filename in filenames:
                # package_data paths are relative to the package directory;
                # '..' climbs back up to the folder that contains setup.py
                paths.append(os.path.join('..', path, filename))
        return paths
    
    extra_files = package_files('path_to/extra_files_dir')
    
    setup(
        ...
        packages = ['package_name'],
        package_data={'': extra_files},
        ...
    )
    

    When you run pip uninstall package_name, you’ll see your additional files listed (as tracked with the package).
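The '..' prefix works because package_data paths are resolved relative to the package directory, so it assumes the package sits exactly one level below setup.py. A sketch of a variant that computes the relative path explicitly with os.path.relpath; the two-argument signature is a hypothetical change to package_files, not part of the answer above:

```python
import os


def package_files(package_dir, data_dir):
    """Return every file under data_dir as a path relative to package_dir.

    Both arguments are paths as seen from setup.py, e.g.
    package_files('package_name', 'package_name/data').
    """
    paths = []
    for path, _directories, filenames in os.walk(data_dir):
        for filename in filenames:
            # package_data entries are resolved relative to the package directory
            paths.append(os.path.relpath(os.path.join(path, filename), package_dir))
    return paths
```

For example, package_files('package_name', 'package_name/data') yields entries such as 'data/subfolderA/file.txt', while a data directory outside the package produces the same '../...' form as the original function.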

    Answered By: Sandy Chapman

    I can suggest a little code for building data_files in setup():

    import os
    
    data_files = []
    
    # walk each folder and add one (directory, [files]) entry per directory
    for subdir in ('static', 'templates'):
        start_point = os.path.join(__pkgname__, subdir)
        for root, dirs, files in os.walk(start_point):
            root_files = [os.path.join(root, i) for i in files]
            data_files.append((root, root_files))
    
    setup(
        name = __pkgname__,
        description = __description__,
        version = __version__,
        long_description = README,
        ...
        data_files = data_files,
    )
    
    Answered By: Stan

    Use a recursive glob pattern in your setup.py to select all subfolders:

    ...
    packages=['your_package'],
    # '**' in package_data requires setuptools >= 62.3.0
    package_data={'your_package': ['data/**/*']},
    ...
    
    Answered By: gbonetti

    To add all the subfolders using package_data in setup.py, add one '*' pattern per level of your subdirectory structure:

    package_data={
      'mypackage.data.folderA': ['*','*/*','*/*/*'],
    }
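Instead of counting levels by hand, the patterns can be generated. A small sketch, with hypothetical helper names, that finds the deepest level containing files under a folder and emits one pattern per level, using '/' as in the patterns above:

```python
import os


def star_patterns(depth):
    """Return ['*', '*/*', ...] with one pattern per directory level."""
    return ['/'.join(['*'] * level) for level in range(1, depth + 1)]


def max_depth(root):
    """Deepest level holding files below root (1 = files directly in root)."""
    deepest = 0
    for path, _dirs, files in os.walk(root):
        if files:
            rel = os.path.relpath(path, root)
            # root itself is level 1; each nested directory adds one level
            level = 1 if rel == os.curdir else rel.count(os.sep) + 2
            deepest = max(deepest, level)
    return deepest
```

For example, package_data={'mypackage.data.folderA': star_patterns(max_depth('mypackage/data/folderA'))}, where the path is assumed to be relative to setup.py.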
    
    Answered By: mahesh

    I can do this with a script, but seems like a super pain. How can I achieve this in setup.py?

    Here is a reusable, simple way:

    Add the following function to your setup.py and call it as shown in the Usage section of its docstring. This is essentially the generic version of the accepted answer.

    from glob import glob
    from itertools import chain

    def find_package_data(specs):
        """recursively find package data as per the folders given
    
        Usage:
            # in setup.py
            setup(...
                  include_package_data=True,
                  package_data=find_package_data({
                     'package': ('resources', 'static')
                  }))
    
        Args:
            specs (dict): package => list of folder names to include files from
    
        Returns:
            dict of list of file names
        """
        return {
            # strip the leading "package/" so paths are relative to the package
            package: list(''.join(n.split('/', 1)[1:]) for n in
                          chain.from_iterable(glob('{}/{}/**/*'.format(package, f), recursive=True)
                                              for f in folders))
            for package, folders in specs.items()}
    
    
    Answered By: miraculixx

    Update

    According to the changelog, setuptools now supports recursive globs (**) in package_data as of v62.3.0, released May 2022.

    Original answer

    @gbonetti’s answer, using a recursive glob pattern, i.e. **, would be perfect.

    However, as commented by @daniel-himmelstein, that does not work yet in setuptools package_data.

    So, for the time being, I like to use the following workaround, based on pathlib’s Path.glob():

    from pathlib import Path

    def glob_fix(package_name, glob):
        # this assumes setup.py lives in the folder that contains the package
        package_path = Path(f'./{package_name}').resolve()
        return [str(path.relative_to(package_path))
                for path in package_path.glob(glob)]
    

    This returns a list of path strings relative to the package path, as required.

    Here’s one way to use this:

    setuptools.setup(
        ...
        package_data={'my_package': [*glob_fix('my_package', 'my_data_dir/**/*'), 
                                     'my_other_dir/some.file', ...], ...},
        ...
    )
    

    The glob_fix() can be removed as soon as setuptools supports ** in package_data.
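If setup.py has to work with setuptools versions both older and newer than 62.3.0, one option is to check the installed version and fall back to glob_fix() only when needed. A sketch assuming Python 3.8+ (for importlib.metadata); the helper name is hypothetical, and the version parsing only looks at the first two numeric components:

```python
from importlib.metadata import PackageNotFoundError, version


def setuptools_supports_recursive_glob():
    """True if the installed setuptools is at least 62.3.0."""
    try:
        parts = version("setuptools").split(".")
        major, minor = int(parts[0]), int(parts[1])
    except (PackageNotFoundError, IndexError, ValueError):
        # setuptools missing, or version string not in the expected form
        return False
    return (major, minor) >= (62, 3)
```

Usage could look like patterns = ['my_data_dir/**/*'] if setuptools_supports_recursive_glob() else glob_fix('my_package', 'my_data_dir/**/*').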

    Answered By: djvg

    I’m going to throw my solution in here in case anyone is looking for a clean way to include their compiled Sphinx docs as data_files.

    setup.py

    from setuptools import setup
    import pathlib
    import os
    
    here = pathlib.Path(__file__).parent.resolve()
    
    # Get documentation files from the docs/build/html directory
    documentation = [doc.relative_to(here) for doc in here.glob("docs/build/html/**/*") if doc.is_file()]
    data_docs = {}
    for doc in documentation:
        doc_path = os.path.join("your_top_data_dir", "docs")
        path_parts = doc.parts[3:-1]  # remove "docs/build/html", ignore filename
        if path_parts:
            doc_path = os.path.join(doc_path, *path_parts)
        # create all appropriate subfolders and append relative doc path
        data_docs.setdefault(doc_path, []).append(str(doc))
    
    setup(
        ...
        include_package_data=True,
        # <sys.prefix>/your_top_data_dir
        data_files=[("your_top_data_dir", ["data/test-credentials.json"]), *data_docs.items()]
    )
    

    With the above solution, once you install your package you’ll have all the compiled documentation available at os.path.join(sys.prefix, "your_top_data_dir", "docs"). So, if you wanted to serve the now-static docs using nginx you could add the following to your nginx file:

    location /docs {
        # handle static files directly, without forwarding to the application
        alias /www/your_app_name/venv/your_top_data_dir/docs;
        expires 30d;
    }
    

    Once you’ve done that, you should be able to visit {your-domain.com}/docs and see your Sphinx documentation.
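The key step in the setup.py above is how doc.parts[3:-1] turns a built file's path into its install directory. Isolated as a sketch (doc_target_dir is a hypothetical name; your_top_data_dir is the placeholder from the answer):

```python
import os
from pathlib import PurePosixPath


def doc_target_dir(doc, top_dir="your_top_data_dir"):
    """Map docs/build/html/<sub dirs>/<file> to <top_dir>/docs/<sub dirs>."""
    # parts[3:-1] drops "docs", "build", "html" and the filename itself
    return os.path.join(top_dir, "docs", *doc.parts[3:-1])


target = doc_target_dir(PurePosixPath("docs/build/html/_static/css/theme.css"))
```

For docs/build/html/_static/css/theme.css this gives your_top_data_dir/docs/_static/css on POSIX systems, so the directory layout of the HTML build is preserved.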

    Answered By: CaffeinatedMike

    If you don’t want to add custom code to iterate through the directory contents, you can use the pbr library, which extends setuptools. See the following documentation on how to use it to copy an entire directory while preserving its structure:

    https://docs.openstack.org/pbr/latest/user/using.html#files

    Answered By: okahilak

    You need to write a function that returns all files and their paths; you can use the following:

    import os

    def sherinfind():
        # Add all folders that contain files or other subdirectories
        pathlist = ['templates/', 'scripts/']
        data = {}
        for path in pathlist:
            for root, d_names, f_names in os.walk(path, topdown=True, onerror=None, followlinks=False):
                # one data_files entry per directory, listing the files directly inside it
                data[root] = [os.path.join(root, f) for f in f_names]
        # data_files expects a list of (target_directory, [files]) tuples
        return list(data.items())
    

    Now set data_files in setup() as follows:

    data_files=sherinfind()
    
    Answered By: sherin