How to add package data recursively in Python setup.py?
Question:
I have a new library that has to include a lot of subfolders of small datafiles, and I’m trying to add them as package data. Imagine I have my library as so:
library
- foo.py
- bar.py
data
subfolderA
subfolderA1
subfolderA2
subfolderB
subfolderB1
...
I want to add all of the data in all of the subfolders through setup.py
, but it seems like I manually have to go into every single subfolder (there are 100 or so) and add an __init__.py
file. Furthermore, will setup.py
find these files recursively, or do I need to manually add all of these in setup.py
like:
package_data={
'mypackage.data.folderA': ['*'],
'mypackage.data.folderA.subfolderA1': ['*'],
'mypackage.data.folderA.subfolderA2': ['*']
},
I can do this with a script, but seems like a super pain. How can I achieve this in setup.py
?
PS, the hierarchy of these folders is important because this is a database of material files and we want the file tree to be preserved when we present them in a GUI to the user, so it would be to our advantage to keep this file structure intact.
Answers:
- Use Setuptools instead of distutils.
- Use data files instead of package data. These do not require
__init__.py
.
-
Generate the lists of files and directories using standard Python code, instead of writing it literally:
data_files = []
directories = glob.glob('data/subfolder?/subfolder??/')
for directory in directories:
files = glob.glob(directory+'*')
data_files.append((directory, files))
# then pass data_files to setup()
If you don’t have any problem with getting your setup.py code dirty use distutils.dir_util.copy_tree
.
The whole problem is how to exclude files from it.
Heres some the code:
import os.path
from distutils import dir_util
from distutils import sysconfig
from distutils.core import setup
__packagename__ = 'x'
setup(
name = __packagename__,
packages = [__packagename__],
)
destination_path = sysconfig.get_python_lib()
package_path = os.path.join(destination_path, __packagename__)
dir_util.copy_tree(__packagename__, package_path, update=1, preserve_mode=0)
Some Notes:
This code recursively copy the source code into the destination path.
You can just use the same setup(...)
but use copy_tree()
to extend the directory you want into the path of installation.
The default paths of distutil installation can be found in it’s API.
More information about copy_tree() module of distutils can be found here.
The problem with the glob
answer is that it only does so much. I.e. it’s not fully recursive. The problem with the copy_tree
answer is that the files that are copied will be left behind on an uninstall.
The proper solution is a recursive one which will let you set the package_data
parameter in the setup call.
I’ve written this small method to do this:
import os
def package_files(directory):
paths = []
for (path, directories, filenames) in os.walk(directory):
for filename in filenames:
paths.append(os.path.join('..', path, filename))
return paths
extra_files = package_files('path_to/extra_files_dir')
setup(
...
packages = ['package_name'],
package_data={'': extra_files},
....
)
You’ll notice that when you do a pip uninstall package_name
, that you’ll see your additional files being listed (as tracked with the package).
I can suggest a little code to add data_files in setup():
data_files = []
start_point = os.path.join(__pkgname__, 'static')
for root, dirs, files in os.walk(start_point):
root_files = [os.path.join(root, i) for i in files]
data_files.append((root, root_files))
start_point = os.path.join(__pkgname__, 'templates')
for root, dirs, files in os.walk(start_point):
root_files = [os.path.join(root, i) for i in files]
data_files.append((root, root_files))
setup(
name = __pkgname__,
description = __description__,
version = __version__,
long_description = README,
...
data_files = data_files,
)
Use glob to select all subfolders in your setup.py
:
...
packages=['your_package'],
package_data={'your_package': ['data/**/*']},
...
To add all the subfolders using package_data
in setup.py
:
add the number of *
entries based on you subdirectory structure
package_data={
'mypackage.data.folderA': ['*','*/*','*/*/*'],
}
I can do this with a script, but seems like a super pain. How can I achieve this in setup.py?
Here is a reusable, simple way:
Add the following function in your setup.py
, and call it as per the Usage instructions. This is essentially the generic version of the accepted answer.
def find_package_data(specs):
"""recursively find package data as per the folders given
Usage:
# in setup.py
setup(...
include_package_data=True,
package_data=find_package_data({
'package': ('resources', 'static')
}))
Args:
specs (dict): package => list of folder names to include files from
Returns:
dict of list of file names
"""
return {
package: list(''.join(n.split('/', 1)[1:]) for n in
flatten(glob('{}/{}/**/*'.format(package, f), recursive=True) for f in folders))
for package, folders in specs.items()}
Update
According to the change log setuptools
now supports recursive globs, using **
, in package_data
(as of v62.3.0
, released May 2022).
Original answer
@gbonetti’s answer, using a recursive glob pattern, i.e. **
, would be perfect.
However, as commented by @daniel-himmelstein, that does not work yet in setuptools package_data
.
So, for the time being, I like to use the following workaround, based on pathlib
‘s Path.glob():
def glob_fix(package_name, glob):
# this assumes setup.py lives in the folder that contains the package
package_path = Path(f'./{package_name}').resolve()
return [str(path.relative_to(package_path))
for path in package_path.glob(glob)]
This returns a list of path strings relative to the package path, as required.
Here’s one way to use this:
setuptools.setup(
...
package_data={'my_package': [*glob_fix('my_package', 'my_data_dir/**/*'),
'my_other_dir/some.file', ...], ...},
...
)
The glob_fix()
can be removed as soon as setuptools supports **
in package_data
.
I’m going to throw my solution in here in case anyone is looking for a clean way to include their compiled sphinx docs as data_files
.
setup.py
from setuptools import setup
import pathlib
import os
here = pathlib.Path(__file__).parent.resolve()
# Get documentation files from the docs/build/html directory
documentation = [doc.relative_to(here) for doc in here.glob("docs/build/html/**/*") if pathlib.Path.is_file(doc)]
data_docs = {}
for doc in documentation:
doc_path = os.path.join("your_top_data_dir", "docs")
path_parts = doc.parts[3:-1] # remove "docs/build/html", ignore filename
if path_parts:
doc_path = os.path.join(doc_path, *path_parts)
# create all appropriate subfolders and append relative doc path
data_docs.setdefault(doc_path, []).append(str(doc))
setup(
...
include_package_data=True,
# <sys.prefix>/your_top_data_dir
data_files=[("your_top_data_dir", ["data/test-credentials.json"]), *list(data_docs.items())]
)
With the above solution, once you install your package you’ll have all the compiled documentation available at os.path.join(sys.prefix, "your_top_data_dir", "docs")
. So, if you wanted to serve the now-static docs using nginx you could add the following to your nginx file:
location /docs {
# handle static files directly, without forwarding to the application
alias /www/your_app_name/venv/your_top_data_dir/docs;
expires 30d;
}
Once you’ve done that, you should be able to visit {your-domain.com}/docs
and see your Sphinx documentation.
If you don’t want to add custom code to iterate through the directory contents, you can use pbr
library, which extends setuptools
. See here for documentation on how to use it to copy an entire directory, preserving the directory structure:
You need to write a function to return all files and its paths , you can use the following
def sherinfind():
# Add all folders contain files or other sub directories
pathlist=['templates/','scripts/']
data={}
for path in pathlist:
for root,d_names,f_names in os.walk(path,topdown=True, onerror=None, followlinks=False):
data[root]=list()
for f in f_names:
data[root].append(os.path.join(root, f))
fn=[(k,v) for k,v in data.items()]
return fn
Now change the data_files in setup() as follows,
data_files=sherinfind()
I have a new library that has to include a lot of subfolders of small datafiles, and I’m trying to add them as package data. Imagine I have my library as so:
library
- foo.py
- bar.py
data
subfolderA
subfolderA1
subfolderA2
subfolderB
subfolderB1
...
I want to add all of the data in all of the subfolders through setup.py
, but it seems like I manually have to go into every single subfolder (there are 100 or so) and add an __init__.py
file. Furthermore, will setup.py
find these files recursively, or do I need to manually add all of these in setup.py
like:
package_data={
'mypackage.data.folderA': ['*'],
'mypackage.data.folderA.subfolderA1': ['*'],
'mypackage.data.folderA.subfolderA2': ['*']
},
I can do this with a script, but seems like a super pain. How can I achieve this in setup.py
?
PS, the hierarchy of these folders is important because this is a database of material files and we want the file tree to be preserved when we present them in a GUI to the user, so it would be to our advantage to keep this file structure intact.
- Use Setuptools instead of distutils.
- Use data files instead of package data. These do not require
__init__.py
. -
Generate the lists of files and directories using standard Python code, instead of writing it literally:
data_files = [] directories = glob.glob('data/subfolder?/subfolder??/') for directory in directories: files = glob.glob(directory+'*') data_files.append((directory, files)) # then pass data_files to setup()
If you don’t have any problem with getting your setup.py code dirty use distutils.dir_util.copy_tree
.
The whole problem is how to exclude files from it.
Heres some the code:
import os.path
from distutils import dir_util
from distutils import sysconfig
from distutils.core import setup
__packagename__ = 'x'
setup(
name = __packagename__,
packages = [__packagename__],
)
destination_path = sysconfig.get_python_lib()
package_path = os.path.join(destination_path, __packagename__)
dir_util.copy_tree(__packagename__, package_path, update=1, preserve_mode=0)
Some Notes:
setup(...)
but use copy_tree()
to extend the directory you want into the path of installation.The problem with the glob
answer is that it only does so much. I.e. it’s not fully recursive. The problem with the copy_tree
answer is that the files that are copied will be left behind on an uninstall.
The proper solution is a recursive one which will let you set the package_data
parameter in the setup call.
I’ve written this small method to do this:
import os
def package_files(directory):
paths = []
for (path, directories, filenames) in os.walk(directory):
for filename in filenames:
paths.append(os.path.join('..', path, filename))
return paths
extra_files = package_files('path_to/extra_files_dir')
setup(
...
packages = ['package_name'],
package_data={'': extra_files},
....
)
You’ll notice that when you do a pip uninstall package_name
, that you’ll see your additional files being listed (as tracked with the package).
I can suggest a little code to add data_files in setup():
data_files = []
start_point = os.path.join(__pkgname__, 'static')
for root, dirs, files in os.walk(start_point):
root_files = [os.path.join(root, i) for i in files]
data_files.append((root, root_files))
start_point = os.path.join(__pkgname__, 'templates')
for root, dirs, files in os.walk(start_point):
root_files = [os.path.join(root, i) for i in files]
data_files.append((root, root_files))
setup(
name = __pkgname__,
description = __description__,
version = __version__,
long_description = README,
...
data_files = data_files,
)
Use glob to select all subfolders in your setup.py
:
...
packages=['your_package'],
package_data={'your_package': ['data/**/*']},
...
To add all the subfolders using package_data
in setup.py
:
add the number of *
entries based on you subdirectory structure
package_data={
'mypackage.data.folderA': ['*','*/*','*/*/*'],
}
I can do this with a script, but seems like a super pain. How can I achieve this in setup.py?
Here is a reusable, simple way:
Add the following function in your setup.py
, and call it as per the Usage instructions. This is essentially the generic version of the accepted answer.
def find_package_data(specs):
"""recursively find package data as per the folders given
Usage:
# in setup.py
setup(...
include_package_data=True,
package_data=find_package_data({
'package': ('resources', 'static')
}))
Args:
specs (dict): package => list of folder names to include files from
Returns:
dict of list of file names
"""
return {
package: list(''.join(n.split('/', 1)[1:]) for n in
flatten(glob('{}/{}/**/*'.format(package, f), recursive=True) for f in folders))
for package, folders in specs.items()}
Update
According to the change log setuptools
now supports recursive globs, using **
, in package_data
(as of v62.3.0
, released May 2022).
Original answer
@gbonetti’s answer, using a recursive glob pattern, i.e. **
, would be perfect.
However, as commented by @daniel-himmelstein, that does not work yet in setuptools package_data
.
So, for the time being, I like to use the following workaround, based on pathlib
‘s Path.glob():
def glob_fix(package_name, glob):
# this assumes setup.py lives in the folder that contains the package
package_path = Path(f'./{package_name}').resolve()
return [str(path.relative_to(package_path))
for path in package_path.glob(glob)]
This returns a list of path strings relative to the package path, as required.
Here’s one way to use this:
setuptools.setup(
...
package_data={'my_package': [*glob_fix('my_package', 'my_data_dir/**/*'),
'my_other_dir/some.file', ...], ...},
...
)
The glob_fix()
can be removed as soon as setuptools supports **
in package_data
.
I’m going to throw my solution in here in case anyone is looking for a clean way to include their compiled sphinx docs as data_files
.
setup.py
from setuptools import setup
import pathlib
import os
here = pathlib.Path(__file__).parent.resolve()
# Get documentation files from the docs/build/html directory
documentation = [doc.relative_to(here) for doc in here.glob("docs/build/html/**/*") if pathlib.Path.is_file(doc)]
data_docs = {}
for doc in documentation:
doc_path = os.path.join("your_top_data_dir", "docs")
path_parts = doc.parts[3:-1] # remove "docs/build/html", ignore filename
if path_parts:
doc_path = os.path.join(doc_path, *path_parts)
# create all appropriate subfolders and append relative doc path
data_docs.setdefault(doc_path, []).append(str(doc))
setup(
...
include_package_data=True,
# <sys.prefix>/your_top_data_dir
data_files=[("your_top_data_dir", ["data/test-credentials.json"]), *list(data_docs.items())]
)
With the above solution, once you install your package you’ll have all the compiled documentation available at os.path.join(sys.prefix, "your_top_data_dir", "docs")
. So, if you wanted to serve the now-static docs using nginx you could add the following to your nginx file:
location /docs {
# handle static files directly, without forwarding to the application
alias /www/your_app_name/venv/your_top_data_dir/docs;
expires 30d;
}
Once you’ve done that, you should be able to visit {your-domain.com}/docs
and see your Sphinx documentation.
If you don’t want to add custom code to iterate through the directory contents, you can use pbr
library, which extends setuptools
. See here for documentation on how to use it to copy an entire directory, preserving the directory structure:
You need to write a function to return all files and its paths , you can use the following
def sherinfind():
# Add all folders contain files or other sub directories
pathlist=['templates/','scripts/']
data={}
for path in pathlist:
for root,d_names,f_names in os.walk(path,topdown=True, onerror=None, followlinks=False):
data[root]=list()
for f in f_names:
data[root].append(os.path.join(root, f))
fn=[(k,v) for k,v in data.items()]
return fn
Now change the data_files in setup() as follows,
data_files=sherinfind()