Best directory structure for a repository with several python entry points and internal dependencies?

Question

I’m working on a project with the following directory structure:

project/
    package1/
        module1.py
        module2.py
    package2/
        module1.py
        module2.py
    main1.py
    main2.py
    main3.py
    ...
    mainN.py

where each mainX.py file is an executable Python script that imports modules from either package1, package2, or both. package1 and package2 are subpackages meant to be distributed along with the rest of the project (not independently).

The standard thing to do is to put your entry point in the top-level directory. I have N entry points, so I put them all in the top-level directory. The trouble is that N keeps growing, so my top-level directory is getting flooded with entry points.

I could move the mainX.py files to a sub-directory (say, project/run), but then all of the package1 and package2 imports would break. I could extract package1 and package2 to a separate repository and just expect it to be installed on the system (i.e., in the system / user python path), but that would complicate installation. I could modify the Python path as a precondition or during runtime, but that’s messy and could introduce unintended consequences. I could write a single main.py entry point script with argument subparsers respectively pointing to run/main1.py, ..., run/mainN.py, but that would introduce coupling between main.py and each of the run/mainX.py files.

What’s the standard, "Pythonic" solution to this issue?

Asked By: Alexander Guyer


Answer 1

One solution is to group the entry points in an additional package and run them as modules rather than directly as files.

project/
    package1/
        module1.py
        module2.py
    package2/
        module1.py
        module2.py
    run/
        main1.py
        main2.py
        main3.py
        ...
        mainN.py

python -m run.main3

This way your current directory (hopefully the project root) will still be the one prepended to sys.path instead of the directory containing the scripts.
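
The difference can be checked with a small experiment. The following is only a sketch: it builds a throwaway project/run/main1.py in a temporary directory and launches it both ways from the project root. The helper name first_path_entry and the script body are made up for illustration, and an empty __init__.py is added to run so that the demo cannot be shadowed by an unrelated installed module of the same name.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical script body: print the name of the first sys.path entry.
SCRIPT = (
    "import sys, pathlib\n"
    "print(pathlib.Path(sys.path[0] or '.').resolve().name)\n"
)


def first_path_entry(as_module: bool) -> str:
    """Run run/main1.py from a temporary project root and return the name
    of the directory that Python placed first on sys.path."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "project"
        (root / "run").mkdir(parents=True)
        # Assumption: make "run" a regular package for the demo.
        (root / "run" / "__init__.py").write_text("")
        (root / "run" / "main1.py").write_text(SCRIPT)

        cmd = [sys.executable]
        cmd += ["-m", "run.main1"] if as_module else ["run/main1.py"]
        result = subprocess.run(
            cmd, cwd=root, capture_output=True, text=True, check=True
        )
        return result.stdout.strip()
```

Called with as_module=True this should report project (the working directory), and with as_module=False it should report run (the directory containing the script), which is why the package1 and package2 imports only resolve in the first case.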

More canonical solutions would include:

- setting PYTHONPATH, e.g. export PYTHONPATH=path/to/your/project
- writing a path/to/your/project line in a foobar.pth file inside the site-packages folder of your virtualenv
- using a single entrypoint that features subcommands, e.g. with https://click.palletsprojects.com/en/latest/api/#click.Group
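
As a rough illustration of the last option, here is a minimal sketch of a single entry point with subcommands. It uses the standard library's argparse instead of click so that it runs without extra dependencies; the functions main1 and main2 are placeholders for whatever the real scripts do, not code from the question.

```python
import argparse


def main1(args):
    # Placeholder for whatever run/main1.py did.
    return "hello"


def main2(args):
    # Placeholder for whatever run/main2.py did.
    return "world"


def build_parser():
    parser = argparse.ArgumentParser(prog="main.py")
    subparsers = parser.add_subparsers(dest="command", required=True)
    # Register one subcommand per former script and remember its handler.
    for name, func in {"main1": main1, "main2": main2}.items():
        sub = subparsers.add_parser(name)
        sub.set_defaults(func=func)
    return parser


def main(argv=None):
    # argv=None makes argparse read sys.argv[1:].
    args = build_parser().parse_args(argv)
    return args.func(args)
```

With this in place, something like python main.py main1 would dispatch to main1. The coupling the question worries about is limited to the one dictionary that maps subcommand names to handlers.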

Answered By: N1ngu

Answer 2

The standard solution is to use console_scripts packaging for your entry points; the entry points specification in the Python Packaging User Guide describes the mechanism. This feature can be used to generate script wrappers like main1.py … mainN.py at installation time.

Since these script wrappers are generated code, they do not exist in the project source directory at all, so that problem of clutter ("top-level directory is getting flooded with entry points") goes away.

The actual code for the scripts will be defined somewhere within the package, and the places where the main*.py scripts hook into that code are defined in the package metadata. You can hook a console script entry point up to any callable within the package, provided it can be called without arguments (optional arguments, i.e. args with default values, are fine).
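
To make the "callable without arguments" requirement concrete, here is a sketch of one common shape for such a callable. The function name main and the --name option are illustrative assumptions, not part of the original project.

```python
import argparse


def main(argv=None):
    """Hypothetical entry-point target: callable with no arguments."""
    parser = argparse.ArgumentParser(description="What main1.py used to do")
    parser.add_argument("--name", default="world")
    # With argv=None, argparse falls back to sys.argv[1:], which is the
    # situation when a wrapper calls main() with no arguments.
    args = parser.parse_args(argv)
    print(f"hello {args.name}")
    return 0  # a wrapper can pass this on as the exit status
```

Keeping argv as an optional parameter means the same function works as a console script target and can also be called directly from tests, e.g. main(["--name", "x"]).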

project
├── package1
│   ├── __init__.py
│   ├── module1.py
│   └── module2.py
├── package2
│   ├── __init__.py
│   ├── module1.py
│   └── module2.py
├── pyproject.toml
└── scripts
    └── __init__.py

This is the new directory structure. Note the addition of __init__.py files, which indicate that package1 and package2 are packages and not just subdirectories.

For the new files added, here’s the scripts/__init__.py:

# these imports should work
#   from package1 import ...
#   from package2.module1 import ...

def myscript1():
    # put whatever main1.py did here
    print("hello")

def myscript2():
    # put whatever main2.py did here
    print("world")

These functions don’t all need to be in the same file. You can put them anywhere within the package, as long as you update the hooks in the [project.scripts] section of the packaging definition.

And here’s that packaging definition:

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "mypackage"
version = "0.0.1"

[project.scripts]
"main1.py" = "scripts:myscript1"
"main2.py" = "scripts:myscript2"

[tool.setuptools]
packages = ["package1", "package2", "scripts"]

Now when the package is installed, the console scripts are generated:

$ pip install --editable .
...
Successfully installed mypackage-0.0.1
$ main1.py
hello
$ main2.py
world

As mentioned, those executables do not live in the project directory, but within the site’s scripts directory, which will be present on $PATH. The scripts are generated by pip, using vendored code from distlib’s ScriptMaker. If you peek at the generated script files you’ll see that they’re simple wrappers: they just import the callable from within the package and then call it. Any argument parsing, logging configuration, etc., must still be handled within the package code.

$ ls
mypackage.egg-info  package1  package2  pyproject.toml  scripts
$ which main2.py
/tmp/project/.venv/bin/main2.py

The exact location of the scripts directory depends on your platform, but it can be checked like this in Python:

>>> import sysconfig
>>> sysconfig.get_path("scripts")
'/tmp/project/.venv/bin'
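
For intuition about what the generated wrappers do, the following sketch resolves a "module:callable" string of the kind used in [project.scripts] and calls the result. This is not the code pip generates; load_target and run_console_script are hypothetical helper names, and the real wrappers hard-code the import rather than parsing a string at runtime.

```python
import importlib
import sys


def load_target(spec: str):
    """Resolve an entry-point string such as "scripts:myscript1"."""
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    # The part after the colon may be a dotted attribute path.
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj


def run_console_script(spec: str):
    """Mimic a wrapper: import the callable, call it with no arguments,
    and use its return value as the exit status."""
    sys.exit(load_target(spec)())
```

For example, load_target("os.path:basename") returns the standard library's basename function; in the project above, the equivalent of the main1.py wrapper would be run_console_script("scripts:myscript1").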

Answered By: wim
