Using hyphen/dash in Python repository name and package name

Question:

I am trying to make my git repository pip-installable. In preparation for that I am restructuring the repo to follow the right conventions. My understanding from looking at other repositories is that I should put all my source code in a package that has the same name as the repository name. E.g. if my repository is called myrepo, then the source code would all go into a package also called myrepo.

My repository has a hyphen in its name for readability, e.g. my-repo. So if I wanted to make a package for it with the same name, the package would have a hyphen in it as well. In this tutorial it says “don’t use hyphens” for Python package names. However, I’ve seen well-established packages such as scikit-learn that have hyphens in their names. One thing I have noticed, though, is that in the scikit-learn repo the package name is not the same as the repo name and is instead called sklearn.

I think my discussion above boils down to the following questions:

  1. When packaging a repo, what is the relationship between the repository’s name and the package’s name? Is there anything to beware of when having names that don’t match?
  2. Is it okay to have hyphens in package names? What about in repository names?
  3. If the package name for scikit-learn is sklearn, then how come when I install it I do pip install scikit-learn instead of pip install sklearn?
Asked By: DataMan

||

Answers:

To answer your 1st point let me rephrase my answer to a different question.

The biggest source of misunderstanding is that the word "package" is heavily overloaded. There are 4 different names in play: the name of the repository, the name of the directory used for development (the one that contains setup.py), the name of the directory containing __init__.py and other importable modules, and the name of the distribution at PyPI. Quite often these 4 are the same or similar, but that’s not required.

The repository and the development directory can be named anything; their names play no role in packaging or importing. Of course it’s convenient to name them sensibly, but that’s only a convenience.

The name of the directory with Python files is the name of the package to be imported. Once a package has been released under an import name, that name usually sticks and cannot be changed without breaking existing imports.

The name of the distribution gives one a page at PyPI and names the distribution files (source distributions, eggs, wheels). It’s the name one puts in the setup(name='distribution') call.

Let me show a detailed real example. I’ve been maintaining a templating library called CheetahTemplate. I develop it in a development directory called cheetah3/. The distribution at PyPI is called Cheetah3; this is the name I put into setup(name='Cheetah3'). The top-level package is Cheetah, hence one does import Cheetah.Template or from Cheetah import Template; that means I have a directory cheetah3/Cheetah/.
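To see that only the inner directory name matters for importing, here is a minimal sketch that builds a throwaway layout and imports from it. The names my-repo and my_repo are hypothetical (taken from the question's example), and the helper import_from_dev_dir is made up for illustration; it is not part of any packaging tool.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_dev_dir(dev_dir_name: str, package_name: str) -> str:
    """Create <tmp>/<dev_dir_name>/<package_name>/__init__.py, import the
    package, and return the name Python knows it by."""
    with tempfile.TemporaryDirectory() as tmp:
        dev_dir = Path(tmp) / dev_dir_name   # plays the role of the repo/dev directory
        pkg_dir = dev_dir / package_name     # the importable package directory
        pkg_dir.mkdir(parents=True)
        (pkg_dir / "__init__.py").write_text("GREETING = 'hello'\n")

        sys.path.insert(0, str(dev_dir))
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(package_name)
            return module.__name__
        finally:
            # Clean up so repeated calls start from a fresh state.
            sys.path.remove(str(dev_dir))
            sys.modules.pop(package_name, None)


# The hyphenated outer directory is fine; the import name comes from the inner one.
print(import_from_dev_dir("my-repo", "my_repo"))  # prints: my_repo
```

Renaming the outer directory to anything else leaves the import name unchanged, which is the sense in which the repository and development directory names don't play any role.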

The answer to 2 is: you can have dashes in repository names and PyPI distribution names, but not in package names (directories with __init__.py files) or module names (.py files), because you cannot write import xy-zzy in Python: the hyphen would be read as a subtraction, and the statement is a SyntaxError.
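A quick way to check this without creating any files is to ask Python to compile the import statement. This is only a sketch; the helper name is_importable_name is hypothetical, and compile() merely parses the statement without actually importing anything.

```python
def is_importable_name(name: str) -> bool:
    """Return True if `import <name>` is syntactically valid Python."""
    try:
        # compile() only parses the statement; nothing is imported.
        compile(f"import {name}", "<check>", "exec")
    except SyntaxError:
        return False
    return True


for candidate in ("my_repo", "my-repo", "sklearn", "scikit-learn"):
    print(candidate, is_importable_name(candidate))
```

The underscore and plain names compile, while both hyphenated names are rejected by the parser.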

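Point 3: the site, the repository, and the distribution are all named scikit-learn, but the importable package (the top-level directory with __init__.py) is sklearn. pip works with distribution names, which is why you run pip install scikit-learn and then write import sklearn.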

PEP 8 has nothing to do with the question, as it doesn’t talk about distributions, only about importable packages and modules.

Answered By: phd