Preventing namespace collisions between private and pypi-based Python packages

Question:

We have 100+ private packages, and so far we’ve been using s3pypi to set up a private PyPI in an S3 bucket. Our private packages have dependencies on each other (and on public packages), and it is (of course) important that our GitLab pipelines find the latest functional version of the packages they rely on. I.e., we’re not interested in the latest checked-in code. We create new wheels only after tests and QA have run against a push to master (which is a long-winded way of explaining that -e <vcs> requirements will not work).

Our setup works really well until someone creates a new public package on the official PyPI that shadows one of our package names. We can force our private package to be chosen by increasing the version number so it is higher than the new package on pypi.org, or by renaming our package to something that hasn’t yet been taken on pypi.org.

This is obviously a hacky and fragile solution, but apparently the functionality is this way by design.

After the initial bucket setup, s3pypi has required no maintenance or administration. The above ticket suggests using devpi, but that seems like a very heavy solution that requires administration/monitoring/etc.

GitLab’s PyPI solution seems to operate at the individual package level (meaning we’d have to list 100+ URLs, one for each package). This doesn’t seem practical, but maybe I’m not understanding something (I can see the package registry menu under our group as well, but the docs point to the "package-pypi" docs).

We can’t be the first small company to face this issue...? Is there a better way than registering dummy versions of all our packages on pypi.org (with version=0.0.1, so the s3pypi version will be preferred)?

Asked By: thebjorn


Answers:

It might not be the solution for you, but I’ll tell you what we do.

  1. We prefix the package names and use namespaces (e.g. company.product.tool).
  2. When we install our packages (including their in-house dependencies), we use a requirements.txt file that includes our PyPI URL (a sketch of such a file follows below). We run everything in container(s), and we install all public dependencies in them when we build the images.
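
For illustration, such a requirements.txt might look like the following (a sketch; the company.product.* names and the index URL are placeholders, not from the original answer):

--extra-index-url https://pypi.company.example.com/simple/
company.product.tool==1.4.2
company.product.core==2.0.1
requests==2.24.0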
Answered By: Balázs

Your company could redirect all requests to PyPI to a service you control first (perhaps just via your build servers’ hosts file(s)).

This would potentially allow you to (see the sketch after this list):

  • prefer/override arbitrary packages with local ones
  • detect such cases
  • cache common/large upstream packages locally
  • reject suspect/unknown versions/names of upstream packages
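
A minimal sketch of the hosts-file variant, plus the lighter option of pointing pip itself at the proxy (the hostnames and the 10.0.0.5 address are assumptions for illustration). Note that pip talks HTTPS to pypi.org, so a hosts-file redirect also requires the build servers to trust your service’s TLS certificate:

# /etc/hosts on the build servers: route PyPI traffic to your own service
10.0.0.5    pypi.org files.pythonhosted.org

# alternatively, /etc/pip.conf: point pip directly at the proxy
[global]
index-url = https://pypi-proxy.internal.example.com/simple/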
Answered By: ti7

We use VCS for this. I see you’ve explicitly ruled that out, but have you considered using branches to mark your latest stable builds in VCS?

If you aren’t interested in the latest version of master or the dev branch, but you are running test/QA against commits, then I would configure your test/QA suite to merge into a branch named something like "stable" or "pypi-stable"; your requirements files would then look like this:

pip install git+https://gitlab.com/yourorg/yourpackage.git@pypi-stable

The same configuration will work for setup.py requirements blocks (which allows for chained internal dependencies).
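
For example, with a reasonably recent pip (18.1+), a setup.py could pin internal dependencies to that branch via PEP 508 direct references. A sketch with placeholder names (note that packages declaring such dependencies cannot be uploaded to the public PyPI, which is fine for private use):

from setuptools import setup, find_packages

setup(
    name="yourtool",
    version="1.0.0",
    packages=find_packages(),
    install_requires=[
        # PEP 508 direct reference to the branch that test/QA fast-forwards
        "yourpackage @ git+https://gitlab.com/yourorg/yourpackage.git@pypi-stable",
        "requests>=2.24",
    ],
)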

Am I missing something?

Answered By: kerasbaz

You could perhaps get the behavior you are looking for from a requirements.txt and two pip calls:

cat requirements.txt | xargs -n 1 pip install -i <your-s3pypi-index-url>
pip install -r requirements.txt

The first call tries to install what it can from your local repository and skips a package if that fails. The second call then installs everything that failed before from PyPI.

This works because --upgrade-strategy only-if-needed is the default (as of pip 10.x, I believe; don’t quote me on that). If you are using an older pip you may have to specify this manually.
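
On an older pip, the second call would carry the flag explicitly:

pip install --upgrade-strategy only-if-needed -r requirements.txt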


A limitation of this approach: if you expect/request a local package, but it doesn’t exist locally while a package with the same name exists on PyPI, you will get that package instead. Not sure if that is a concern.

Answered By: FirefoxMetzger

The comment from @a_guest on my first answer got me thinking, and the "problem" is that pip doesn’t consider where the package originated when it sorts through candidates to satisfy requirements.

So here is a possible way to change this: Monkey-patch pip and introduce a preference over indexes.

# mypip.py: a thin wrapper around pip's command line (adapted from pip's
# own __main__.py) that patches in an index-aware candidate evaluator.
from __future__ import absolute_import
import os
import sys

import pip
from pip._internal.index.package_finder import CandidateEvaluator


class MyCandidateEvaluator(CandidateEvaluator):
    def _sort_key(self, candidate):
        # Start from pip's default sort key and splice an index priority
        # in ahead of the version.
        (has_allowed_hash, yank_value, binary_preference, version,
         build_tag, pri) = super()._sort_key(candidate)

        priority_index = "localhost"  # use your s3pypi host here
        # comes_from may be None for direct-URL candidates, hence the guard
        if priority_index in (candidate.link.comes_from or ""):
            priority = 1  # candidates served by the private index win
        else:
            priority = 0

        # Ranking by origin before version means a local candidate beats a
        # higher-versioned candidate of the same name from pypi.org.
        return (has_allowed_hash, yank_value, binary_preference, priority,
                version, build_tag, pri)


# Swap in our evaluator; pip's PackageFinder looks the class up on this
# module at call time, so patching the attribute is enough.
pip._internal.index.package_finder.CandidateEvaluator = MyCandidateEvaluator

# Remove '' and current working directory from the first entry
# of sys.path, if present to avoid using current directory
# in pip commands check, freeze, install, list and show,
# when invoked as python -m pip <command>
if sys.path[0] in ('', os.getcwd()):
    sys.path.pop(0)

# If we are running from a wheel, add the wheel to sys.path
# This allows the usage python pip-*.whl/pip install pip-*.whl
if __package__ == '':
    # __file__ is pip-*.whl/pip/__main__.py
    # first dirname call strips of '/__main__.py', second strips off '/pip'
    # Resulting path is the name of the wheel itself
    # Add that to sys.path so we can import pip
    path = os.path.dirname(os.path.dirname(__file__))
    sys.path.insert(0, path)

from pip._internal.cli.main import main as _main  # isort:skip # noqa


if __name__ == '__main__':
    sys.exit(_main())

Set up a requirements.txt:

numpy
sampleproject

and call the above script using the same parameters as you’d use for pip:

>python mypip.py install --no-cache --extra-index http://localhost:8000 -r requirements.txt
Looking in indexes: https://pypi.org/simple, http://localhost:8000
Collecting numpy
  Downloading numpy-1.19.1-cp37-cp37m-win_amd64.whl (12.9 MB)
     |████████████████████████████████| 12.9 MB 6.8 MB/s
Collecting sampleproject
  Downloading http://localhost:8000/sampleproject/sampleproject-0.5.0-py2.py3-none-any.whl (4.3 kB)
Collecting peppercorn
  Downloading peppercorn-0.6-py3-none-any.whl (4.8 kB)
Installing collected packages: numpy, peppercorn, sampleproject
Successfully installed numpy-1.19.1 peppercorn-0.6 sampleproject-0.5.0

Compare this to the default pip call

>pip install --no-cache --extra-index http://localhost:8000 -r requirements.txt
Looking in indexes: https://pypi.org/simple, http://localhost:8000
Collecting numpy
  Downloading numpy-1.19.1-cp37-cp37m-win_amd64.whl (12.9 MB)
     |████████████████████████████████| 12.9 MB 6.4 MB/s
Collecting sampleproject
  Downloading sampleproject-2.0.0-py3-none-any.whl (4.2 kB)
Collecting peppercorn
  Downloading peppercorn-0.6-py3-none-any.whl (4.8 kB)
Installing collected packages: numpy, peppercorn, sampleproject
Successfully installed numpy-1.19.1 peppercorn-0.6 sampleproject-2.0.0

And notice that mypip prefers a package if it can be retrieved from localhost; of course, you can customize this behavior further.

Answered By: FirefoxMetzger