How do I determine which requirements are actually needed in setup.py?

Question:

I’m cleaning up packaging for a Python project I didn’t create. Currently, it does some explicitly unsupported magic to get its dependencies from a requirements.txt file. The file looks like it may have been generated by pip freeze; there are fixed versions for everything, and many apparently extraneous packages listed. I am pretty sure some of these aren’t real dependencies, but I don’t know which ones.

Given just the source tree, how would I figure out, from scratch, what dependencies ought to be included in install_requires?

As a first stab, I’m grepping for non-stdlib import statements. I hope there’s a better way.

Asked By: Andrew

||

Answers:

I mean, the most effective way would honestly be to go through the code line by line and determine what packages may not be needed, what packages need updates, etc. I know Python 2 and 3 both have ModuleFinder which finds all the modules a script needs to successfully compile and run, but I’ve never used it before, so not sure how effective it is, especially for what you’re doing. However, if you’re interested, I’ll attach the link below.

https://docs.python.org/3/library/modulefinder.html

Answered By: HunBurry

There’s no way to do this perfectly, because Python is too flexible.

But it’s usually possible to do it well enough.

You can start with the stdlib’s modulefinder.
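As a rough sketch of what that looks like, assuming the project has a single entry-point script: the helper name top_level_imports is made up for illustration, while ModuleFinder, run_script, modules and badmodules come from the stdlib module itself.

```python
from modulefinder import ModuleFinder


def top_level_imports(script_path):
    """Return (found, missing) top-level module names for one script.

    ModuleFinder inspects the compiled bytecode of the script; it does not
    run the script itself.
    """
    finder = ModuleFinder()
    finder.run_script(script_path)
    # finder.modules maps dotted names to Module objects for everything resolved.
    found = {name.split(".")[0] for name in finder.modules} - {"__main__"}
    # finder.badmodules lists imports that could not be resolved in this
    # environment, for example packages that are not installed.
    missing = {name.split(".")[0] for name in finder.badmodules}
    return sorted(found), sorted(missing)
```

Both lists still include stdlib modules, and the names are import names rather than PyPI distribution names, so the result is a starting point for install_requires, not the final list.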

Beyond that, a number of projects—mostly projects designed for building binary executables, installers, etc. for Python apps—have come up with heuristics that go even farther.

These usually work. And, when they fail, you usually immediately spot it on your first test. Even if they aren’t sufficient, they’re at the very least good sample code. Here are a few off the top of my head:


In case you’re wondering why it’s impossible:

Even forgetting about the problem of dependencies in C extension modules, Python is just too flexible to catch all the ways you could import a module via static analysis.

Sure, you’d have to be dealing with code written by someone crazy enough to use explicitly unsupported magic for no good reason… but if you were, there’s nothing to stop someone from writing this instead of import lxml [1]:

import codecs, sys

with open('picture.jpg', encoding='cp500') as f:
    getattr(list(sys.modules.values())[11], codecs.encode('vzcbeg_zbqhyr', 'rot13'))(f.read().strip())

In reality, things aren’t going to be that bad. But they could easily be too bad for rg import to be sufficient.

You could try to detect all the imports dynamically with a simple import hook, but that’s only guaranteed to work if you can exercise 100% of the code paths.
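A simple hook of that kind might look like the sketch below. The names ImportRecorder and install_recorder are hypothetical; the mechanism (a finder placed at the front of sys.meta_path whose find_spec returns None) is the standard import protocol. It assumes the hook is installed before the code under test is imported, because modules already cached in sys.modules never reach the finders.

```python
import sys
from importlib.abc import MetaPathFinder


class ImportRecorder(MetaPathFinder):
    """Record the top-level name of every import request, resolve none."""

    def __init__(self):
        self.seen = set()

    def find_spec(self, fullname, path=None, target=None):
        self.seen.add(fullname.split(".")[0])
        # Returning None hands the request to the next finder on sys.meta_path,
        # so normal importing is unaffected.
        return None


def install_recorder():
    """Put a recorder at the front of sys.meta_path and return it."""
    recorder = ImportRecorder()
    sys.meta_path.insert(0, recorder)
    return recorder
```

You would call install_recorder() at the very top of a test runner or entry point, exercise the program, and then inspect recorder.seen. Anything on a code path that never ran will not show up.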


1. Of course this only works if importlib was the 12th module loaded, and if picture.jpg is not a JPEG image but a text file whose contents are, in EBCDIC, lxml\n.

Answered By: abarnert

I’ve had great results with pipreqs, which automatically generates a requirements.txt file from your source code.

pipreqs /home/project/location
Successfully saved requirements file in /home/project/location/requirements.txt

Answered By: Ereli

I wrote a tool, realreq, specifically for this issue.

You can install it with pip: python3 -m pip install realreq. Using it is as easy as:
realreq -s /path/to/your/source
It will then gather the dependencies actually used in your source code.

Answered By: calder-ty