Python pip find out basic requirements from output of pip freeze

Question:

My friend just started learning Python and Flask, and is missing a lot of “best practices”, e.g., a requirements.txt file.

He has recently asked me for assistance, and to make the project clean, I want to setup a CI service (Travis), but I need to work out this file first.

Since he did not initially have a requirements.txt, all information I can have is his import statements, as well as his output of pip freeze.

As there’s no way to distinguish a direct requirement by the project and an indirect requirement by one of the packages, I want to find out all “top-level” packages from the list. A “top-level package” is a package that’s not required by another package in the list. For example, urllib3 is required by requests, so when requests is present, urllib3 may better not appear in the final result.

Is there a way to achieve this?


If anyone wants to help me with this specific instance, here’s the output of pip freeze:

apturl==0.5.2
arrow==0.12.1
asn1crypto==0.24.0
binaryornot==0.4.4
blinker==1.4
Bootstrap-Flask==1.0.9
Brlapi==0.6.6
certifi==2018.1.18
chardet==3.0.4
Click==7.0
colorama==0.3.7
command-not-found==0.3
configparser==3.5.0
cookiecutter==1.6.0
cryptography==2.1.4
cupshelpers==1.0
decorator==4.1.2
defer==1.0.6
distro-info==0.18
dominate==2.3.5
Flask==1.0.2
Flask-Bootstrap4==4.0.2
Flask-Login==0.4.1
Flask-Mail==0.9.1
Flask-Moment==0.6.0
Flask-SQLAlchemy==2.3.2
Flask-WTF==0.14.2
future==0.17.1
httpie==0.9.8
httplib2==0.9.2
idna==2.6
ipython==5.5.0
ipython-genutils==0.2.0
itsdangerous==1.1.0
Jinja2==2.10
jinja2-time==0.2.0
keyring==10.6.0
keyrings.alt==3.0
language-selector==0.1
launchpadlib==1.10.6
lazr.restfulclient==0.13.5
lazr.uri==1.0.3
louis==3.5.0
macaroonbakery==1.1.3
Mako==1.0.7
MarkupSafe==1.1.0
mysqlclient==1.3.14
netifaces==0.10.4
oauth==1.0.1
olefile==0.45.1
pexpect==4.2.1
pickleshare==0.7.4
Pillow==5.1.0
poyo==0.4.2
prompt-toolkit==1.0.15
protobuf==3.0.0
pycairo==1.16.2
pycrypto==2.6.1
pycups==1.9.73
Pygments==2.2.0
pygobject==3.26.1
pymacaroons==0.13.0
PyNaCl==1.1.2
pyRFC3339==1.0
python-apt==1.6.3
python-dateutil==2.7.5
python-debian==0.1.32
pytz==2018.3
pyxdg==0.25
PyYAML==3.12
reportlab==3.4.0
requests==2.18.4
requests-unixsocket==0.1.5
ruamel.yaml==0.15.34
SecretStorage==2.3.1
simplegeneric==0.8.1
simplejson==3.13.2
six==1.11.0
SQLAlchemy==1.2.14
system-service==0.3
systemd-python==234
traitlets==4.3.2
ubuntu-drivers-common==0.0.0
ufw==0.35
unattended-upgrades==0.1
urllib3==1.22
usb-creator==0.3.3
visitor==0.1.3
wadllib==1.3.2
wcwidth==0.1.7
Werkzeug==0.14.1
whichcraft==0.5.2
WTForms==2.2.1
xkit==0.0.0
zope.interface==4.3.2

and here are the import statements, with an additional pymysql he told me.

import os
from flask import *
from flask_bootstrap import Bootstrap
from flask_moment import Moment
from flask_wtf import FlaskForm
from wtforms import *
from wtforms.validators import *
from flask_sqlalchemy import SQLAlchemy
from flask_mail import Mail, Message
from werkzeug.security import generate_password_hash,check_password_hash
from flask_login import login_required , login_user,login_fresh,login_url,LoginManager,UserMixin,logout_user
Asked By: iBug

||

Answers:

First, I wanted to suggest using PIP‘s API, but it’s recommended to use pip as a CmdLine tool only ([PyPA]: Using pip from your program). Note that I successfully used it, I just don’t expose the code (at least for now).
Here’s a way that uses pkg_resources ([ReadTheDocs]: Package Discovery and Resource Access using pkg_resources).

code00.py:

#!/usr/bin/env python

import os
import pkg_resources
import sys


def get_pkgs(reqs_file="requirements_orig.txt"):
    if reqs_file and os.path.isfile(reqs_file):
        ret = dict()
        with open(reqs_file) as f:
            for item in f.readlines():
                name, ver = item.strip("n").split("==")[:2]
                ret[name] = ver, ()
        return ret
    else:
        return {
            item.project_name: (item.version, tuple([dep.name for dep in item.requires()])) for item in pkg_resources.working_set
        }


def print_pkg_data(text, pkg_info):
    print("{:s}nSize: {:d}nn{:s}".format(text, len(pkg_info), "n".join(["{:s}=={:s}".format(*item) for item in pkg_info])))


def main(*argv):
    pkgs = get_pkgs(reqs_file=None)
    full_pkg_info = [(name, data[0]) for name, data in sorted(pkgs.items())]
    print_pkg_data("----------FULL LIST----------", full_pkg_info)

    deps = set()
    for name in pkgs:
        deps = deps.union(pkgs[name][1])
    min_pkg_info = [(name, data[0]) for name, data in sorted(pkgs.items()) if name not in deps]
    print_pkg_data("n----------MINIMAL LIST----------", min_pkg_info)


if __name__ == "__main__":
    print("Python {:s} {:03d}bit on {:s}n".format(" ".join(elem.strip() for elem in sys.version.split("n")),
                                                   64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    rc = main(*sys.argv[1:])
    print("nDone.n")
    sys.exit(rc)

Output:

(py_064_03.06.08_test0) e:WorkDevStackOverflowq054292236> "e:WorkDevVEnvspy_064_03.06.08_test0Scriptspython.exe" code00.py
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)] 064bit on win32

----------FULL LIST----------
Size: 133

Babel==2.6.0
Click==7.0
Django==2.1.4
Flask==1.0.2
Jinja2==2.10
Keras==2.2.4
Keras-Applications==1.0.6
Keras-Preprocessing==1.0.5
Markdown==3.0.1
MarkupSafe==1.1.0
Pillow==5.3.0
PyQt5==5.9.2
PyQt5-sip==4.19.13
PyYAML==3.13
Pygments==2.3.1
QtAwesome==0.5.3
QtPy==1.5.2
Send2Trash==1.5.0
Sphinx==1.8.3
Werkzeug==0.14.1
absl-py==0.6.1
alabaster==0.7.12
asn1crypto==0.24.0
astor==0.7.1
astroid==2.1.0
backcall==0.1.0
bleach==3.0.2
certifi==2018.11.29
cffi==1.11.5
chardet==3.0.4
cloudpickle==0.6.1
colorama==0.4.1
cryptography==2.4.2
cycler==0.10.0
decorator==4.3.0
defusedxml==0.5.0
djangorestframework==3.9.0
docutils==0.14
entrypoints==0.2.3
fatiando==0.5
funcsigs==1.0.2
future==0.17.1
gast==0.2.0
grpcio==1.17.1
h5py==2.9.0
html5lib==1.0.1
idna==2.8
imagesize==1.1.0
ipaddr==2.2.0
ipykernel==5.1.0
ipython==7.2.0
ipython-genutils==0.2.0
ipywidgets==7.4.2
isort==4.3.4
itsdangerous==1.1.0
jedi==0.13.2
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.2.4
jupyter-console==6.0.0
jupyter-core==4.4.0
keyboard==0.13.2
keyring==17.1.1
kiwisolver==1.0.1
lazy-object-proxy==1.3.1
llvmlite==0.26.0
lxml==4.2.5
matplotlib==3.0.2
mccabe==0.6.1
mistune==0.8.4
nbconvert==5.4.0
nbformat==4.4.0
notebook==5.7.4
numba==0.41.0
numpy==1.15.4
numpydoc==0.8.0
opencv-python==3.4.4.19
packaging==18.0
pandas==0.23.4
pandocfilters==1.4.2
parso==0.3.1
patsy==0.5.1
pickleshare==0.7.5
pip==18.1
prometheus-client==0.5.0
prompt-toolkit==2.0.7
protobuf==3.6.1
psutil==5.4.8
pyOpenSSL==18.0.0
pycodestyle==2.4.0
pycparser==2.19
pycryptodome==3.7.2
pyflakes==2.0.0
pygame==1.9.4
pylint==2.2.2
pynput==1.4
pyparsing==2.3.0
python-dateutil==2.7.5
pytz==2018.7
pywin32==224
pywin32-ctypes==0.2.0
pywinpty==0.5.5
pyzmq==17.1.2
qtconsole==4.4.3
requests==2.21.0
rope==0.11.0
scapy==2.4.0
scipy==1.2.0
setuptools==40.6.3
sip==4.19.8
six==1.12.0
snowballstemmer==1.2.1
sphinxcontrib-websupport==1.1.0
spyder==3.3.2
spyder-kernels==0.3.0
statsmodels==0.9.0
tensorboard==1.12.1
tensorflow-gpu==1.12.0
tensorflow-tensorboard==1.5.1
termcolor==1.1.0
terminado==0.8.1
testpath==0.4.2
thrift==0.11.0
tornado==5.1.1
traitlets==4.3.2
typed-ast==1.1.1
urllib3==1.24.1
wcwidth==0.1.7
webencodings==0.5.1
wheel==0.32.3
widgetsnbextension==3.4.2
wrapt==1.10.11
xlrd==1.2.0

----------MINIMAL LIST----------
Size: 37

Babel==2.6.0
Click==7.0
Django==2.1.4
Flask==1.0.2
Keras==2.2.4
Keras-Applications==1.0.6
Keras-Preprocessing==1.0.5
Markdown==3.0.1
Pillow==5.3.0
PyQt5==5.9.2
PyQt5-sip==4.19.13
PyYAML==3.13
QtAwesome==0.5.3
QtPy==1.5.2
Sphinx==1.8.3
djangorestframework==3.9.0
fatiando==0.5
funcsigs==1.0.2
ipaddr==2.2.0
keyboard==0.13.2
lxml==4.2.5
opencv-python==3.4.4.19
pandas==0.23.4
patsy==0.5.1
pip==18.1
pyOpenSSL==18.0.0
pycryptodome==3.7.2
pygame==1.9.4
pynput==1.4
pywin32==224
scapy==2.4.0
spyder==3.3.2
statsmodels==0.9.0
tensorflow-gpu==1.12.0
tensorflow-tensorboard==1.5.1
thrift==0.11.0
xlrd==1.2.0

Notes:

  • (Stating the obvious): In order to get a pkg info, that pkg needs to be installed. That’s why in my example I didn’t used your file (I named it requirements_orig.txt), but the pkgs installed on my VEnv

  • As you can see, in my case the pkg number dropped from 133 to 37, which I’d say it’s pretty manageable (of course, more filtering can be done)

  • I created the data structures based on the assumption that a pkg name is a primary key (uniquely identifies a pkg). If this is false, the code would require a bit of change

Final note: If you also want to consider your module’s import list (to strip out even more pkgs, if possible), you could also try [Python.Docs]: modulefinder – Find modules used by a script (I used it in [SO]: What files are required for Py_Initialize to run? (@CristiFati’s answer), only from CmdLine, but it should be trivial to use it from a script)

Answered By: CristiFati
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.