Python pip find out basic requirements from output of pip freeze
Question:
My friend just started learning Python and Flask, and is missing a lot of “best practices”, e.g., a requirements.txt file.
He recently asked me for assistance, and to make the project clean I want to set up a CI service (Travis), but I need to work out this file first.
Since he did not have a requirements.txt from the start, all the information I have is his import statements and his output of pip freeze.
As there’s no way to distinguish a direct requirement of the project from an indirect requirement pulled in by one of the packages, I want to find all “top-level” packages in the list. A “top-level package” is a package that’s not required by another package in the list. For example, urllib3 is required by requests, so when requests is present, urllib3 should preferably not appear in the final result.
Is there a way to achieve this?
If anyone wants to help me with this specific instance, here’s the output of pip freeze:
apturl==0.5.2
arrow==0.12.1
asn1crypto==0.24.0
binaryornot==0.4.4
blinker==1.4
Bootstrap-Flask==1.0.9
Brlapi==0.6.6
certifi==2018.1.18
chardet==3.0.4
Click==7.0
colorama==0.3.7
command-not-found==0.3
configparser==3.5.0
cookiecutter==1.6.0
cryptography==2.1.4
cupshelpers==1.0
decorator==4.1.2
defer==1.0.6
distro-info==0.18
dominate==2.3.5
Flask==1.0.2
Flask-Bootstrap4==4.0.2
Flask-Login==0.4.1
Flask-Mail==0.9.1
Flask-Moment==0.6.0
Flask-SQLAlchemy==2.3.2
Flask-WTF==0.14.2
future==0.17.1
httpie==0.9.8
httplib2==0.9.2
idna==2.6
ipython==5.5.0
ipython-genutils==0.2.0
itsdangerous==1.1.0
Jinja2==2.10
jinja2-time==0.2.0
keyring==10.6.0
keyrings.alt==3.0
language-selector==0.1
launchpadlib==1.10.6
lazr.restfulclient==0.13.5
lazr.uri==1.0.3
louis==3.5.0
macaroonbakery==1.1.3
Mako==1.0.7
MarkupSafe==1.1.0
mysqlclient==1.3.14
netifaces==0.10.4
oauth==1.0.1
olefile==0.45.1
pexpect==4.2.1
pickleshare==0.7.4
Pillow==5.1.0
poyo==0.4.2
prompt-toolkit==1.0.15
protobuf==3.0.0
pycairo==1.16.2
pycrypto==2.6.1
pycups==1.9.73
Pygments==2.2.0
pygobject==3.26.1
pymacaroons==0.13.0
PyNaCl==1.1.2
pyRFC3339==1.0
python-apt==1.6.3
python-dateutil==2.7.5
python-debian==0.1.32
pytz==2018.3
pyxdg==0.25
PyYAML==3.12
reportlab==3.4.0
requests==2.18.4
requests-unixsocket==0.1.5
ruamel.yaml==0.15.34
SecretStorage==2.3.1
simplegeneric==0.8.1
simplejson==3.13.2
six==1.11.0
SQLAlchemy==1.2.14
system-service==0.3
systemd-python==234
traitlets==4.3.2
ubuntu-drivers-common==0.0.0
ufw==0.35
unattended-upgrades==0.1
urllib3==1.22
usb-creator==0.3.3
visitor==0.1.3
wadllib==1.3.2
wcwidth==0.1.7
Werkzeug==0.14.1
whichcraft==0.5.2
WTForms==2.2.1
xkit==0.0.0
zope.interface==4.3.2
And here are the import statements, plus an additional pymysql that he told me about:
import os
from flask import *
from flask_bootstrap import Bootstrap
from flask_moment import Moment
from flask_wtf import FlaskForm
from wtforms import *
from wtforms.validators import *
from flask_sqlalchemy import SQLAlchemy
from flask_mail import Mail, Message
from werkzeug.security import generate_password_hash,check_password_hash
from flask_login import login_required , login_user,login_fresh,login_url,LoginManager,UserMixin,logout_user
Answers:
First, I wanted to suggest using pip’s API, but pip is meant to be used as a command-line tool only ([PyPA]: Using pip from your program). Note that I did get the API approach working; I’m just not posting that code (at least for now).
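For reference, the command-line route could look roughly like the sketch below. It is not the API-based code mentioned above; it assumes that `pip show` prints one `Name:` and one `Requires:` line per package, and the helper names `parse_requires` and `pip_show` are made up for this illustration.

```python
import subprocess
import sys


def parse_requires(pip_show_output):
    """Map each package name to the names listed on its 'Requires:' line."""
    deps = {}
    name = None
    for line in pip_show_output.splitlines():
        if line.startswith("Name: "):
            name = line[len("Name: "):].strip()
            deps[name] = []
        elif line.startswith("Requires:") and name is not None:
            value = line[len("Requires:"):].strip()
            deps[name] = [d.strip() for d in value.split(",") if d.strip()]
    return deps


def pip_show(names):
    """Run 'pip show' for the given packages with the current interpreter's pip."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "show", *names],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

Calling `parse_requires(pip_show(["requests"]))` in an environment where requests is installed should give a dict keyed by package name. Like the approach below, this only works for packages that are actually installed.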
Here’s a way that uses pkg_resources ([ReadTheDocs]: Package Discovery and Resource Access using pkg_resources).
code00.py:
#!/usr/bin/env python

import os
import pkg_resources
import sys


def get_pkgs(reqs_file="requirements_orig.txt"):
    # Returns {name: (version, (dependency names...))}
    if reqs_file and os.path.isfile(reqs_file):
        # A freeze file carries no dependency info, so the deps tuple stays empty
        ret = dict()
        with open(reqs_file) as f:
            for item in f.readlines():
                if "==" not in item:
                    continue  # Skip blank lines and entries that are not pinned
                name, ver = item.strip("\n").split("==")[:2]
                ret[name] = ver, ()
        return ret
    else:
        # Query the pkgs installed in the current environment
        return {
            item.project_name: (item.version, tuple([dep.name for dep in item.requires()])) for item in pkg_resources.working_set
        }


def print_pkg_data(text, pkg_info):
    print("{:s}\nSize: {:d}\n\n{:s}".format(text, len(pkg_info), "\n".join(["{:s}=={:s}".format(*item) for item in pkg_info])))


def main(*argv):
    pkgs = get_pkgs(reqs_file=None)
    full_pkg_info = [(name, data[0]) for name, data in sorted(pkgs.items())]
    print_pkg_data("----------FULL LIST----------", full_pkg_info)

    # Every name that some installed pkg declares as a dependency
    deps = set()
    for name in pkgs:
        deps = deps.union(pkgs[name][1])
    # Keep only the pkgs that nobody depends on (names are compared as they are spelled)
    min_pkg_info = [(name, data[0]) for name, data in sorted(pkgs.items()) if name not in deps]
    print_pkg_data("\n----------MINIMAL LIST----------", min_pkg_info)


if __name__ == "__main__":
    print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                   64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    rc = main(*sys.argv[1:])
    print("\nDone.\n")
    sys.exit(rc)
Output:
(py_064_03.06.08_test0) e:\Work\Dev\StackOverflow\q054292236> "e:\Work\Dev\VEnvs\py_064_03.06.08_test0\Scripts\python.exe" code00.py
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)] 064bit on win32
----------FULL LIST----------
Size: 133
Babel==2.6.0
Click==7.0
Django==2.1.4
Flask==1.0.2
Jinja2==2.10
Keras==2.2.4
Keras-Applications==1.0.6
Keras-Preprocessing==1.0.5
Markdown==3.0.1
MarkupSafe==1.1.0
Pillow==5.3.0
PyQt5==5.9.2
PyQt5-sip==4.19.13
PyYAML==3.13
Pygments==2.3.1
QtAwesome==0.5.3
QtPy==1.5.2
Send2Trash==1.5.0
Sphinx==1.8.3
Werkzeug==0.14.1
absl-py==0.6.1
alabaster==0.7.12
asn1crypto==0.24.0
astor==0.7.1
astroid==2.1.0
backcall==0.1.0
bleach==3.0.2
certifi==2018.11.29
cffi==1.11.5
chardet==3.0.4
cloudpickle==0.6.1
colorama==0.4.1
cryptography==2.4.2
cycler==0.10.0
decorator==4.3.0
defusedxml==0.5.0
djangorestframework==3.9.0
docutils==0.14
entrypoints==0.2.3
fatiando==0.5
funcsigs==1.0.2
future==0.17.1
gast==0.2.0
grpcio==1.17.1
h5py==2.9.0
html5lib==1.0.1
idna==2.8
imagesize==1.1.0
ipaddr==2.2.0
ipykernel==5.1.0
ipython==7.2.0
ipython-genutils==0.2.0
ipywidgets==7.4.2
isort==4.3.4
itsdangerous==1.1.0
jedi==0.13.2
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.2.4
jupyter-console==6.0.0
jupyter-core==4.4.0
keyboard==0.13.2
keyring==17.1.1
kiwisolver==1.0.1
lazy-object-proxy==1.3.1
llvmlite==0.26.0
lxml==4.2.5
matplotlib==3.0.2
mccabe==0.6.1
mistune==0.8.4
nbconvert==5.4.0
nbformat==4.4.0
notebook==5.7.4
numba==0.41.0
numpy==1.15.4
numpydoc==0.8.0
opencv-python==3.4.4.19
packaging==18.0
pandas==0.23.4
pandocfilters==1.4.2
parso==0.3.1
patsy==0.5.1
pickleshare==0.7.5
pip==18.1
prometheus-client==0.5.0
prompt-toolkit==2.0.7
protobuf==3.6.1
psutil==5.4.8
pyOpenSSL==18.0.0
pycodestyle==2.4.0
pycparser==2.19
pycryptodome==3.7.2
pyflakes==2.0.0
pygame==1.9.4
pylint==2.2.2
pynput==1.4
pyparsing==2.3.0
python-dateutil==2.7.5
pytz==2018.7
pywin32==224
pywin32-ctypes==0.2.0
pywinpty==0.5.5
pyzmq==17.1.2
qtconsole==4.4.3
requests==2.21.0
rope==0.11.0
scapy==2.4.0
scipy==1.2.0
setuptools==40.6.3
sip==4.19.8
six==1.12.0
snowballstemmer==1.2.1
sphinxcontrib-websupport==1.1.0
spyder==3.3.2
spyder-kernels==0.3.0
statsmodels==0.9.0
tensorboard==1.12.1
tensorflow-gpu==1.12.0
tensorflow-tensorboard==1.5.1
termcolor==1.1.0
terminado==0.8.1
testpath==0.4.2
thrift==0.11.0
tornado==5.1.1
traitlets==4.3.2
typed-ast==1.1.1
urllib3==1.24.1
wcwidth==0.1.7
webencodings==0.5.1
wheel==0.32.3
widgetsnbextension==3.4.2
wrapt==1.10.11
xlrd==1.2.0
----------MINIMAL LIST----------
Size: 37
Babel==2.6.0
Click==7.0
Django==2.1.4
Flask==1.0.2
Keras==2.2.4
Keras-Applications==1.0.6
Keras-Preprocessing==1.0.5
Markdown==3.0.1
Pillow==5.3.0
PyQt5==5.9.2
PyQt5-sip==4.19.13
PyYAML==3.13
QtAwesome==0.5.3
QtPy==1.5.2
Sphinx==1.8.3
djangorestframework==3.9.0
fatiando==0.5
funcsigs==1.0.2
ipaddr==2.2.0
keyboard==0.13.2
lxml==4.2.5
opencv-python==3.4.4.19
pandas==0.23.4
patsy==0.5.1
pip==18.1
pyOpenSSL==18.0.0
pycryptodome==3.7.2
pygame==1.9.4
pynput==1.4
pywin32==224
scapy==2.4.0
spyder==3.3.2
statsmodels==0.9.0
tensorflow-gpu==1.12.0
tensorflow-tensorboard==1.5.1
thrift==0.11.0
xlrd==1.2.0
Notes:
- (Stating the obvious): In order to get a pkg’s info, that pkg needs to be installed. That’s why my example doesn’t use your file (I named it requirements_orig.txt), but the pkgs installed in my VEnv
- As you can see, in my case the pkg number dropped from 133 to 37, which I’d say is pretty manageable (of course, more filtering can be done)
- I created the data structures based on the assumption that a pkg name is a primary key (uniquely identifies a pkg). If this is false, the code would require a bit of change
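On the last point: the same project can be spelled differently in different places (for instance Click as a project name and click in another pkg's requirement list), which is probably why Click still shows up in the minimal list above even though Flask depends on it. Below is a minimal sketch of the same filtering with normalized names. It swaps pkg_resources (since deprecated by setuptools) for the standard library's importlib.metadata (Python 3.8+). The helper names are made up for this example, and skipping requirements whose marker mentions an extra is only a heuristic.

```python
import re
from importlib import metadata  # standard library since Python 3.8


def normalize(name):
    """Normalize a project name the way PEP 503 describes (case and -_. runs)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(req):
    """Return the normalized project name at the start of a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
    return normalize(match.group(1)) if match else None


def top_level(pkgs):
    """pkgs maps name -> (version, requirement strings); return entries nobody requires."""
    required = set()
    for _version, reqs in pkgs.values():
        for req in reqs:
            # Heuristic (assumption): skip optional dependencies tied to an extra.
            if "extra ==" in req:
                continue
            dep = requirement_name(req)
            if dep:
                required.add(dep)
    return sorted(
        (name, version) for name, (version, _reqs) in pkgs.items()
        if normalize(name) not in required
    )


def installed_pkgs():
    """Collect name, version and declared requirements of the installed distributions."""
    return {
        dist.metadata["Name"]: (dist.version, dist.requires or [])
        for dist in metadata.distributions()
        if dist.metadata["Name"]
    }
```

Running `top_level(installed_pkgs())` inside the project's VEnv would then list the candidates for requirements.txt. As before, only installed pkgs can be inspected.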
Final note: If you also want to consider your module’s import list (to strip out even more pkgs, if possible), you could also try [Python.Docs]: modulefinder – Find modules used by a script (I used it in [SO]: What files are required for Py_Initialize to run? (@CristiFati’s answer), only from CmdLine, but it should be trivial to use it from a script)
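Using modulefinder from a script could look roughly like the sketch below. `top_level_imports` and its `search_path` parameter are names made up for this example. ModuleFinder follows imports recursively, so the "found" set also contains standard library modules and modules imported indirectly, and mapping module names back to pkg names (e.g. flask_wtf to Flask-WTF) remains a separate step.

```python
from modulefinder import ModuleFinder


def top_level_imports(script_path, search_path=None):
    """Return (found, missing) sets of top-level module names used by a script."""
    finder = ModuleFinder(path=search_path)  # None means "search sys.path"
    finder.run_script(script_path)
    # Keep only the first component: "werkzeug.security" -> "werkzeug"
    found = {name.split(".")[0] for name in finder.modules if name != "__main__"}
    # Modules the script tries to import but that could not be located
    missing = {name.split(".")[0] for name in finder.badmodules} - found
    return found, missing
```

Any installed pkg whose modules never show up in "found" is a candidate for removal from the list, while names in "missing" point at pkgs that still need to be installed.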