Python3 (pip): find which package provides a particular module
Question:
There are plenty of questions about installing packages, importing the resulting modules, and listing which packages are available. But there doesn’t seem to be an equivalent of a "--what-provides" option for pip if you don’t have a requirements.txt or pipenv. This question is similar to a previous question, but asks for the parent package rather than additional metadata. That said, those other questions did not get much attention or many accepted answers, e.g. "How do you find python package metadata information given a module". So, forging ahead…
By way of example, there are at least two packages that will install a module called "serial", namely "pyserial" and "serial". Assuming that one of them is installed, we might find it by using pip list:
python3 -m pip list | grep serial
However, this breaks down if the name of the package does not match the name of the module, or if you just want to find out which package to install while working on a legacy server or dev machine.
You can check the path of the imported module – which can give you a clue. But continuing the example…
>>> import serial
>>> print(serial.__file__)
/usr/lib/python3.6/site-packages/serial/__init__.py
It is in a "serial" directory, but only pyserial is in fact installed, not serial:
> python3 -m pip list | grep serial
pyserial 3.4
The closest I can come is to generate a requirements.txt via "pipreqs ./", which may fail on a dependent child file (as it does for me), or to reverse-check dependencies via pipenv (which brings a whole new set of issues along to get it all set up):
> pipenv graph --reverse
cymysql==0.9.15
ftptool==0.7.1
netifaces==0.10.9
pip==20.2.2
PyQt5-sip==12.8.1
- PyQt5==5.15.0 [requires: PyQt5-sip>=12.8,<13]
setuptools==50.3.0
wheel==0.35.1
Does anyone know of a command that I have missed for a simple solution to finding what pip package provides a particular module?
Answers:
Use the packages_distributions() function from importlib.metadata (or importlib-metadata). So for example, in your case where serial is the name of the "import package":
import importlib.metadata # or: `import importlib_metadata`
importlib.metadata.packages_distributions()['serial']
This should return a list containing pyserial, which is the name of the "distribution package" (the name that should be used to pip-install).
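A minimal sketch of wrapping this lookup, assuming Python 3.10+ for the standard-library function and falling back to the importlib_metadata backport otherwise. The helper name find_providers and its optional mapping parameter are made up for illustration and are not part of either library.

```python
try:
    # Standard library, available from Python 3.10.
    from importlib.metadata import packages_distributions
except ImportError:
    # Backport from PyPI (pip install importlib-metadata).
    from importlib_metadata import packages_distributions


def find_providers(module_name, mapping=None):
    """Return the distribution names that provide a top-level import name.

    `mapping` is a hypothetical hook that makes the helper easy to test;
    by default the current environment is queried.
    """
    if mapping is None:
        mapping = packages_distributions()
    # Only top-level names are keys, so "serial.tools" is looked up as "serial".
    top_level = module_name.split(".", 1)[0]
    # Sort and de-duplicate, since a distribution may be listed more than once.
    return sorted(set(mapping.get(top_level, [])))
```

Calling find_providers("serial") in an environment where only pyserial is installed should give a one-element list, and an empty list means no installed distribution claims that name.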
References
- https://importlib-metadata.readthedocs.io/en/stable/using.html#package-distributions
- https://github.com/python/importlib_metadata/pull/287/files
For older Python versions and/or older versions of importlib-metadata
…
I believe something like the following should work:
#!/usr/bin/env python3

import importlib.util
import pathlib

import importlib_metadata


def get_distribution(file_name):
    result = None
    for distribution in importlib_metadata.distributions():
        try:
            relative = (
                pathlib.Path(file_name)
                .relative_to(distribution.locate_file(''))
            )
        except ValueError:
            pass
        else:
            if distribution.files and relative in distribution.files:
                result = distribution
                break
    return result


def alpha():
    file_name = importlib.util.find_spec('serial').origin
    distribution = get_distribution(file_name)
    print("alpha", distribution.metadata['Name'])


def bravo():
    import serial
    file_name = serial.__file__
    distribution = get_distribution(file_name)
    print("bravo", distribution.metadata['Name'])


if __name__ == '__main__':
    alpha()
    bravo()
This is just an example of code showing how to get the metadata of the installed project a specific module belongs to.
The important bit is the get_distribution function, which takes a file name as an argument. It could be the file name of a module or of package data. If that file name belongs to a project installed in the environment (via pip install, for example), then the corresponding importlib.metadata.Distribution object is returned; otherwise the function returns None.
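The same file listing can also be turned around to build a module-to-distribution mapping for the whole environment. The following is a rough sketch under stated assumptions, not the library's actual implementation: the names top_level_names and build_mapping are hypothetical, and it only looks at each distribution's files, so single-file extension modules (for example .so or .pyd files) are ignored.

```python
from collections import defaultdict
from importlib import metadata

_METADATA_SUFFIXES = (".dist-info", ".egg-info")


def top_level_names(dist):
    """Collect the top-level import names found in one distribution's files."""
    names = set()
    for path in dist.files or []:  # files is None when no file list is recorded
        if not path.parts:
            continue
        first = path.parts[0]
        # Skip metadata directories, caches, and paths outside site-packages.
        if first in ("..", "__pycache__") or first.endswith(_METADATA_SUFFIXES):
            continue
        if len(path.parts) > 1:
            names.add(first)        # a package directory such as "serial"
        elif first.endswith(".py"):
            names.add(first[:-3])   # a single-file module such as "six.py"
    return names


def build_mapping(distributions=None):
    """Map each top-level import name to the distributions that ship it."""
    if distributions is None:
        distributions = metadata.distributions()
    mapping = defaultdict(list)
    for dist in distributions:
        for name in sorted(top_level_names(dist)):
            mapping[name].append(dist.metadata["Name"])
    return dict(mapping)
```

With this, build_mapping().get("serial") plays the same role as the packages_distributions() lookup on versions where that function is not available.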
Edit 2023/01/31: This issue is now solved by the importlib_metadata library. See "Provide mapping from "Python packages" to "distribution packages"", where "Note 2" deals with this exact issue. As such (see the comments by @sinoroc), you can locate the package (e.g. the package "pyserial" providing the module "serial") with something like this:
>>> import importlib_metadata
>>> print(importlib_metadata.packages_distributions()['serial'])
['pyserial']
Building on @sinoroc’s much-published answer, I came up with the following code (incorporating the mentioned importlib.util.find_spec method, but with a bash-based search against the RECORD file in the path returned). I also tried to implement @sinoroc’s version – but was not successful. Both methods are included to demonstrate.
Run as "python3 python_find-module-package.py -m [module-name-here] -d", which will also print debug. Leave off the "-d" switch to get just the package name returned (and errors).
TLDR: Code on github.
#!/usr/bin/python3
import sys
import os.path
import importlib.util
import importlib_metadata
import pathlib
import subprocess
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-m", "--module", help="Find matching package for the specified Python module",
                    type=str)
#parser.add_argument("-u", "--username", help="Database username",
#                    type=str)
#parser.add_argument("-p", "--password", help="Database password",
#                    type=str)
parser.add_argument("-d", "--debug", help="Debug messages are enabled",
                    action="store_true")
args = parser.parse_args()

TESTMODULE = 'serial'


def debugPrint(message="Nothing"):
    if args.debug:
        print("[DEBUG] %s" % str(message))


class application():
    def __init__(self, argsPassed):
        self.argsPassed = argsPassed
        self.packageStr = None  # set by getPackage(); stays None if the RECORD search fails
        debugPrint("Got these arguments:\n%s" % (argsPassed))

    def run(self):
        #debugPrint("Running with args:\n%s" % (self.argsPassed))
        try:
            if self.argsPassed.module is not None:
                self.moduleName = self.argsPassed.module  # i.e. the module that you're trying to match with a package.
            else:
                self.moduleName = TESTMODULE
                print("[WARN] No module name supplied - defaulting to %s!" % (TESTMODULE))
            self.location = importlib.util.find_spec(self.moduleName).origin
            debugPrint(self.location)
        except Exception:
            print("[ERROR] Parsing module name!")
            sys.exit(1)

        # Method 1: shell search through the RECORD files.
        try:
            self.getPackage()
        except Exception as e:
            print("[ERROR] getPackage failed: %s" % str(e))

        # Method 2: @sinoroc's importlib_metadata approach.
        try:
            distResult = self.getDistribution(self.location)
            self.packageStrDist = distResult.metadata['Name']
            print(self.packageStrDist)
        except Exception as e:
            print("[ERROR] getDistribution failed: %s" % str(e))

        debugPrint('Parent package for "%s" is: "%s"' % (self.moduleName, self.packageStr))
        return self.packageStr

    def getPackage(self):
        locationStr = self.location.split("site-packages/", 1)[1]
        debugPrint(locationStr)
        #serial/__init__.py
        locationDir = self.location.split(locationStr, 1)[0]
        debugPrint(locationDir)
        #/usr/lib/python3.6/site-packages/
        cmd = 'find "' + locationDir + '" -type f -iname \'RECORD\' -printf \'"%p"\\n\' | xargs grep "' + locationStr + '" -l -Z'
        debugPrint(cmd)
        #find "/usr/lib/python3.6/site-packages/" -type f -iname 'RECORD' -printf '"%p"\n' | xargs grep "serial/__init__.py" -l -Z

        #return_code = os.system(cmd)
        #return_code = subprocess.run([cmd], stdout=subprocess.PIPE, universal_newlines=True, shell=False)
        #findResultAll = return_code.stdout
        findResultAll = subprocess.check_output(cmd, shell=True)  # Returns stdout as bytes, null terminated.
        findResult = str(findResultAll.decode('ascii').strip().strip('\x00'))
        debugPrint(findResult)
        #/usr/lib/python3.6/site-packages/pyserial-3.4.dist-info/RECORD
        findDir = os.path.split(findResult)
        self.packageStr = findDir[0].replace(locationDir, "")
        debugPrint(self.packageStr)

    def getDistribution(self, fileName=TESTMODULE):
        result = None
        for distribution in importlib_metadata.distributions():
            try:
                relative = (pathlib.Path(fileName).relative_to(distribution.locate_file('')))
            except (ValueError, AttributeError):
                pass
            else:
                # distribution.files is None when the distribution records no file list.
                if distribution.files and relative in distribution.files:
                    result = distribution
        return result


if __name__ == '__main__':
    result = 1
    try:
        prog = application(args)
        result = prog.run()
    except Exception as E:
        print("[ERROR] Prog Exception: %s" % str(E))
    finally:
        sys.exit(result)
    # exit the program if we haven't already
    print("Shouldn't get here.")
    sys.exit(result)
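The find/grep pipeline in getPackage only works where those shell tools exist. As a hedged alternative, here is a small pure-Python sketch of the same RECORD search; the function name find_record_owners is made up for illustration. Note that it matches the recorded path exactly, whereas grep matches any line containing the string, so results can differ from the shell version.

```python
import csv
from pathlib import Path


def find_record_owners(site_packages, relative_path):
    """Return the *.dist-info directory names whose RECORD lists relative_path.

    `relative_path` is relative to site-packages, e.g. "serial/__init__.py".
    """
    owners = []
    for record in sorted(Path(site_packages).glob("*.dist-info/RECORD")):
        with record.open(newline="", encoding="utf-8") as handle:
            # RECORD is a CSV file with rows of: path, hash, size.
            for row in csv.reader(handle):
                if row and row[0] == relative_path:
                    owners.append(record.parent.name)
                    break
    return owners
```

For the example in the question, find_record_owners("/usr/lib/python3.6/site-packages", "serial/__init__.py") would be expected to name the pyserial-3.4.dist-info directory, matching what the shell search reports.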