Get module name programmatically with only PyPI package name

Question:

I want to programmatically install and import packages based on a list of package names. For most packages, this is no problem since the package and module names are the same.

However, the PyYAML package is one exception, since its module is simply called yaml, and there are probably more exceptions.

Here is the Python function I use to install and import the packages/modules:

def install_and_import(package):
    import importlib
    try:
        importlib.import_module(package) #needs module name!
    except ImportError:
        import pip
        pip.main(['install', package]) #needs package name
    finally:
        globals()[package] = importlib.import_module(package)

Calling the function for each package in this list, ['backoff', 'pyyaml'] (parsed from requirements.txt), I get:

Collecting backoff
Installing collected packages: backoff
Successfully installed backoff-1.4.3
Collecting pyyaml
Installing collected packages: pyyaml
Successfully installed pyyaml-3.12
[...Traceback...]
ModuleNotFoundError: No module named 'pyyaml'

Is there a way, given just the package name (e.g., pyyaml), to find out the name of the module I actually need to import (e.g., yaml)?

Asked By: DarkerIvy

||

Answers:

This uses distlib (pip install distlib) and a hacky “guess” at module names. It can be improved, but I wanted to share what I came up with before I have to get back to other things!

import os.path
import sys

import distlib.database


def to_module(s):
    parts = os.path.splitext(s)[0].split(os.sep)
    if s.endswith('.py'):
        if parts[-1] == '__init__':
            parts.pop()
    elif s.endswith('.so'):
        parts[-1], _, _ = parts[-1].partition('.')
    return '.'.join(parts)


def main():
    dp = distlib.database.DistributionPath()
    dist = dp.get_distribution(sys.argv[1])
    for f, _, _ in dist.list_installed_files():
        if f.endswith(('.py', '.so')):
            print(to_module(f))


if __name__ == '__main__':
    sys.exit(main())

to_module is pretty self-explanatory. I use DistributionPath() (a representation of “installed” packages) to look up a specific installed package. From that I list the files and, if they look like modules, convert them into dotted module names. Note that this won’t catch things like six (which adds a six.moves module dynamically), but it’s a pretty good first-order approximation.

I’m also making assumptions about POSIX here. For other platforms you’ll want to adjust (Windows, for example, uses .pyd rather than .so for extension modules).
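
As a rough sketch of that adjustment, here is a variant of to_module that accepts either path separator and also treats .pyd as an extension module. The MODULE_SUFFIXES name is made up for this illustration, and the function assumes it is given relative paths like the ones in the sample output below.

```python
import re

# Hypothetical constant: file suffixes treated as importable modules.
# '.pyd' is the Windows counterpart of '.so'.
MODULE_SUFFIXES = ('.py', '.so', '.pyd')


def to_module(path):
    """Guess a dotted module name from a relative installed-file path."""
    # Split on both separators, since recorded paths may use either one.
    parts = [p for p in re.split(r'[\\/]', path) if p]
    # Keep only the text before the first dot, so that
    # 'tracer.cpython-311-x86_64-linux-gnu.so' becomes 'tracer'.
    parts[-1] = parts[-1].partition('.')[0]
    if parts[-1] == '__init__':
        parts.pop()  # a package is named after its directory
    return '.'.join(parts)
```

In main(), the filter would then become f.endswith(MODULE_SUFFIXES).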

Sample output:

$ python test.py pyyaml
_yaml
yaml
yaml.composer
yaml.constructor
yaml.cyaml
yaml.dumper
yaml.emitter
yaml.error
yaml.events
yaml.loader
yaml.nodes
yaml.parser
yaml.reader
yaml.representer
yaml.resolver
yaml.scanner
yaml.serializer
yaml.tokens
$ python test.py coverage
coverage.pickle2json
coverage.execfile
coverage.python
coverage.summary
coverage.html
coverage.plugin
coverage.pytracer
coverage.config
coverage.__main__
coverage.data
coverage.debug
coverage.annotate
coverage.backward
coverage.parser
coverage.misc
coverage.files
coverage.multiproc
coverage.backunittest
coverage.env
coverage
coverage.control
coverage.cmdline
coverage.results
coverage.version
coverage.plugin_support
coverage.templite
coverage.collector
coverage.xmlreport
coverage.report
coverage.phystokens
coverage.bytecode
coverage.tracer
coverage.fullcoverage.encodings
Answered By: Anthony Sottile

Based on Anthony Sottile’s excellent answer, I created a simplified version to give ONE module from the package. Most of the packages for my situation have one main module. (Of course, it would be sweet to handle more complex packages with multiple “main” modules.)

Testing on Windows, I’ve found some issues with .list_installed_files() (some of these are addressed in this “solution”):

  1. os.sep doesn’t reliably split file names, because the separator depends on the type of distribution. (Eggs use os.sep, while wheels use POSIX-style forward slashes.)
  2. For some distributions, you get full paths (this seems to happen with eggs). This leads to crazy module name guesses (e.g., ‘C:.Users.Username.AppData.RestOfPath.File’).

This searches for the first __init__.py to inform the module name. If it doesn’t find one, it just returns the package name (covers 90% of cases for me).

import os.path

import distlib.database


def package_to_module(package):
    dp = distlib.database.DistributionPath(include_egg=True)
    dist = dp.get_distribution(package)
    if dist is None:
        raise ModuleNotFoundError
    module = package  # until we figure out something better
    for filename, _, _ in dist.list_installed_files():
        if not filename.endswith('.py'):
            continue
        stem = os.path.splitext(filename)[0]
        parts = stem.split(os.sep)
        if len(parts) == 1:  # windows sep varies with distribution type
            parts = stem.split('/')
        if parts[-1].startswith('_') and not parts[-1].startswith('__'):
            continue  # ignore internals
        if parts[-1] == '__init__' and len(parts) > 1:
            module = parts[-2]  # directory that holds the first __init__.py
            break
    return module

Some examples:

>>> package_to_module("pyyaml")
'yaml'
>>> package_to_module("click")
'click'
>>> package_to_module("six")
'six'
>>> package_to_module("pip")
'pip'
>>> package_to_module("doesntexist")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in package_to_module
ModuleNotFoundError
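
To tie this back to the question, here is a minimal sketch of how install_and_import could use such a lookup. It is only one possible arrangement: the resolve parameter is a hypothetical hook invented for this example, and pip is run in a subprocess because recent pip versions no longer support calling pip.main().

```python
import importlib
import subprocess
import sys


def install_and_import(package, resolve=None):
    """Import a package's module, installing the package with pip if needed.

    `resolve` is a hypothetical hook: a callable that maps an installed
    package name to its module name (for example, package_to_module).
    """
    try:
        # Works when the package and module names match and it is installed.
        return importlib.import_module(package)
    except ImportError:
        pass
    # Run pip for the current interpreter; raises CalledProcessError on failure.
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', package])
    importlib.invalidate_caches()  # let the import system see the new files
    module_name = resolve(package) if resolve is not None else package
    return importlib.import_module(module_name)
```

With the function above, a call might look like yaml = install_and_import('pyyaml', resolve=package_to_module). The lookup happens after installation because distlib can only inspect distributions that are already installed.
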
Answered By: DarkerIvy

The question is from 2018 and since then there has been... okay, I am not even going there. But there now seems to be an easier option: using importlib_metadata.packages_distributions.

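import operator
from typing import Dict, Iterable, List

import importlib_metadata

# packages_distributions() maps each importable top-level name to the
# distributions (PyPI packages) that provide it
module2packages: Dict[str, List[str]] = importlib_metadata.packages_distributions()

# flip: first distribution name -> importable name
# (if a distribution provides several top-level names, the last one seen wins)
_values: Iterable[str] = map(operator.itemgetter(0), module2packages.values())
package2module: Dict[str, str] = dict(zip(_values, module2packages.keys()))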

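This maps the installable PyPI package names to the names of the importable Python modules. Do note that importlib_metadata is not importlib.metadata: the former is a third-party backport (pip install importlib-metadata), while the latter is the standard-library module you normally use to call importlib.metadata.version(package_name). Since Python 3.10, importlib.metadata also provides packages_distributions. Importlib has a few of these dualities.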
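
A slightly more defensive sketch of the same flip, using the standard-library importlib.metadata (so it assumes Python 3.10 or later). It keeps every module a package provides and lowercases names, since the metadata may spell a package differently from what you typed (PyYAML vs. pyyaml). The helper names normalize, invert and modules_for are made up for this example.

```python
import re
from collections import defaultdict
from importlib import metadata  # packages_distributions() needs Python 3.10+


def normalize(name):
    """Normalize a distribution name: lowercase, runs of -, _ and . become -."""
    return re.sub(r'[-_.]+', '-', name).lower()


def invert(module_to_dists):
    """Turn {module: [distribution, ...]} into {distribution: [module, ...]}."""
    dist_to_modules = defaultdict(list)
    for module, dists in module_to_dists.items():
        for dist in dists:
            key = normalize(dist)
            if module not in dist_to_modules[key]:  # skip repeated entries
                dist_to_modules[key].append(module)
    return dict(dist_to_modules)


def modules_for(package):
    """Return the import names provided by an installed distribution."""
    mapping = invert(metadata.packages_distributions())
    return mapping.get(normalize(package), [])
```

modules_for(package) returns an empty list when the package is not installed, so the caller can decide whether to install it first.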

Answered By: Matteo Ferla