How to get a list of all the Python standard library modules?
Question:
I want something like sys.builtin_module_names, except for the standard library. Other things that didn’t work:
- sys.modules – only shows modules that have already been loaded
- sys.prefix – a path that would include non-standard library modules and doesn’t seem to work inside a virtualenv.
The reason I want this list is so that I can pass it to the --ignore-module or --ignore-dir command line options of trace.
So ultimately, I want to know how to ignore all the standard library modules when using trace or sys.settrace.
Answers:
This will get you close:
import sys; import glob
glob.glob(sys.prefix + "/lib/python%d.%d" % (sys.version_info[0:2]) + "/*.py")
Another possibility for the --ignore-dir option:
os.pathsep.join(sys.path)
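For the trace use case itself, here is a minimal sketch of handing the standard library directories to trace.Trace through its ignoredirs argument. It assumes the sysconfig paths "stdlib" and "platstdlib" cover the standard library on your installation, and demo() is a hypothetical stand-in for your own code.

```python
import sysconfig
import trace


def demo():
    # Hypothetical stand-in for the code you actually want to trace.
    return sum(range(5))


# Assumption: these two sysconfig paths hold the pure-Python and the
# platform-specific parts of the standard library.
stdlib_dirs = [sysconfig.get_path("stdlib"), sysconfig.get_path("platstdlib")]

# count=True collects line counts; trace=False suppresses line-by-line output.
tracer = trace.Trace(ignoredirs=stdlib_dirs, trace=False, count=True)
result = tracer.runfunc(demo)

# counts maps (filename, lineno) to the number of times the line ran.
counts = tracer.results().counts
```

On the command line, the same directories joined with os.pathsep can be passed to --ignore-dir.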
Why not work out what’s part of the standard library yourself?
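import os
import sysconfig

# sysconfig.get_path("stdlib") is the standard library directory
# (distutils.sysconfig.get_python_lib(standard_lib=True) on Pythons
# that still ship distutils, which was removed in 3.12).
std_lib = sysconfig.get_path("stdlib")
for top, dirs, files in os.walk(std_lib):
    for nm in files:
        # Skip package markers; keep every other .py file.
        if nm != '__init__.py' and nm[-3:] == '.py':
            # Turn the path relative to std_lib into a dotted module name.
            print(os.path.join(top, nm)[len(std_lib)+1:-3].replace(os.sep, '.'))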
gives
abc
aifc
antigravity
--- a bunch of other files ----
xml.parsers.expat
xml.sax.expatreader
xml.sax.handler
xml.sax.saxutils
xml.sax.xmlreader
xml.sax._exceptions
Edit: You’ll probably want to add a check to avoid site-packages if you need to avoid non-standard library modules.
Here’s an improvement on Caspar’s answer, which is not cross-platform, and misses out top-level modules (e.g. email), dynamically loaded modules (e.g. array), and core built-in modules (e.g. sys):
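import os
import sys
import sysconfig

# Standard library directory (distutils.sysconfig.get_python_lib(standard_lib=True)
# on Pythons that still ship distutils).
std_lib = sysconfig.get_path("stdlib")
for top, dirs, files in os.walk(std_lib):
    for nm in files:
        prefix = top[len(std_lib)+1:]
        if prefix[:13] == 'site-packages':
            continue
        if nm == '__init__.py':
            # A package: print its dotted name.
            print(top[len(std_lib)+1:].replace(os.path.sep, '.'))
        elif nm[-3:] == '.py':
            print(os.path.join(prefix, nm)[:-3].replace(os.path.sep, '.'))
        elif nm[-3:] == '.so' and top[-11:] == 'lib-dynload':
            # Extension modules; on Python 3 the file name carries an ABI tag
            # (e.g. array.cpython-311-x86_64-linux-gnu.so), so keep the first part.
            print(nm.split('.')[0])
for builtin in sys.builtin_module_names:
    print(builtin)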
This is still not perfect because it will miss things like os.path, which is defined from within os.py in a platform-dependent manner via code such as import posixpath as path, but it’s probably as good as you’ll get, bearing in mind that Python is a dynamic language and you can’t ever really know which modules are defined until they’re actually defined at runtime.
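A more compact variant of the same idea, as a rough sketch: let pkgutil.iter_modules enumerate the top-level modules in the standard library directory instead of walking files by hand. The function name walk_stdlib_names is made up for illustration, and the lib-dynload location is an assumption that holds on typical Unix layouts. Unlike the loop above, it returns top-level names only (no submodules such as xml.sax).

```python
import os
import pkgutil
import sys
import sysconfig


def walk_stdlib_names():
    """Collect top-level standard library module names (hypothetical helper)."""
    search_dirs = [sysconfig.get_path("stdlib")]
    # Assumption: compiled extension modules live in lib-dynload on Unix.
    dynload = os.path.join(sysconfig.get_path("platstdlib"), "lib-dynload")
    if os.path.isdir(dynload):
        search_dirs.append(dynload)
    # iter_modules skips plain directories such as site-packages,
    # because they contain no __init__ file.
    names = {info.name for info in pkgutil.iter_modules(search_dirs)}
    # Modules compiled into the interpreter (sys, builtins, ...).
    names.update(sys.builtin_module_names)
    return names
```

As with the loop above, aliases such as os.path are not discovered, since they never exist as files.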
Python >= 3.10: use sys.stdlib_module_names.
Python < 3.10:
The author of isort, a tool which cleans up imports, had to grapple with this same problem in order to satisfy the PEP 8 requirement that core library imports should be ordered before third-party imports.
I have been using this tool and it seems to be working well. You can use the place_module function, importable from the isort package:
>>> from isort import place_module
>>> place_module("json")
'STDLIB'
>>> place_module("requests")
'THIRDPARTY'
Or you can get a set of module names directly; the set depends on the Python version, for example:
>>> from isort.stdlibs.py39 import stdlib
>>> for name in sorted(stdlib): print(name)
... <200+ lines>
xml
xmlrpc
zipapp
zipfile
zipimport
zlib
zoneinfo
Take a look at https://docs.python.org/3/py-modindex.html
They made an index page for the standard modules.
I brute-forced it by writing some code to scrape the TOC of the Standard Library page in the official Python docs. I also built a simple API for getting a list of standard libraries (for Python versions 2.6, 2.7, 3.2, 3.3, and 3.4).
The package is importable as stdlib_list, and its usage is fairly simple:
>>> from stdlib_list import stdlib_list
>>> libraries = stdlib_list("2.7")
>>> libraries[:10]
['AL', 'BaseHTTPServer', 'Bastion', 'CGIHTTPServer', 'ColorPicker', 'ConfigParser', 'Cookie', 'DEVICE', 'DocXMLRPCServer', 'EasyDialogs']
Since Python 3.10 there is sys.stdlib_module_names.
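A small sketch of how that attribute might be used for a membership check. The helper name is_stdlib is made up for illustration; only the top-level part of a dotted name is looked up, since sys.stdlib_module_names lists top-level names.

```python
import sys


def is_stdlib(module_name):
    """Return True if module_name belongs to the standard library (hypothetical helper)."""
    names = getattr(sys, "stdlib_module_names", None)
    if names is None:
        raise RuntimeError("sys.stdlib_module_names requires Python 3.10 or newer")
    # "os.path" -> "os": only top-level names are listed.
    top_level = module_name.partition(".")[0]
    return top_level in names
```

For example, is_stdlib("os.path") is true on Python 3.10+, while a third-party name such as "numpy" is not in the set.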
Building on @Edmund’s answer, this solution pulls the list from the official website:
def standard_libs(version=None, top_level_only=True):
    import re
    from urllib.request import urlopen
    if version is None:
        import sys
        version = sys.version_info
        version = f"{version.major}.{version.minor}"
    url = f"https://docs.python.org/{version}/py-modindex.html"
    with urlopen(url) as f:
        page = f.read()
    modules = set()
    for module in re.findall(r'#module-(.*?)[\'"]',
                             page.decode('ascii', 'replace')):
        if top_level_only:
            module = module.split(".")[0]
        modules.add(module)
    return modules
It returns a set. For example, here are the modules that were added between 3.5 and 3.10:
>>> standard_libs("3.10") - standard_libs("3.5")
{'contextvars', 'dataclasses', 'graphlib', 'secrets', 'zoneinfo'}
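To see what the regular expression picks up without touching the network, here is a sketch that runs the same pattern over a hand-written HTML fragment. SAMPLE_HTML is made up to resemble the link anchors of the module index and is not copied from the real page; extract_modules is a hypothetical name for the parsing step.

```python
import re

# Made-up fragment imitating links in the module index page.
SAMPLE_HTML = '''
<a href="library/os.html#module-os"><code>os</code></a>
<a href="library/os.path.html#module-os.path"><code>os.path</code></a>
<a href="library/json.html#module-json"><code>json</code></a>
'''


def extract_modules(html, top_level_only=True):
    """Pull module names out of '#module-<name>' anchors."""
    modules = set()
    # Capture everything between '#module-' and the closing quote.
    for name in re.findall(r'#module-(.*?)[\'"]', html):
        if top_level_only:
            name = name.split(".")[0]
        modules.add(name)
    return modules
```

With top_level_only=True, os and os.path collapse into the single entry os.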
Since this is based on the official documentation, it doesn’t include undocumented modules, such as:
- Easter eggs, namely this and antigravity
- Internal modules, such as genericpath, posixpath or ntpath, which are not supposed to be used directly (you should use os.path instead). Other internal modules: idlelib (which implements the IDLE editor), opcode, sre_constants, sre_compile, sre_parse, pyexpat, pydoc_data, nt.
- All modules with a name starting with an underscore (which are also internal), except for __main__, _thread, and __future__, which are public and documented.
If you’re concerned that the website may be down, you can just cache the list locally. For example, you can use the following function to create a small Python module containing all the module names:
def create_stdlib_module_names(
        module_name="stdlib_module_names",
        variable="stdlibs",
        version=None,
        top_level_only=True):
    stdlibs = standard_libs(
        version=version, top_level_only=top_level_only)
    with open(f"{module_name}.py", "w") as f:
        f.write(f"{variable} = {stdlibs!r}\n")
Here’s how to use it:
>>> create_stdlib_module_names() # run this just once
>>> from stdlib_module_names import stdlibs
>>> len(stdlibs)
207
>>> "collections" in stdlibs
True
>>> "numpy" in stdlibs
False
This isn’t perfect, but should get you pretty close if you can’t run 3.10:
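import os
import sysconfig

def get_stdlib_module_names():
    # Standard library directory (distutils.sysconfig.get_python_lib(standard_lib=True)
    # on Pythons that still ship distutils, which was removed in 3.12).
    stdlib_dir = sysconfig.get_path("stdlib")
    return {f.replace(".py", "") for f in os.listdir(stdlib_dir)}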
This misses some modules such as sys, math, time, and itertools.
My use case is logging which modules were imported during an app run, so having a rough filter for stdlib modules is fine. Also I return it as a set rather than a list so membership checks are faster.
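A sketch of that logging use case, under the assumption that you already have some set of standard library names (from any of the approaches in this thread). The function name imported_non_stdlib is made up for illustration.

```python
import sys


def imported_non_stdlib(stdlib_names):
    """List top-level modules loaded so far that are not in stdlib_names."""
    # Copy the keys first so a concurrent import cannot change the dict mid-loop.
    loaded = {name.partition(".")[0] for name in list(sys.modules)}
    return sorted(name for name in loaded if name not in stdlib_names)
```

Calling it at the end of an app run with the set returned by get_stdlib_module_names() gives a rough list of third-party and local modules; built-in modules that the set misses will show up as false positives.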
This works on Anaconda on Windows, and I suspect it will work on Linux distros.
It goes to your Anaconda directory, e.g. C:\Users\{user}\anaconda3\Lib, where standard libraries are installed. It then pulls folder names and filenames (dropping extensions).
import sys
import os
standard_libs = []
standard_lib_path = os.path.join(sys.prefix, "Lib")
for file in os.listdir(standard_lib_path):
    standard_libs.append(file.split(".py")[0].strip().lower())
NB: Builtins, viewable via print(dir(__builtins__)), are automatically loaded, whereas standard libs are not.