python paths and import order

Question:

I really want to get this right because I keep running into it when generating some big py2app/py2exe packages. My package contains a lot of modules/packages that might also be in the user’s site-packages/default location (if the user has a Python distribution), but I want my distributed packages to take precedence over them when running from my distribution.

From what I’ve read here, PYTHONPATH should be the first thing added to sys.path after the current directory. However, from what I’ve tested on my machine that is not the case: all the folders defined in $site-packages$/easy-install.pth take precedence over it.

Could someone please give me a more in-depth explanation of this import order and help me find a way to set the environment variables so that the packages I distribute take precedence over the default installed ones?

So far my attempt is, for example on Mac-OS py2app, in my entry point script:

 os.environ['PYTHONPATH'] = os.pathsep.join([
     DATA_PATH,
     os.path.join(DATA_PATH, 'lib'),
     os.path.join(DATA_PATH, 'lib', 'python2.7', 'site-packages'),
     os.path.join(DATA_PATH, 'lib', 'python2.7', 'site-packages.zip'),
 ])

This is basically the structure of the package generated by py2app. Then I just:

 SERVER = subprocess.Popen(
     [PYTHON_EXE_PATH, '-m', 'bin.rpserver',
      cfg.RPC_SERVER_IP, cfg.RPC_SERVER_PORT],
     shell=False, stdin=IN_FILE, stdout=OUT_FILE, stderr=ERR_FILE)

Here PYTHON_EXE_PATH is the path to the python executable that py2app adds to the package. This works fine on a machine that doesn’t have Python installed. However, when a Python distribution is already present, its site-packages take precedence.

Asked By: Bogdan

Answers:

Python searches the paths in sys.path in order (see http://docs.python.org/tutorial/modules.html#the-module-search-path). easy_install changes this list directly (see the last line in your easy-install.pth file):

import sys; new=sys.path[sys.__plen:]; del sys.path[sys.__plen:]; p=getattr(sys,'__egginsert',0); sys.path[p:p]=new; sys.__egginsert = p+len(new)

This basically takes whatever directories are added and inserts them at the beginning of the list.

Also see Eggs in path before PYTHONPATH environment variable.
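
To make that last line easier to follow, here is a minimal sketch that replays the same slicing on a plain list instead of the real sys.path. The function name simulate_egg_insert and the example directories are made up for illustration. It assumes, as in the files setuptools generated, that the first line of easy-install.pth stores len(sys.path) in sys.__plen before the egg directories are appended.

```python
def simulate_egg_insert(path, added, egginsert=0):
    """Replay the easy-install.pth footer on a copy of `path` (hypothetical helper)."""
    path = list(path)
    plen = len(path)        # what the .pth header keeps in sys.__plen
    path.extend(added)      # each directory line of the .pth file gets appended
    new = path[plen:]       # new = sys.path[sys.__plen:]
    del path[plen:]         # del sys.path[sys.__plen:]
    path[egginsert:egginsert] = new   # sys.path[p:p] = new
    return path, egginsert + len(new)  # second value mirrors sys.__egginsert


# Example directories, invented for this demo.
before = ["/app", "/usr/lib/python2.7", "/usr/lib/python2.7/site-packages"]
after, next_insert = simulate_egg_insert(
    before, ["/usr/lib/python2.7/site-packages/foo.egg"])
print(after)        # the egg directory now comes first
print(next_insert)  # 1
```

The appended entries end up in front of everything that was on the list when the .pth file was processed, which is why they can win over PYTHONPATH entries.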

Answered By: cwa

This page is a high Google result for "Python import order", so here’s a hopefully clearer explanation:

As both of those pages explain, the import order is:

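  1. Modules that are already loaded (cached in sys.modules) and the built-in modules compiled into the interpreter. You can see the latter in the variable sys.builtin_module_names.
  2. The sys.path entries, in order.
  3. The installation-dependent default locations (which Python itself adds to sys.path after the PYTHONPATH entries).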

And as the sys.path doc page explains, it is populated as follows:

  1. The first entry is the full path to the directory of the file Python was started with. For example, /someplace/on/disk/> $ python /path/to/the/run.py makes the first path /path/to/the/, and it is the same if you run /path/to/> $ python the/run.py: the entry is always the full path to the directory, whether you gave Python a relative or an absolute file name. If Python was started without a file (interactive mode), the entry is an empty string, which means "current working directory of the Python process". In other words, Python assumes that the file you started wants to import the package folders and blah.py modules that sit in the same location as that file.
  2. The next entries come from the PYTHONPATH environment variable, if it is set.
  3. The remaining entries are the installation-dependent defaults: the standard library, followed by the site-packages folders where pip installs your third-party packages (things like requests and numpy and tensorflow).

So, basically: Yes, you can trust that Python will find your local package-folders and module files first, before any globally installed pip stuff.

Here’s an example to explain further:

myproject/ # <-- This is not a package (no __init__.py file).
  modules/ # <-- This is a package (has an __init__.py file).
    __init__.py
    foo.py
  run.py
  second.py

executed with: python /path/to/the/myproject/run.py
will cause sys.path[0] to be "/path/to/the/myproject/"

run.py contents:
import modules.foo as foo # will import "/path/to/the/myproject/" + "modules/foo.py"
import second # will import "/path/to/the/myproject/" + "second.py"

second.py contents:
import modules.foo as foo # will import "/path/to/the/myproject/" + "modules/foo.py"
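
To see how sys.path is made up on your own machine, a rough sketch like the following labels each entry by where it most likely came from. The function name label_sys_path and the label strings are invented here. It assumes that index 0 is the script directory (or the current working directory), which holds for a normal python run.py start but not necessarily under test runners or embedded interpreters.

```python
import os
import site
import sys


def label_sys_path(paths=None, pythonpath=None):
    """Label each sys.path entry by its probable origin (hypothetical helper)."""
    paths = list(sys.path) if paths is None else list(paths)
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    from_env = {os.path.abspath(p) for p in pythonpath.split(os.pathsep) if p}

    # Some old virtualenv layouts ship a site module without these helpers.
    site_dirs = {os.path.abspath(p)
                 for p in getattr(site, "getsitepackages", list)()}
    user_site = getattr(site, "getusersitepackages", None)
    if user_site is not None:
        site_dirs.add(os.path.abspath(user_site()))

    labelled = []
    for index, entry in enumerate(paths):
        full = os.path.abspath(entry)  # "" resolves to the current directory
        if index == 0:
            label = "script directory or cwd"
        elif full in from_env:
            label = "PYTHONPATH"
        elif full in site_dirs:
            label = "site-packages"
        else:
            label = "other default location"
        labelled.append((entry, label))
    return labelled


for entry, label in label_sys_path():
    print("%-25s %s" % (label, entry))
```

Entries that a .pth file inserted (eggs, for instance) show up as "other default location" in this sketch, so an unexpected entry near the top of the output is worth a closer look.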

EDIT:

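You can run the following command to print a sorted list of the modules that are already loaded (cached in sys.modules) when the interpreter starts. An import of any of these names is answered from that cache before ANY custom files/module folders in your projects are looked at, so these are names you must avoid in your own custom files. (The json entries appear only because the command itself imports json. The modules compiled into the interpreter are listed separately in sys.builtin_module_names.)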

python -c "import sys, json; print(json.dumps(sorted(list(sys.modules.keys())), indent=4))"

List as of Python 3.9.0 (on Windows, hence entries such as nt, ntpath and winreg):

"__main__",
"_abc",
"_bootlocale",
"_codecs",
"_collections",
"_collections_abc",
"_frozen_importlib",
"_frozen_importlib_external",
"_functools",
"_heapq",
"_imp",
"_io",
"_json",
"_locale",
"_operator",
"_signal",
"_sitebuiltins",
"_sre",
"_stat",
"_thread",
"_warnings",
"_weakref",
"abc",
"builtins",
"codecs",
"collections",
"copyreg",
"encodings",
"encodings.aliases",
"encodings.cp1252",
"encodings.latin_1",
"encodings.utf_8",
"enum",
"functools",
"genericpath",
"heapq",
"io",
"itertools",
"json",
"json.decoder",
"json.encoder",
"json.scanner",
"keyword",
"marshal",
"nt",
"ntpath",
"operator",
"os",
"os.path",
"pywin32_bootstrap",
"re",
"reprlib",
"site",
"sre_compile",
"sre_constants",
"sre_parse",
"stat",
"sys",
"time",
"types",
"winreg",
"zipimport"

So NEVER use any of those names for your .py files or your project module subfolders.
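
If you would rather check a single name than read through a list, a small sketch along these lines reports why a file or folder with that name could clash. The helper name_conflicts and the candidate name my_helpers are made up for illustration. sys.stdlib_module_names only exists on Python 3.10 and later, so the sketch falls back to an empty tuple on older versions.

```python
import sys


def name_conflicts(name):
    """Return the reasons a file or folder called `name` could clash on import."""
    reasons = []
    if name in sys.builtin_module_names:
        reasons.append("compiled into the interpreter")
    # sys.stdlib_module_names is available on Python 3.10+ only.
    if name in getattr(sys, "stdlib_module_names", ()):
        reasons.append("standard library module")
    if name in sys.modules:
        reasons.append("already loaded in this process")
    return reasons


# "my_helpers" is a hypothetical project module name.
for candidate in ("sys", "json", "my_helpers"):
    print(candidate, name_conflicts(candidate))
```

An empty list does not prove the name is safe (a third-party package in site-packages could still use it), it only rules out the three cases checked here.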

Answered By: Mitch McMabers

When importing a module, Python first looks in sys.modules, which is a dictionary of the modules that have already been loaded (not a list of directories).
If the module is not found there, Python searches the directories listed in sys.path. There might be other locations Python searches, depending on your operating system.

import sys
import time

print(sys.modules)
print(sys.path)

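The output is the dictionary of loaded modules, followed by the list of directories:

{... , ... , .....}
['C:\\Users\\****', 'C:\\****', ...]

The time module is looked up in that order: first in sys.modules, and only if it is missing there does the import machinery search further.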
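
The cache step can be observed directly. The following is a small sketch (the helper name lookup is made up) that reports whether a module is already in sys.modules and where its spec points, and then shows that a second import returns the very same module object instead of searching again.

```python
import importlib
import importlib.util
import sys


def lookup(name):
    """Return (already_cached, origin) for a top-level module name."""
    cached = name in sys.modules
    spec = importlib.util.find_spec(name)   # None if nothing can be found
    origin = spec.origin if spec is not None else None
    return cached, origin


print(lookup("json"))                      # cached or not, plus the file it maps to
first = importlib.import_module("json")    # performs the import (or hits the cache)
second = importlib.import_module("json")   # answered from sys.modules
print(first is second)                     # True: same object, no second search
print(lookup("json"))                      # now certainly cached
```

This is also why changing sys.path after a module has been imported does not change which copy you get: the cached entry wins until it is removed from sys.modules.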

Answered By: Mohsen Haddadi

Even though the above answers regarding the order in which the interpreter scans sys.path are correct, giving precedence to e.g. user file paths over packages deployed in site-packages might fail if the full user path is not available in the PYTHONPATH variable.

For example, imagine you have the following structure of namespace packages:

/opt/repo_root
  - project  # this is the base package that brings structure to the namespace hierarchy
  - my_pkg
  - my_pkg-core
  - my_pkg-gui
  - my_pkg-helpers
  - my_pkg-helpers-time_sync

The above packages all have the internal needed structure and metadata in order to be deployable by conda, and these are also all installed. Therefore, I can open a python shell and type:

>>> from project.my_pkg.helpers import time_sync
>>> print(time_sync.__file__)

/python/interpreter/path/lib/python3.6/site-packages/project/my_pkg/helpers/time_sync/__init__.py

This returns a path in the python interpreter’s site-packages subfolder. If I manually add the package to be imported to PYTHONPATH or even to sys.path, nothing will change.

>>> import os

>>> # os.pathsep is ":" on Unix and ";" on NT
>>> os.environ['PYTHONPATH'] = os.pathsep.join([os.environ['PYTHONPATH'], "/opt/repo_root/my_pkg-helpers-time_sync"])

>>> from project.my_pkg.helpers import time_sync
>>> print(time_sync.__file__)

/python/interpreter/path/lib/python3.6/site-packages/project/my_pkg/helpers/time_sync/__init__.py

This still reports that the package has been imported from site-packages. You need to include the whole hierarchy of paths in PYTHONPATH, as if it were a traditional python package, and then it will work as you expect:

>>> import os

>>> # os.pathsep is ":" on Unix and ";" on NT
>>> os.environ['PYTHONPATH'] = os.pathsep.join([
... os.environ['PYTHONPATH'],
... "/opt/repo_root",
... "/opt/repo_root/project",
... "/opt/repo_root/project/my_pkg",
... "/opt/repo_root/project/my_pkg-helpers",
... "/opt/repo_root/project/my_pkg-helpers-time_sync",
... ])

>>> from project.my_pkg.helpers import time_sync
>>> print(time_sync.__file__)

/opt/project/my_pkg/helpers/time_sync/__init__.py
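
A way to check which copy of a module wins for a given PYTHONPATH order is to try it with throwaway directories. The sketch below does not reproduce the conda namespace-package layout above: it writes two copies of a made-up module (shadow_demo_pkg) into temporary directories and starts a child interpreter for each order, because PYTHONPATH is read when an interpreter starts. All helper names are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

MODULE_NAME = "shadow_demo_pkg"  # hypothetical module name, used only here


def write_copy(directory, label):
    """Write one copy of the demo module, tagged with `label`."""
    with open(os.path.join(directory, MODULE_NAME + ".py"), "w") as handle:
        handle.write("ORIGIN = %r\n" % label)


def winner(search_dirs):
    """Run a child interpreter with PYTHONPATH set; return the imported copy's tag."""
    env = dict(os.environ, PYTHONPATH=os.pathsep.join(search_dirs))
    code = "import {0}; print({0}.ORIGIN)".format(MODULE_NAME)
    output = subprocess.check_output([sys.executable, "-c", code], env=env)
    return output.decode().strip()


def demo():
    """Import the module with both directory orders and return both winners."""
    with tempfile.TemporaryDirectory() as bundled, \
            tempfile.TemporaryDirectory() as system:
        write_copy(bundled, "bundled")
        write_copy(system, "system")
        return winner([bundled, system]), winner([system, bundled])


print(demo())  # ('bundled', 'system'): the directory listed first wins
```

The same helper can be pointed at real directories to test a particular layout before relying on it.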

Answered By: mosegui