Lazy module variables–can it be done?

Question:

I’m trying to find a way to lazily load a module-level variable.

Specifically, I’ve written a tiny Python library to talk to iTunes, and I want to have a DOWNLOAD_FOLDER_PATH module variable. Unfortunately, iTunes won’t tell you where its download folder is, so I’ve written a function that grabs the filepath of a few podcast tracks and climbs back up the directory tree until it finds the “Downloads” directory.

This takes a second or two, so I’d like to have it evaluated lazily, rather than at module import time.

Is there any way to lazily assign a module variable when it’s first accessed or will I have to rely on a function?

Asked By: wbg

||

Answers:

If that variable lived in a class rather than a module, then you could overload getattr, or better yet, populate it in init.

Answered By: Sean Cavanagh

Is there any way to lazily assign a module variable when it’s first accessed or will I have to rely on a function?

I think you are correct in saying that a function is the best solution to your problem here.
I will give you a brief example to illustrate.

#myfile.py - an example module with some expensive module level code.

import os
# expensive operation to crawl up in directory structure

The expensive operation will be executed on import if it is at module level. There is not a way to stop this, short of lazily importing the entire module!!

#myfile2.py - a module with expensive code placed inside a function.

import os

def getdownloadsfolder(curdir=None):
    """a function that will search upward from the user's current directory
        to find the 'Downloads' folder."""
    # expensive operation now here.

You will be following best practice by using this method.

Answered By: Simon Edwards

You can’t do it with modules, but you can disguise a class “as if” it was a module, e.g., in itun.py, code…:

import sys

class _Sneaky(object):
  def __init__(self):
    self.download = None

  @property
  def DOWNLOAD_PATH(self):
    if not self.download:
      self.download = heavyComputations()
    return self.download

  def __getattr__(self, name):
    return globals()[name]

# other parts of itun that you WANT to code in
# module-ish ways

sys.modules[__name__] = _Sneaky()

Now anybody can import itun… and get in fact your itun._Sneaky() instance. The __getattr__ is there to let you access anything else in itun.py that may be more convenient for you to code as a top-level module object, than inside _Sneaky!_)

Answered By: Alex Martelli

I used Alex’ implementation on Python 3.3, but this crashes miserably:
The code

  def __getattr__(self, name):
    return globals()[name]

is not correct because an AttributeError should be raised, not a KeyError.
This crashed immediately under Python 3.3, because a lot of introspection is done
during the import, looking for attributes like __path__, __loader__ etc.

Here is the version that we use now in our project to allow for lazy imports
in a module. The __init__ of the module is delayed until the first attribute access
that has not a special name:

""" config.py """
# lazy initialization of this module to avoid circular import.
# the trick is to replace this module by an instance!
# modelled after a post from Alex Martelli :-)

Lazy module variables–can it be done?

class _Sneaky(object):
    def __init__(self, name):
        self.module = sys.modules[name]
        sys.modules[name] = self
        self.initializing = True

    def __getattr__(self, name):
        # call module.__init__ after import introspection is done
        if self.initializing and not name[:2] == '__' == name[-2:]:
            self.initializing = False
            __init__(self.module)
        return getattr(self.module, name)

_Sneaky(__name__)

The module now needs to define an init function. This function can be used
to import modules that might import ourselves:

def __init__(module):
    ...
    # do something that imports config.py again
    ...

The code can be put into another module, and it can be extended with properties
as in the examples above.

Maybe that is useful for somebody.

Answered By: Christian Tismer

Recently I came across the same problem, and have found a way to do it.

class LazyObject(object):
    def __init__(self):
        self.initialized = False
        setattr(self, 'data', None)

    def init(self, *args):
        #print 'initializing'
        pass

    def __len__(self): return len(self.data)
    def __repr__(self): return repr(self.data)

    def __getattribute__(self, key):
        if object.__getattribute__(self, 'initialized') == False:
            object.__getattribute__(self, 'init')(self)
            setattr(self, 'initialized', True)

        if key == 'data':
            return object.__getattribute__(self, 'data')
        else:
            try:
                return object.__getattribute__(self, 'data').__getattribute__(key)
            except AttributeError:
                return super(LazyObject, self).__getattribute__(key)

With this LazyObject, You can define a init method for the object, and the object will be initialized lazily, example code looks like:

o = LazyObject()
def slow_init(self):
    time.sleep(1) # simulate slow initialization
    self.data = 'done'
o.init = slow_init

the o object above will have exactly the same methods whatever 'done' object have, for example, you can do:

# o will be initialized, then apply the `len` method 
assert len(o) == 4

complete code with tests (works in 2.7) can be found here:

https://gist.github.com/observerss/007fedc5b74c74f3ea08

Answered By: jingchao

For Python 3.5 and 3.6, the proper way of doing this, according to the Python docs, is to subclass types.ModuleType and then dynamically update the module’s __class__. So, here’s a solution loosely on Christian Tismer’s answer but probably not resembling it much at all:

import sys
import types

class _Sneaky(types.ModuleType):
    @property
    def DOWNLOAD_FOLDER_PATH(self):
        if not hasattr(self, '_download_folder_path'):
            self._download_folder_path = '/dev/block/'
        return self._download_folder_path
sys.modules[__name__].__class__ = _Sneaky

For Python 3.7 and later, you can define a module-level __getattr__() function. See PEP 562 for details.

Answered By: wizzwizz4

It turns out that as of Python 3.7, it’s possible to do this cleanly by defining a __getattr__() at the module level, as specified in PEP 562 and documented in the data model chapter in the Python reference documentation.

# mymodule.py

from typing import Any

DOWNLOAD_FOLDER_PATH: str

def _download_folder_path() -> str:
    global DOWNLOAD_FOLDER_PATH
    DOWNLOAD_FOLDER_PATH = ... # compute however ...
    return DOWNLOAD_FOLDER_PATH

def __getattr__(name: str) -> Any:
    if name == "DOWNLOAD_FOLDER_PATH":
        return _download_folder_path()
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
Answered By: Dan Walters

Since Python 3.7 (and as a result of PEP-562), this is now possible with the module-level __getattr__:

Inside your module, put something like:

def _long_function():
    # print() function to show this is called only once
    print("Determining DOWNLOAD_FOLDER_PATH...")
    # Determine the module-level variable
    path = "/some/path/here"
    # Set the global (module scope)
    globals()['DOWNLOAD_FOLDER_PATH'] = path
    # ... and return it
    return path


def __getattr__(name):
    if name == "DOWNLOAD_FOLDER_PATH":
        return _long_function()

    # Implicit else
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

From this it should be clear that the _long_function() isn’t executed when you import your module, e.g.:

print("-- before import --")
import somemodule
print("-- after import --")

results in just:

-- before import --
-- after import --

But when you attempt to access the name from the module, the module-level __getattr__ will be called, which in turn will call _long_function, which will perform the long-running task, cache it as a module-level variable, and return the result back to the code that called it.

For example, with the first block above inside the module “somemodule.py”, the following code:

import somemodule
print("--")
print(somemodule.DOWNLOAD_FOLDER_PATH)
print('--')
print(somemodule.DOWNLOAD_FOLDER_PATH)
print('--')

produces:

--
Determining DOWNLOAD_FOLDER_PATH...
/some/path/here
--
/some/path/here
--

or, more clearly:

# LINE OF CODE                                # OUTPUT
import somemodule                             # (nothing)

print("--")                                   # --

print(somemodule.DOWNLOAD_FOLDER_PATH)        # Determining DOWNLOAD_FOLDER_PATH...
                                              # /some/path/here

print("--")                                   # --

print(somemodule.DOWNLOAD_FOLDER_PATH)        # /some/path/here

print("--")                                   # --

Lastly, you can also implement __dir__ as the PEP describes if you want to indicate (e.g. to code introspection tools) that DOWNLOAD_FOLDER_PATH is available.

Answered By: jedwards

SPEC 1

Probably the best known recipe for Lazy Loading module attributes (and modules) is in SPEC 1 (Draft) at scientific-python.org. SPECs are operational guidelines for projects in the Scientific Python ecosystem. There is discussion around the SPEC 1 at Scientific Python Discourse and the solution is offered as a package in PyPI as lazy_loader. The lazy_loader implementation relies on the module __gettattr__ support introduced in Python 3.7 (PEP 562), and it is used in scikit-image, NetworkX, and partially in Scipy

Example usage:

The following example is using the lazy_loader PyPI package. You could also just copy-paste the source code to be part of your project.

# mypackage/__init__.py
import lazy_loader

__getattr__, __dir__, __all__ = lazy_loader.attach(
    __name__,
    submodules=['bar'],
    submod_attrs={
        'foo.morefoo': ['FooFilter', 'do_foo', 'MODULE_VARIABLE'],
        'grok.spam': ['spam_a', 'spam_b', 'spam_c']
    }
)

this is the lazy import equivalent of

from . import bar
from .foo.morefoo import FooFilter, do_foo, MODULE_VARIABLE
from .grok.spam import (spam_a, spam_b, spam_c)

Short explanation on lazy_loader.attach

  • If you want to lazy-load a module, you list it in submodules (which is a list)
  • If you want to lazy-load something from a module (function, class, etc.), you list it in submod_attrs (which is a dict)

Type checking

Static type checkers and IDEs cannot infer type information from lazily loaded imports. As workaround, you may use type stubs (.pyi files), like this:

# mypackage/__init__.pyi
from .foo.morefoo import FooFilter as FooFilter, do_foo as do_foo, MODULE_VARIABLE as MODULE_VARIABLE
from .grok.spam import spam_a as spam_a, spam_b as spam_b, spam_c as spam_c

The SPEC 1 mentions that this X as X syntax is necessary due to PEP484.

Side notes

  • There was recently a PEP for Lazy Imports, PEP 690, but it was rejected.
  • In Tensorflow, there is lazy loading class at util.lazyloader.
  • There is one blog post from Brett Cannon (a Python core developer), where he showed in 2018 a module __getattr__ based implementation of lazy_loader, and provided it in a package called modutil, but the project is marked archived in GitHub. This has been an inspiration for the scientific-python lazy_loader.
Answered By: np8