Lazy loading of class attributes
Question:
Class Foo has a bar, and it is not loaded until it is accessed. Further accesses to bar should incur no overhead.
class Foo(object):
    def get_bar(self):
        print("initializing")
        self.bar = "12345"
        # Swap in the cheap getter so later calls skip initialization.
        self.get_bar = self._get_bar
        return self.bar

    def _get_bar(self):
        print("accessing")
        return self.bar
Is it possible to do something like this using properties or, better yet, attributes, instead of using a getter method?
The goal is to lazy load without overhead on all subsequent accesses…
Answers:
Sure, just have your property set an instance attribute that is returned on subsequent access:
class Foo(object):
    _cached_bar = None

    @property
    def bar(self):
        # Caveat: a falsy result (0, '', []) is recomputed on every access.
        if not self._cached_bar:
            self._cached_bar = self._get_expensive_bar_expression()
        return self._cached_bar
The property descriptor is a data descriptor (it implements __get__, __set__ and __delete__ descriptor hooks), so it’ll be invoked even if a bar attribute exists on the instance, with the end result that Python ignores that attribute, hence the need to test for a separate attribute on each access.
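To see that precedence rule in action, here is a minimal sketch (written for Python 3, using a throwaway Foo class that is not part of the answer's code). It writes a value straight into the instance dictionary and checks which one the lookup returns:

```python
class Foo:
    @property
    def bar(self):
        return "from property"


f = Foo()

# Write straight into the instance dictionary, bypassing the property
# (a plain `f.bar = ...` would fail because the property has no setter).
f.__dict__["bar"] = "from instance"

# The data descriptor on the class takes precedence over the instance entry.
result = f.bar
print(result)
print(vars(f))
```

The instance entry is still there in vars(f); it is simply never consulted while a data descriptor of the same name lives on the class.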
You can write your own descriptor that only implements __get__, at which point Python uses an attribute on the instance over the descriptor if it exists:
class CachedProperty(object):
    def __init__(self, func, name=None):
        self.func = func
        self.name = name if name is not None else func.__name__
        self.__doc__ = func.__doc__

    def __get__(self, instance, class_):
        if instance is None:
            return self
        res = self.func(instance)
        setattr(instance, self.name, res)
        return res


class Foo(object):
    @CachedProperty
    def bar(self):
        return self._get_expensive_bar_expression()
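One consequence of caching in the instance dictionary is that deleting the attribute resets the cache. The sketch below assumes Python 3 and uses a trimmed-down copy of the descriptor; the calls list and the "12345" value are illustrative additions, not part of the answer:

```python
class CachedProperty:
    """Non-data descriptor: defines __get__ only."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        value = self.func(instance)
        # Store under the same name so the instance attribute shadows us.
        setattr(instance, self.name, value)
        return value


calls = []  # records every real computation


class Foo:
    @CachedProperty
    def bar(self):
        calls.append("computed")
        return "12345"


f = Foo()
f.bar                      # first access: descriptor runs
f.bar                      # second access: plain instance attribute
calls_before_reset = len(calls)

del f.bar                  # removes the cached value from f.__dict__
f.bar                      # descriptor runs again
calls_after_reset = len(calls)
```

After the first access the lookup never reaches the descriptor, which is what removes the per-access overhead the question asks about.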
If you prefer a __getattr__ approach (which has something to be said for it), that’d be:
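class Foo(object):
    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name == 'bar':
            bar = self.bar = self._get_expensive_bar_expression()
            return bar
        # object defines no __getattr__ to delegate to, so raise directly.
        raise AttributeError(name)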
Subsequent access will find the bar attribute on the instance and __getattr__ won’t be consulted.
Demo:
>>> class FooExpensive(object):
...     def _get_expensive_bar_expression(self):
...         print('Doing something expensive')
...         return 'Spam ham & eggs'
...
>>> class FooProperty(FooExpensive):
...     _cached_bar = None
...     @property
...     def bar(self):
...         if not self._cached_bar:
...             self._cached_bar = self._get_expensive_bar_expression()
...         return self._cached_bar
...
>>> f = FooProperty()
>>> f.bar
Doing something expensive
'Spam ham & eggs'
>>> f.bar
'Spam ham & eggs'
>>> vars(f)
{'_cached_bar': 'Spam ham & eggs'}
>>> class FooDescriptor(FooExpensive):
...     bar = CachedProperty(FooExpensive._get_expensive_bar_expression, 'bar')
...
>>> f = FooDescriptor()
>>> f.bar
Doing something expensive
'Spam ham & eggs'
>>> f.bar
'Spam ham & eggs'
>>> vars(f)
{'bar': 'Spam ham & eggs'}
>>> class FooGetAttr(FooExpensive):
...     def __getattr__(self, name):
...         if name == 'bar':
...             bar = self.bar = self._get_expensive_bar_expression()
...             return bar
...         raise AttributeError(name)
...
>>> f = FooGetAttr()
>>> f.bar
Doing something expensive
'Spam ham & eggs'
>>> f.bar
'Spam ham & eggs'
>>> vars(f)
{'bar': 'Spam ham & eggs'}
Sure it is, try:
class Foo(object):
    def __init__(self):
        self._bar = None  # Initial value

    @property
    def bar(self):
        if self._bar is None:
            self._bar = HeavyObject()
        return self._bar
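Note that this is not thread-safe. CPython has a GIL, which narrows the window but does not close it: two threads can both see None and both construct HeavyObject. If you plan to use this from multiple threads, especially on an implementation with true multithreading (say, Jython), you might want to guard the initialization with a lock.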
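One way to add that safety is a per-instance lock with a second check inside it. This is only a sketch for Python 3: the _bar_lock attribute and the instance counter on HeavyObject are illustrative assumptions, and HeavyObject itself is a stand-in for whatever is expensive to build. If you are unsure whether the unlocked fast path is safe on your interpreter, drop the outer check and always take the lock.

```python
import threading


class HeavyObject:
    """Hypothetical stand-in for an expensive resource."""
    instances = 0

    def __init__(self):
        HeavyObject.instances += 1


class Foo:
    def __init__(self):
        self._bar = None
        self._bar_lock = threading.Lock()

    @property
    def bar(self):
        # Fast path: once initialized, no lock is taken.
        if self._bar is None:
            with self._bar_lock:
                # Re-check inside the lock: another thread may have
                # finished the initialization while we were waiting.
                if self._bar is None:
                    self._bar = HeavyObject()
        return self._bar


foo = Foo()
threads = [threading.Thread(target=lambda: foo.bar) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However many threads race on the first access, only one HeavyObject is constructed, and all of them see the same instance afterwards.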
There are some problems with the current answers. The solution with a property requires that you specify an additional class attribute and has the overhead of checking this attribute on each lookup. The solution with __getattr__ has the issue that it hides this attribute until first access. This is bad for introspection, and a workaround with __dir__ is inconvenient.
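The introspection problem is easy to reproduce. In this Python 3 sketch, ViaGetattr and ViaProperty are hypothetical minimal versions of the two approaches, with a string standing in for the expensive value:

```python
class ViaGetattr:
    def __getattr__(self, name):
        if name == "bar":
            self.bar = "expensive value"
            return self.bar
        raise AttributeError(name)


class ViaProperty:
    _bar = None

    @property
    def bar(self):
        if self._bar is None:
            self._bar = "expensive value"
        return self._bar


g = ViaGetattr()
listed_before_access = "bar" in dir(g)   # nothing named bar exists yet
g.bar                                    # triggers __getattr__, sets g.bar
listed_after_access = "bar" in dir(g)

p = ViaProperty()
listed_for_property = "bar" in dir(p)    # the class attribute is always listed
```

With __getattr__, tools that rely on dir() (tab completion, documentation generators) only learn about bar after something has already accessed it.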
A better solution than the two proposed ones is to use descriptors directly. The Werkzeug library already has a solution in werkzeug.utils.cached_property. Its implementation is simple, so you can use it directly without having Werkzeug as a dependency:
_missing = object()


class cached_property(object):
    """A decorator that converts a function into a lazy property.  The
    function wrapped is called the first time to retrieve the result
    and then that calculated result is used the next time you access
    the value::

        class Foo(object):

            @cached_property
            def foo(self):
                # calculate something important here
                return 42

    The class has to have a `__dict__` in order for this property to
    work.
    """

    # implementation detail: this property is implemented as non-data
    # descriptor.  non-data descriptors are only invoked if there is
    # no entry with the same name in the instance's __dict__.
    # this allows us to completely get rid of the access function call
    # overhead.  If one chooses to invoke __get__ by hand the property
    # will still work as expected because the lookup logic is replicated
    # in __get__ for manual invocation.

    def __init__(self, func, name=None, doc=None):
        self.__name__ = name or func.__name__
        self.__module__ = func.__module__
        self.__doc__ = doc or func.__doc__
        self.func = func

    def __get__(self, obj, type=None):
        if obj is None:
            return self
        value = obj.__dict__.get(self.__name__, _missing)
        if value is _missing:
            value = self.func(obj)
            obj.__dict__[self.__name__] = value
        return value
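If Python 3.8 or newer is available (an assumption, since the answers above are written for Python 2), the standard library ships a decorator in the same spirit, functools.cached_property. It is not the Werkzeug code, but it also stores the computed value in the instance's __dict__. A minimal sketch, with the computations counter added only to make the caching visible:

```python
from functools import cached_property


class Foo:
    def __init__(self):
        self.computations = 0  # illustrative counter, not required

    @cached_property
    def bar(self):
        # Runs once per instance; the result is stored in the instance __dict__.
        self.computations += 1
        return "12345"


f = Foo()
first = f.bar
second = f.bar
print(f.computations)
print(vars(f))
```

As with the Werkzeug version, the class needs a __dict__, so this does not work on classes that define __slots__ without one.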