python lazy variables? or, delayed expensive computation
Question:
I have a set of arrays that are very large and expensive to compute, and not all will necessarily be needed by my code on any given run. I would like to make their declaration optional, but ideally without having to rewrite my whole code.
Example of how it is now:
x = function_that_generates_huge_array_slowly(0)
y = function_that_generates_huge_array_slowly(1)
Example of what I’d like to do:
x = lambda: function_that_generates_huge_array_slowly(0)
y = lambda: function_that_generates_huge_array_slowly(1)
z = x * 5 # this doesn't work because lambda is a function
# is there something that would make this line behave like
# z = x() * 5?
g = x * 6
While using lambda as above achieves one of the desired effects – computation of the array is delayed until it is needed – if you use the variable “x” more than once, it has to be computed each time. I’d like to compute it only once.
EDIT:
After some additional searching, it looks like it is possible to do what I want (approximately) with “lazy” attributes in a class (e.g. http://code.activestate.com/recipes/131495-lazy-attributes/). I don’t suppose there’s any way to do something similar without making a separate class?
EDIT2: I’m trying to implement some of the solutions, but I’m running in to an issue because I don’t understand the difference between:
class sample(object):
def __init__(self):
class one(object):
def __get__(self, obj, type=None):
print "computing ..."
obj.one = 1
return 1
self.one = one()
and
class sample(object):
class one(object):
def __get__(self, obj, type=None):
print "computing ... "
obj.one = 1
return 1
one = one()
I think some variation on these is what I’m looking for, since the expensive variables are intended to be part of a class.
Answers:
The first half of your problem (reusing the value) is easily solved:
class LazyWrapper(object):
def __init__(self, func):
self.func = func
self.value = None
def __call__(self):
if self.value is None:
self.value = self.func()
return self.value
lazy_wrapper = LazyWrapper(lambda: function_that_generates_huge_array_slowly(0))
But you still have to use it as lazy_wrapper()
not lazy_wrapper
.
If you’re going to be accessing some of the variables many times, it may be faster to use:
class LazyWrapper(object):
def __init__(self, func):
self.func = func
def __call__(self):
try:
return self.value
except AttributeError:
self.value = self.func()
return self.value
Which will make the first call slower and subsequent uses faster.
Edit: I see you found a similar solution that requires you to use attributes on a class. Either way requires you rewrite every lazy variable access, so just pick whichever you like.
Edit 2: You can also do:
class YourClass(object)
def __init__(self, func):
self.func = func
@property
def x(self):
try:
return self.value
except AttributeError:
self.value = self.func()
return self.value
If you want to access x
as an instance attribute. No additional class is needed. If you don’t want to change the class signature (by making it require func
), you can hard code the function call into the property.
Writing a class is more robust, but optimizing for simplicity (which I think you are asking for), I came up with the following solution:
cache = {}
def expensive_calc(factor):
print 'calculating...'
return [1, 2, 3] * factor
def lookup(name):
return ( cache[name] if name in cache
else cache.setdefault(name, expensive_calc(2)) )
print 'run one'
print lookup('x') * 2
print 'run two'
print lookup('x') * 2
You can’t make a simple name, like x
, to really evaluate lazily. A name is just an entry in a hash table (e.g. in that which locals()
or globals()
return). Unless you patch access methods of these system tables, you cannot attach execution of your code to simple name resolution.
But you can wrap functions in caching wrappers in different ways.
This is an OO way:
class CachedSlowCalculation(object):
cache = {} # our results
def __init__(self, func):
self.func = func
def __call__(self, param):
already_known = self.cache.get(param, None)
if already_known:
return already_known
value = self.func(param)
self.cache[param] = value
return value
calc = CachedSlowCalculation(function_that_generates_huge_array_slowly)
z = calc(1) + calc(1)**2 # only calculates things once
This is a classless way:
def cached(func):
func.__cache = {} # we can attach attrs to objects, functions are objects
def wrapped(param):
cache = func.__cache
already_known = cache.get(param, None)
if already_known:
return already_known
value = func(param)
cache[param] = value
return value
return wrapped
@cached
def f(x):
print "I'm being called with %r" % x
return x + 1
z = f(9) + f(9)**2 # see f called only once
In real world you’ll add some logic to keep the cache to a reasonable size, possibly using a LRU algorithm.
Python 3.2 and greater implement an LRU algorithm in the functools module to handle simple cases of caching/memoization:
import functools
@functools.lru_cache(maxsize=128) #cache at most 128 items
def f(x):
print("I'm being called with %r" % x)
return x + 1
z = f(9) + f(9)**2
To me, it seems that the proper solution for your problem is subclassing a dict and using it.
class LazyDict(dict):
def __init__(self, lazy_variables):
self.lazy_vars = lazy_variables
def __getitem__(self, key):
if key not in self and key in self.lazy_vars:
self[key] = self.lazy_vars[key]()
return super().__getitem__(key)
def generate_a():
print("generate var a lazily..")
return "<a_large_array>"
# You can add as many variables as you want here
lazy_vars = {'a': generate_a}
lazy = LazyDict(lazy_vars)
# retrieve the variable you need from `lazy`
a = lazy['a']
print("Got a:", a)
And you can actually evaluate a variable lazily if you use exec
to run your code. The solution is just using a custom globals.
your_code = "print('inside exec');print(a)"
exec(your_code, lazy)
If you did your_code = open(your_file).read()
, you could actually run your code and achieve what you want. But I think the more practical approach would be the former one.
I have a set of arrays that are very large and expensive to compute, and not all will necessarily be needed by my code on any given run. I would like to make their declaration optional, but ideally without having to rewrite my whole code.
Example of how it is now:
x = function_that_generates_huge_array_slowly(0)
y = function_that_generates_huge_array_slowly(1)
Example of what I’d like to do:
x = lambda: function_that_generates_huge_array_slowly(0)
y = lambda: function_that_generates_huge_array_slowly(1)
z = x * 5 # this doesn't work because lambda is a function
# is there something that would make this line behave like
# z = x() * 5?
g = x * 6
While using lambda as above achieves one of the desired effects – computation of the array is delayed until it is needed – if you use the variable “x” more than once, it has to be computed each time. I’d like to compute it only once.
EDIT:
After some additional searching, it looks like it is possible to do what I want (approximately) with “lazy” attributes in a class (e.g. http://code.activestate.com/recipes/131495-lazy-attributes/). I don’t suppose there’s any way to do something similar without making a separate class?
EDIT2: I’m trying to implement some of the solutions, but I’m running in to an issue because I don’t understand the difference between:
class sample(object):
def __init__(self):
class one(object):
def __get__(self, obj, type=None):
print "computing ..."
obj.one = 1
return 1
self.one = one()
and
class sample(object):
class one(object):
def __get__(self, obj, type=None):
print "computing ... "
obj.one = 1
return 1
one = one()
I think some variation on these is what I’m looking for, since the expensive variables are intended to be part of a class.
The first half of your problem (reusing the value) is easily solved:
class LazyWrapper(object):
def __init__(self, func):
self.func = func
self.value = None
def __call__(self):
if self.value is None:
self.value = self.func()
return self.value
lazy_wrapper = LazyWrapper(lambda: function_that_generates_huge_array_slowly(0))
But you still have to use it as lazy_wrapper()
not lazy_wrapper
.
If you’re going to be accessing some of the variables many times, it may be faster to use:
class LazyWrapper(object):
def __init__(self, func):
self.func = func
def __call__(self):
try:
return self.value
except AttributeError:
self.value = self.func()
return self.value
Which will make the first call slower and subsequent uses faster.
Edit: I see you found a similar solution that requires you to use attributes on a class. Either way requires you rewrite every lazy variable access, so just pick whichever you like.
Edit 2: You can also do:
class YourClass(object)
def __init__(self, func):
self.func = func
@property
def x(self):
try:
return self.value
except AttributeError:
self.value = self.func()
return self.value
If you want to access x
as an instance attribute. No additional class is needed. If you don’t want to change the class signature (by making it require func
), you can hard code the function call into the property.
Writing a class is more robust, but optimizing for simplicity (which I think you are asking for), I came up with the following solution:
cache = {}
def expensive_calc(factor):
print 'calculating...'
return [1, 2, 3] * factor
def lookup(name):
return ( cache[name] if name in cache
else cache.setdefault(name, expensive_calc(2)) )
print 'run one'
print lookup('x') * 2
print 'run two'
print lookup('x') * 2
You can’t make a simple name, like x
, to really evaluate lazily. A name is just an entry in a hash table (e.g. in that which locals()
or globals()
return). Unless you patch access methods of these system tables, you cannot attach execution of your code to simple name resolution.
But you can wrap functions in caching wrappers in different ways.
This is an OO way:
class CachedSlowCalculation(object):
cache = {} # our results
def __init__(self, func):
self.func = func
def __call__(self, param):
already_known = self.cache.get(param, None)
if already_known:
return already_known
value = self.func(param)
self.cache[param] = value
return value
calc = CachedSlowCalculation(function_that_generates_huge_array_slowly)
z = calc(1) + calc(1)**2 # only calculates things once
This is a classless way:
def cached(func):
func.__cache = {} # we can attach attrs to objects, functions are objects
def wrapped(param):
cache = func.__cache
already_known = cache.get(param, None)
if already_known:
return already_known
value = func(param)
cache[param] = value
return value
return wrapped
@cached
def f(x):
print "I'm being called with %r" % x
return x + 1
z = f(9) + f(9)**2 # see f called only once
In real world you’ll add some logic to keep the cache to a reasonable size, possibly using a LRU algorithm.
Python 3.2 and greater implement an LRU algorithm in the functools module to handle simple cases of caching/memoization:
import functools
@functools.lru_cache(maxsize=128) #cache at most 128 items
def f(x):
print("I'm being called with %r" % x)
return x + 1
z = f(9) + f(9)**2
To me, it seems that the proper solution for your problem is subclassing a dict and using it.
class LazyDict(dict):
def __init__(self, lazy_variables):
self.lazy_vars = lazy_variables
def __getitem__(self, key):
if key not in self and key in self.lazy_vars:
self[key] = self.lazy_vars[key]()
return super().__getitem__(key)
def generate_a():
print("generate var a lazily..")
return "<a_large_array>"
# You can add as many variables as you want here
lazy_vars = {'a': generate_a}
lazy = LazyDict(lazy_vars)
# retrieve the variable you need from `lazy`
a = lazy['a']
print("Got a:", a)
And you can actually evaluate a variable lazily if you use exec
to run your code. The solution is just using a custom globals.
your_code = "print('inside exec');print(a)"
exec(your_code, lazy)
If you did your_code = open(your_file).read()
, you could actually run your code and achieve what you want. But I think the more practical approach would be the former one.