python lazy variables? or, delayed expensive computation

Question:

I have a set of arrays that are very large and expensive to compute, and not all will necessarily be needed by my code on any given run. I would like to make their declaration optional, but ideally without having to rewrite my whole code.

Example of how it is now:

x = function_that_generates_huge_array_slowly(0)
y = function_that_generates_huge_array_slowly(1)

Example of what I’d like to do:

x = lambda: function_that_generates_huge_array_slowly(0)
y = lambda: function_that_generates_huge_array_slowly(1)
z = x * 5 # this doesn't work because lambda is a function
      # is there something that would make this line behave like
      # z = x() * 5?
g = x * 6

While using lambda as above achieves one of the desired effects – computation of the array is delayed until it is needed – if you use the variable “x” more than once, it has to be computed each time. I’d like to compute it only once.

EDIT:
After some additional searching, it looks like it is possible to do what I want (approximately) with “lazy” attributes in a class (e.g. http://code.activestate.com/recipes/131495-lazy-attributes/). I don’t suppose there’s any way to do something similar without making a separate class?

EDIT2: I’m trying to implement some of the solutions, but I’m running into an issue because I don’t understand the difference between:

class sample(object):
    def __init__(self):
        class one(object):
            def __get__(self, obj, type=None):
                print("computing ...")
                obj.one = 1
                return 1
        self.one = one()

and

class sample(object):
    class one(object):
        def __get__(self, obj, type=None):
            print("computing ... ")
            obj.one = 1
            return 1
    one = one()

I think some variation on these is what I’m looking for, since the expensive variables are intended to be part of a class.

Asked By: keflavich


Answers:

The first half of your problem (reusing the value) is easily solved:

class LazyWrapper(object):
    def __init__(self, func):
        self.func = func
        self.value = None
    def __call__(self):
        if self.value is None:
            self.value = self.func()
        return self.value

lazy_wrapper = LazyWrapper(lambda: function_that_generates_huge_array_slowly(0))

But you still have to use it as lazy_wrapper() not lazy_wrapper.
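To check that the wrapped function really runs only once, you can count the calls. This is a minimal sketch: slow_array and the calls list are hypothetical stand-ins for function_that_generates_huge_array_slowly, not part of the original code.

```python
class LazyWrapper(object):
    def __init__(self, func):
        self.func = func
        self.value = None

    def __call__(self):
        if self.value is None:
            self.value = self.func()
        return self.value

calls = []  # records each time the expensive function actually runs

def slow_array():
    # hypothetical stand-in for function_that_generates_huge_array_slowly(0)
    calls.append(1)
    return [0, 1, 2]

x = LazyWrapper(slow_array)
z = [v * 5 for v in x()]  # first use triggers the computation
g = [v * 6 for v in x()]  # second use reuses the stored value
```

One caveat with the None check: if the wrapped function can legitimately return None, this version calls it again on every use.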

If you’re going to be accessing some of the variables many times, it may be faster to use:

class LazyWrapper(object):
    def __init__(self, func):
        self.func = func
    def __call__(self):
        try:
            return self.value
        except AttributeError:
            self.value = self.func()
            return self.value

This makes the first call slower and subsequent uses faster.

Edit: I see you found a similar solution that requires you to use attributes on a class. Either way requires you to rewrite every lazy variable access, so just pick whichever you like.

Edit 2: You can also do:

class YourClass(object):
    def __init__(self, func):
        self.func = func
    @property
    def x(self):
        try:
            return self.value
        except AttributeError:
            self.value = self.func()
            return self.value

Use this if you want to access x as an instance attribute; no additional class is needed. If you don’t want to change the class signature (by making it require func), you can hard-code the function call into the property.
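On Python 3.8 and later, functools.cached_property gives the same compute-once behaviour for the hard-coded variant. A minimal sketch, assuming a hypothetical load_big_array function in place of the real expensive call:

```python
import functools

def load_big_array():
    # hypothetical stand-in for the expensive computation
    load_big_array.calls += 1
    return list(range(5))

load_big_array.calls = 0  # counts how often the function actually runs

class YourClass(object):
    @functools.cached_property
    def x(self):
        # runs on first access only; the result is then stored on the instance
        return load_big_array()

obj = YourClass()
first = obj.x
second = obj.x
```

Both accesses return the same object, and the function body runs only on the first one.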

Answered By: agf

Writing a class is more robust, but optimizing for simplicity (which I think you are asking for), I came up with the following solution:

cache = {}

def expensive_calc(factor):
    print('calculating...')
    return [1, 2, 3] * factor

def lookup(name):
    # compute on the first request for a name, then reuse the cached result
    return ( cache[name] if name in cache
        else cache.setdefault(name, expensive_calc(2)) )

print('run one')
print(lookup('x') * 2)

print('run two')
print(lookup('x') * 2)

Answered By: Gringo Suave

You can’t make a simple name, like x, evaluate lazily. A name is just an entry in a hash table (e.g. the one that locals() or globals() returns). Unless you patch the access methods of these system tables, you cannot attach execution of your code to simple name resolution.

But you can wrap functions in caching wrappers in different ways.
This is an OO way:

class CachedSlowCalculation(object):
    def __init__(self, func):
        self.func = func
        self.cache = {} # our results, kept per instance so wrappers don't share entries

    def __call__(self, param):
        # test membership rather than truthiness, so falsy results
        # (0, empty lists) and arrays are cached correctly
        if param in self.cache:
            return self.cache[param]
        value = self.func(param)
        self.cache[param] = value
        return value

calc = CachedSlowCalculation(function_that_generates_huge_array_slowly)

z = calc(1) + calc(1)**2 # only calculates things once

This is a classless way:

def cached(func):
    func.__cache = {} # we can attach attrs to objects, functions are objects
    def wrapped(param):
        cache = func.__cache
        if param in cache: # membership test, so falsy results are cached too
            return cache[param]
        value = func(param)
        cache[param] = value
        return value
    return wrapped

@cached
def f(x):
    print("I'm being called with %r" % x)
    return x + 1

z = f(9) + f(9)**2 # see f called only once

In the real world you’ll add some logic to keep the cache to a reasonable size, possibly using an LRU algorithm.

Answered By: 9000

Python 3.2 and greater implement an LRU algorithm in the functools module to handle simple cases of caching/memoization:

import functools

@functools.lru_cache(maxsize=128)  # cache at most 128 results
def f(x):
    print("I'm being called with %r" % x)
    return x + 1

z = f(9) + f(9)**2
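
Applied to the question's case, the same decorator can wrap the generator function directly. This is a sketch only: generate is a hypothetical stand-in for function_that_generates_huge_array_slowly, and maxsize=None (an unbounded cache) is an assumption that suits a handful of large results.

```python
import functools

@functools.lru_cache(maxsize=None)  # unbounded: every distinct argument is kept
def generate(index):
    # hypothetical stand-in for function_that_generates_huge_array_slowly
    return tuple(range(index, index + 3))

x = generate(0)        # computed
x_again = generate(0)  # served from the cache
info = generate.cache_info()
```

Arguments must be hashable, and the cache hands back the same object each time, so mutating a returned array also changes what later callers receive.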
Answered By: Kevin

To me, it seems that the proper solution for your problem is subclassing a dict and using it.

class LazyDict(dict):
    def __init__(self, lazy_variables):
        self.lazy_vars = lazy_variables
    def __getitem__(self, key):
        if key not in self and key in self.lazy_vars:
            self[key] = self.lazy_vars[key]()
        return super().__getitem__(key)

def generate_a():
    print("generate var a lazily..")
    return "<a_large_array>"

# You can add as many variables as you want here
lazy_vars = {'a': generate_a}

lazy = LazyDict(lazy_vars)

# retrieve the variable you need from `lazy`
a = lazy['a']
print("Got a:", a)

And you can actually evaluate a variable lazily if you use exec to run your code. The solution is just to pass a custom globals dictionary.

your_code = "print('inside exec');print(a)"
exec(your_code, lazy)

If you did your_code = open(your_file).read(), you could actually run your code and achieve what you want. But I think the more practical approach would be the former one.
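The exec approach can be checked by counting which generators actually run. This is a rough sketch: generate_b, the calls list and the namespace name are hypothetical additions for illustration. It relies on the interpreter consulting __getitem__ on a dict subclass passed as globals, which is how CPython behaves for top-level code, so treat it as implementation-dependent.

```python
class LazyDict(dict):
    def __init__(self, lazy_variables):
        super().__init__()
        self.lazy_vars = lazy_variables

    def __getitem__(self, key):
        # compute a lazy variable the first time it is looked up
        if key not in self and key in self.lazy_vars:
            self[key] = self.lazy_vars[key]()
        return super().__getitem__(key)

calls = []  # records which generators actually ran

def generate_a():
    calls.append("a")
    return [1, 2, 3]

def generate_b():
    calls.append("b")
    return [4, 5, 6]

namespace = LazyDict({"a": generate_a, "b": generate_b})

# the executed code uses `a` twice and never touches `b`
exec("total = sum(a) + sum(a)", namespace)
```

Only generate_a runs, and it runs once; b is never computed because the executed code never refers to it.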

Answered By: bombs