How to memoize **kwargs?
Question:
I haven’t seen an established way to memoize a function that takes keyword arguments, i.e. something of the form
def f(*args, **kwargs)
since a memoizer typically keeps a dict to cache results for a given set of input parameters, and kwargs is itself a dict and hence unhashable. Following discussions here, I have tried using
(args, frozenset(kwargs.items()))
as the key into the cache dict, but this only works if the values in kwargs are hashable. Furthermore, as pointed out in answers below, frozenset is not an ordered data structure, so this variant might be safer:
(args, tuple(sorted(kwargs.items())))
But it still cannot cope with unhashable values. Another approach I have seen is to use a string representation of the kwargs in the cache key:
(args, str(sorted(kwargs.items())))
The only drawback I see with this is the overhead of hashing a potentially very long string; as far as I can tell, the results should be correct. Can anyone spot any problems with the latter approach? One of the answers below points out that this assumes certain behaviour of the __str__ or __repr__ methods of the keyword-argument values, which seems like a show-stopper.
Is there another, more established way of achieving memoization that copes with **kwargs and unhashable argument values?
Answers:
dicts can iterate in arbitrary order, so there’s no guarantee that the string-based approach will work. Use sorted(kwargs.items()) to get the items sorted by key first.
key = (args, frozenset(kwargs.items()))
This is the "best" you can do without making assumptions about your data.
However, it is conceivable that you might want to memoize on dictionary arguments (a bit unusual, though); you could special-case that if you desired, for example by recursively applying frozenset(...items()) while copying dictionaries.
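As a sketch of that "recursively apply frozenset" idea (the helper name freeze is mine, not from any library; it only special-cases the common container types):

```python
def freeze(obj):
    """Recursively convert dicts, lists, and sets into hashable equivalents.

    Dicts become frozensets of (key, frozen-value) pairs, lists/tuples become
    tuples, and sets become frozensets; anything else is returned unchanged.
    """
    if isinstance(obj, dict):
        return frozenset((k, freeze(v)) for k, v in obj.items())
    if isinstance(obj, (list, tuple)):
        return tuple(freeze(v) for v in obj)
    if isinstance(obj, set):
        return frozenset(freeze(v) for v in obj)
    return obj

# A nested dict value now yields a hashable cache key:
key = freeze({"a": {"b": [1, 2]}})
hash(key)  # does not raise
```

Note that this still fails for values that are unhashable and not one of the handled containers (e.g. open files), so it widens the net rather than closing it.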
If you use sorted, you can run into trouble with unorderable keys. From the documentation on sets: "The subset and equality comparisons do not generalize to a complete ordering function. For example, any two disjoint sets are not equal and are not subsets of each other, so all of the following return False: a<b, a==b, or a>b. Accordingly, sets do not implement the __cmp__() method."
>>> sorted([frozenset({1,2}), frozenset({1,3})])
[frozenset({1, 2}), frozenset({1, 3})]
>>> sorted([frozenset({1,3}), frozenset({1,2})])  # same elements, reversed input
[frozenset({1, 3}), frozenset({1, 2})]            # different sort result
# sorted(stuff) != sorted(reversed(stuff)) when the elements are not totally ordered
edit: Ignacio says "While you can’t use sorted() on arbitrary dicts, kwargs will have str keys." This is entirely correct, so it is not an issue for the keys, though it is something to keep in mind for the values if you (or, less likely, some repr) rely on sorting somehow.
Regarding using str: most data will work fine, but it is possible for an adversary (e.g. in a security-sensitive context) to craft a collision. It’s not easy, mind you, since most default reprs use careful grouping and escaping; in fact, I was not able to find such a collision. But it is possible with sloppy or incomplete third-party repr implementations.
Also consider the following: if you store keys like ((<map object at 0x1377d50>,), frozenset(...)) and ((<list_iterator object at 0x1377dd0>, <list_iterator object at 0x1377dd0>), frozenset(...)), your cache will grow without bound even when you keep passing the same items, because each iterator’s repr embeds a fresh memory address. (You could perhaps work around this with a regex…) And attempting to consume the generators instead would change the semantics of the function you are wrapping. This may be desired behaviour, though, if you wish to memoize on is-style identity rather than ==-style equality.
Also, doing something like str({1: object()}) repeatedly in the interpreter can yield an object at the same memory location each time; this is just the address being reused after garbage collection. That would be disastrous: if your key contains <some object at 0x???????> and you later create an object of the same type that happens to land at the same address, the memoized function will return incorrect results. As mentioned, one rather hackish workaround is to detect such objects with a regex.
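That "hackish regex" check could look roughly like this (the pattern is my guess at the shape of default object reprs, which read like "<module.Class object at 0x7f...>"):

```python
import re

# Default object reprs embed a memory address; a repr-based cache key
# containing such a substring is unsafe, since addresses can be reused
# after garbage collection.
ADDR_RE = re.compile(r"<[^<>]+ at 0x[0-9a-fA-F]+>")

def key_is_address_free(key_str):
    """Return True if key_str contains no default-repr memory address."""
    return ADDR_RE.search(key_str) is None

key_is_address_free(str(sorted({"a": 1}.items())))  # True: safe to use
key_is_address_free(str({1: object()}))             # False: embeds an address
```

You would refuse to cache (or raise) whenever the check fails, at the cost of a false positive for any legitimate string value that happens to match the pattern.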
What about key = pickle.dumps((args, sorted(kwargs.items())), -1)? (The -1 is the pickle protocol argument, outside the key tuple.)
This would appear to be a more robust approach than str() or repr().
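A memoizer built on this idea might look like the following sketch (pickle can still fail on unpicklable values such as open files or lambdas, and pickling is not free, so this trades speed for robustness):

```python
import pickle
from functools import wraps

def memoize_pickle(fun):
    """Memoize using a pickled (args, sorted kwargs items) tuple as the key."""
    cache = {}

    @wraps(fun)
    def wrapper(*args, **kwargs):
        # Protocol -1 = highest available; sorting the items makes the key
        # independent of the order in which keywords were passed.
        key = pickle.dumps((args, sorted(kwargs.items())), -1)
        if key not in cache:
            cache[key] = fun(*args, **kwargs)
        return cache[key]

    return wrapper

calls = []

@memoize_pickle
def f(*args, **kwargs):
    calls.append(None)
    return (args, kwargs)

f(1, x={"a": 1})  # an unhashable dict value is fine: it is pickled, not hashed
f(1, x={"a": 1})  # served from the cache; f's body runs only once
```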
It’s similar to what EMS said, but the best way would be (cPickle is Python 2 only; on Python 3 use pickle):
key = cPickle.dumps((args, sorted(kwargs.items())))
I’ve been doing a lot of research and testing on memoization with decorators, and this is the best method I’ve found so far.
Here:
from functools import wraps

def memoize(fun):
    """A simple memoize decorator for functions supporting positional args."""
    cache = {}

    @wraps(fun)
    def wrapper(*args, **kwargs):
        # tuple(sorted(...)) is hashable and independent of keyword order;
        # wrapping the sorted items in a frozenset would discard the sort again
        key = (args, tuple(sorted(kwargs.items())))
        try:
            return cache[key]
        except KeyError:
            ret = cache[key] = fun(*args, **kwargs)
            return ret

    return wrapper
Tests:
import unittest

class TestMemoize(unittest.TestCase):

    def test_it(self):
        calls = []

        @memoize
        def foo(*args, **kwargs):
            "foo docstring"
            calls.append(None)
            return (args, kwargs)

        # no args
        for x in range(2):
            ret = foo()
            expected = ((), {})
            self.assertEqual(ret, expected)
        self.assertEqual(len(calls), 1)
        # with args
        for x in range(2):
            ret = foo(1)
            expected = ((1, ), {})
            self.assertEqual(ret, expected)
        self.assertEqual(len(calls), 2)
        # with args + kwargs
        for x in range(2):
            ret = foo(1, bar=2)
            expected = ((1, ), {'bar': 2})
            self.assertEqual(ret, expected)
        self.assertEqual(len(calls), 3)
        self.assertEqual(foo.__doc__, "foo docstring")

unittest.main()
This works as long as you don’t pass an unhashable type (e.g. a dict) as an argument. I don’t have a solution for that, but the functools.lru_cache() implementation might; see the _make_key() function in its backport recipe here:
http://code.activestate.com/recipes/578078/
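For reference, the pattern from that recipe now ships in the standard library: functools.lru_cache (Python 3.2+) builds its keys from the positional args plus the keyword items in much the same way as the decorator above, though it still requires every argument value to be hashable:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive(a, b=0):
    """A trivial stand-in for an expensive computation."""
    return (a, b)

expensive(1, b=2)  # computed on the first call
expensive(1, b=2)  # served from the cache thereafter

# Unhashable values are still rejected, just as with frozenset-based keys:
try:
    expensive(1, b={})
except TypeError as exc:
    print(exc)  # e.g. "unhashable type: 'dict'"
```

So lru_cache answers the **kwargs part of the question, but not the unhashable-values part; for the latter, one of the pickle- or freeze-based workarounds above is still needed.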