Store the functools.lru_cache cache to a file in Python >= 3.2
Question:
I’m using @functools.lru_cache in Python 3.3. I would like to save the cache to a file so that I can restore it when the program is restarted. How can I do this?
Edit 1: A possible solution is to pickle any sort of callable.
The problem is pickling __closure__:
_pickle.PicklingError: Can't pickle <class 'cell'>: attribute lookup builtins.cell failed
If I try to restore the function without it, I get:
TypeError: arg 5 (closure) must be tuple
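For context, a minimal illustration of why this fails: the standard pickle module serializes functions by reference (module plus qualified name), so a closure created inside another function cannot be round-tripped at all. This is the same class of failure as the errors above:

```python
import pickle

def make_adder(n):
    # inner() is a closure: it keeps n alive in a cell object
    def inner(x):
        return x + n
    return inner

add5 = make_adder(5)

# pickle looks functions up by "make_adder.<locals>.inner", which
# cannot be resolved at load time, so serialization is rejected
try:
    pickle.dumps(add5)
    pickle_failed = False
except Exception:
    pickle_failed = True
```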
Answers:
You are not supposed to touch anything inside the decorator implementation except for the public API, so if you want to change its behavior you probably need to copy its implementation and add the necessary functions yourself. Note that the cache is currently stored as a circular doubly linked list, so you will need to take care when saving and loading it.
You can’t do what you want using lru_cache, since it doesn’t provide an API to access the cache, and it might be rewritten in C in future releases. If you really want to save the cache you have to use a different solution that gives you access to the cache.
It’s simple enough to write a cache yourself. For example:
from functools import wraps

def cached(func):
    @wraps(func)
    def wrapper(*args):
        try:
            return wrapper.cache[args]
        except KeyError:
            wrapper.cache[args] = result = func(*args)
            return result
    # store the cache on the wrapper so that rebinding wrapper.cache
    # (e.g. when restoring from a file) takes effect on later lookups
    wrapper.cache = {}
    return wrapper
You can then apply it as a decorator:
>>> @cached
... def fibonacci(n):
... if n < 2:
... return n
... return fibonacci(n-1) + fibonacci(n-2)
...
>>> fibonacci(100)
354224848179261915075
And retrieve the cache:
>>> fibonacci.cache
{(32,): 2178309, (23,): 28657, ... }
You can then pickle/unpickle the cache as you please and load it with:
fibonacci.cache = pickle.load(cache_file_object)
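Two details matter when doing this: open the file in binary mode, and make sure the wrapper reads the cache through its own attribute, otherwise rebinding fibonacci.cache has no effect on later lookups. A complete round trip might look like this (the file name is illustrative):

```python
import pickle
from functools import wraps

def cached(func):
    @wraps(func)
    def wrapper(*args):
        try:
            return wrapper.cache[args]  # read through the wrapper's own attribute
        except KeyError:
            wrapper.cache[args] = result = func(*args)
            return result
    wrapper.cache = {}
    return wrapper

@cached
def fibonacci(n):
    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)

fibonacci(100)

# save: pickle requires the file to be opened in binary mode
with open('fib.cache', 'wb') as f:
    pickle.dump(fibonacci.cache, f)

# restore (e.g. at the start of the next run):
with open('fib.cache', 'rb') as f:
    fibonacci.cache = pickle.load(f)
```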
I found a feature request on Python’s issue tracker to add dumps/loads to lru_cache, but it wasn’t accepted/implemented. Maybe in the future it will be possible to have built-in support for these operations via lru_cache.
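Until then, one workaround that stays entirely within lru_cache's public API is to persist the arguments you called the function with, and replay them at startup to warm the cache. This is a sketch; the file name and helper names are illustrative:

```python
import functools
import os
import pickle

seen_args = []  # log of arguments, persisted instead of the cache itself

@functools.lru_cache(maxsize=None)
def expensive(n):
    return n * n  # stand-in for a costly computation

def tracked(n):
    """Call expensive() while recording the argument for later replay."""
    seen_args.append(n)
    return expensive(n)

def save_warmup(path='warmup.pickle'):
    with open(path, 'wb') as f:
        pickle.dump(seen_args, f)

def load_warmup(path='warmup.pickle'):
    """Rebuild the LRU cache by replaying the recorded calls."""
    if os.path.exists(path):
        with open(path, 'rb') as f:
            for n in pickle.load(f):
                expensive(n)
```

This only pays off when replaying the calls is cheap relative to the original computation being deterministic and repeatable; for truly expensive side effects you need one of the result-persisting approaches below.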
Consider using joblib.Memory for persistent caching to the disk.
Since the disk is enormous, there’s no need for an LRU caching scheme.
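If you would rather stay stdlib-only, the same idea (each result persisted to its own file, no eviction) can be sketched as follows. disk_memoize is a hypothetical helper written for this answer, not joblib's implementation:

```python
import hashlib
import os
import pickle
from functools import wraps

def disk_memoize(cache_dir):
    """Persist each call's result in its own pickle file, keyed by a
    hash of the function name and arguments."""
    os.makedirs(cache_dir, exist_ok=True)
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            key = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            path = os.path.join(cache_dir, hashlib.sha256(key).hexdigest())
            if os.path.exists(path):
                with open(path, 'rb') as f:
                    return pickle.load(f)  # cache hit: read result from disk
            result = func(*args, **kwargs)
            with open(path, 'wb') as f:
                pickle.dump(result, f)     # cache miss: compute and persist
            return result
        return wrapper
    return decorator

@disk_memoize('cachedir')
def square(n):
    return n * n
```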
You can use a library of mine, mezmorize:
import random
from mezmorize import Cache

cache = Cache(CACHE_TYPE='filesystem', CACHE_DIR='cache')

@cache.memoize()
def add(a, b):
    return a + b + random.randrange(0, 1000)
>>> add(2, 5)
727
>>> add(2, 5)
727
This is something that I wrote that might be helpful: devcache.
It’s designed to help you speed up iterations for long-running methods, and it’s configurable with a config file:
@devcache(group='crm')
def my_method(a, b, c):
    ...

@devcache(group='db')
def another_method(a, b, c):
    ...
The cache can be refreshed or used with a yaml config file like:
refresh: false  # refresh true will ignore use_cache and refresh all cached data
props:
    1:
        group: crm
        use_cache: false
    2:
        group: db
        use_cache: true
This would refresh the cache for my_method and use the cache for another_method.
It’s not going to help you pickle the callable, but it does the caching part, and it would be straightforward to modify the code to add specialized serialization.
If your use-case is to cache the result of computationally intensive functions in your pytest test suites, pytest already has a file-based cache. See the docs for more info.
This being said, I had a few extra requirements:
- I wanted to be able to call the cached function directly in the test instead of from a fixture
- I wanted to cache complex python objects, not just simple python primitives/containers
- I wanted an implementation that could refresh the cache intelligently (or be forced to invalidate only a single key)
Thus I came up with my own wrapper for the pytest cache, which you
can find below. The implementation is fully documented, but if you
need more info let me know and I’ll be happy to edit this answer 🙂
Enjoy:
from base64 import b64encode, b64decode
import hashlib
import inspect
import pickle
from typing import Any, Optional

import pytest

__all__ = ['cached']


@pytest.fixture
def cached(request):
    def _cached(func: callable, *args, _invalidate_cache: bool = False, _refresh_key: Optional[Any] = None, **kwargs):
        """Caches the result of func(*args, **kwargs) cross-testrun.

        Cache invalidation can be performed by passing _invalidate_cache=True, or a
        _refresh_key can be passed for improved control of the invalidation policy.

        For example, given a function that executes a side effect such as querying a database:

            result = query(sql)

        it can be cached as follows:

            refresh_key = query(sql=fast_refresh_sql)
            result = cached(query, sql=slow_or_expensive_sql, _refresh_key=refresh_key)

        or it can be directly invalidated if you are doing rapid iteration on your test:

            result = cached(query, sql=sql, _invalidate_cache=True)

        Args:
            func (callable): Callable that will be called
            _invalidate_cache (bool, optional): Whether or not to invalidate the cache. Defaults to False.
            _refresh_key (Optional[Any], optional): Refresh key to provide a programmatic way to invalidate the cache. Defaults to None.
            *args: Positional args to pass to func
            **kwargs: Keyword args to pass to func

        Returns:
            Any: The (possibly cached) result of func(*args, **kwargs).
        """
        # get debug info
        # see https://stackoverflow.com/a/24439444/4442749
        try:
            func_name = getattr(func, '__name__', repr(func))
        except Exception:
            func_name = '<function>'
        try:
            caller = inspect.getframeinfo(inspect.stack()[1][0])
            caller_location = '%s:%d' % (caller.filename, caller.lineno)
        except Exception:
            caller_location = '<file>:<lineno>'
        call_key = _create_call_key(func, None, *args, **kwargs)

        cached_value = request.config.cache.get(call_key, {"refresh_key": None, "value": None})
        value = cached_value["value"]
        current_refresh_key = str(b64encode(pickle.dumps(_refresh_key)), encoding='utf8')
        cached_refresh_key = cached_value.get("refresh_key")

        if (
            _invalidate_cache  # force invalidate
            or cached_refresh_key is None  # first time caching this call
            or current_refresh_key != cached_refresh_key  # refresh_key has changed
        ):
            print("Cache invalidated for '%s' @ %s" % (func_name, caller_location))
            result = func(*args, **kwargs)
            value = str(b64encode(pickle.dumps(result)), encoding='utf8')
            request.config.cache.set(
                key=call_key,
                value={
                    "refresh_key": current_refresh_key,
                    "value": value
                }
            )
        else:
            print("Cache hit for '%s' @ %s" % (func_name, caller_location))
            result = pickle.loads(b64decode(bytes(value, encoding='utf8')))
        return result
    return _cached


_args_marker = object()
_kwargs_marker = object()


def _create_call_key(func: callable, refresh_key: Any, *args, **kwargs):
    """Produces a hex hash str of the call func(*args, **kwargs)"""
    # produce a key from func + args
    # see https://stackoverflow.com/a/10220908/4442749
    call_key = pickle.dumps(
        (func, refresh_key) +
        (_args_marker,) +
        tuple(args) +
        (_kwargs_marker,) +
        tuple(sorted(kwargs.items()))
    )
    # create a hex digest of the key for the filename
    m = hashlib.sha256()
    m.update(call_key)
    return m.digest().hex()
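As a quick sanity check of the keying scheme (using a string stand-in for the function so the snippet stays self-contained), identical calls should map to the same key and different arguments to different keys:

```python
import hashlib
import pickle

# sentinels separating positional from keyword arguments in the key
_args_marker = object()
_kwargs_marker = object()

def _create_call_key(func, refresh_key, *args, **kwargs):
    """Produces a hex hash str of the call func(*args, **kwargs)"""
    call_key = pickle.dumps(
        (func, refresh_key) +
        (_args_marker,) +
        tuple(args) +
        (_kwargs_marker,) +
        tuple(sorted(kwargs.items()))
    )
    return hashlib.sha256(call_key).hexdigest()

key_a = _create_call_key('query', None, 1, flag=True)
key_b = _create_call_key('query', None, 1, flag=True)
key_c = _create_call_key('query', None, 2, flag=True)
```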