Store the functools.lru_cache cache to a file in Python >= 3.2

Question:

I’m using @functools.lru_cache in Python 3.3. I would like to save the cache to a file in order to restore it when the program is restarted. How can I do this?

Edit 1: A possible solution: we need to pickle any sort of callable.

The problem is pickling __closure__:

_pickle.PicklingError: Can't pickle <class 'cell'>: attribute lookup builtins.cell failed

If I try to restore the function without it, I get:

TypeError: arg 5 (closure) must be tuple
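
For context, a minimal sketch (not the original code) that reproduces the first error:

import pickle

def make_adder(n):
    # `n` is captured in add.__closure__ as a cell object
    def add(x):
        return x + n
    return add

adder = make_adder(1)
pickle.dumps(adder.__closure__)  # raises: cell objects are not picklable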

Answers:

You are not supposed to touch anything inside the decorator implementation except for the public API, so if you want to change its behavior you probably need to copy its implementation and add the necessary functions yourself. Note that the cache is currently stored as a circular doubly linked list, so you will need to take care when saving and loading it.

Answered By: wRAR

You can’t do what you want using lru_cache, since it doesn’t provide an API to access the cache, and it might be rewritten in C in future releases. If you really want to save the cache you have to use a different solution that gives you access to the cache.

It’s simple enough to write a cache yourself. For example:

from functools import wraps

def cached(func):
    func.cache = {}
    @wraps(func)
    def wrapper(*args):
        try:
            # return the cached result if this call has been seen before
            return func.cache[args]
        except KeyError:
            # otherwise compute it, store it and return it
            func.cache[args] = result = func(*args)
            return result
    return wrapper

You can then apply it as a decorator:

>>> @cached
... def fibonacci(n):
...     if n < 2:
...             return n
...     return fibonacci(n-1) + fibonacci(n-2)
... 
>>> fibonacci(100)
354224848179261915075

And retrieve the cache:

>>> fibonacci.cache
{(32,): 2178309, (23,): 28657, ... }

You can then pickle/unpickle the cache as you please and load it with:

fibonacci.cache = pickle.load(cache_file_object)
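
A complete round trip to disk might look like this (the file name is arbitrary):

import pickle

# save the cache before the program exits
with open('fibonacci_cache.pkl', 'wb') as f:
    pickle.dump(fibonacci.cache, f)

# restore it on the next run
with open('fibonacci_cache.pkl', 'rb') as f:
    fibonacci.cache = pickle.load(f)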

I found a feature request on Python’s issue tracker to add dumps/loads to lru_cache, but it wasn’t accepted/implemented. Maybe in the future it will be possible to have built-in support for these operations via lru_cache.

Answered By: Bakuriu

Consider using joblib.Memory for persistent caching to disk.

Since the disk is enormous, there’s no need for an LRU caching scheme.
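
A minimal sketch of the usual joblib.Memory pattern (the cache directory name is arbitrary):

from joblib import Memory

memory = Memory('cachedir', verbose=0)  # results are persisted under ./cachedir

@memory.cache
def expensive(x):
    return x ** 2

expensive(3)  # computed and written to disk
expensive(3)  # served from the on-disk cache, even across program restarts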

Answered By: Will

You can use a library of mine, mezmorize

import random
from mezmorize import Cache

cache = Cache(CACHE_TYPE='filesystem', CACHE_DIR='cache')


@cache.memoize()
def add(a, b):
    return a + b + random.randrange(0, 1000)

>>> add(2, 5)
727
>>> add(2, 5)
727

Answered By: reubano

This is something I wrote that might be helpful: devcache.

It’s designed to help you speed up iterations for long-running methods. It’s configurable with a config file:

@devcache(group='crm')
def my_method(a, b, c):  
    ...        

@devcache(group='db')
def another_method(a, b, c): 
    ...        

The cache can be refreshed or used with a YAML config file like:

refresh: false # refresh true will ignore use_cache and refresh all cached data 
props:
    1:
        group: crm
        use_cache: false
    2:
        group: db
        use_cache: true

The config above would refresh the cache for my_method and use the cache for another_method.

It’s not going to help you pickle the callable, but it does the caching part, and it would be straightforward to modify the code to add specialized serialization.

Answered By: pcauthorn

If your use-case is to cache the result of computationally intensive functions in your pytest test suites, pytest already has a file-based cache. See the docs for more info.

This being said, I had a few extra requirements:

  1. I wanted to be able to call the cached function directly in the test instead of from a fixture
  2. I wanted to cache complex python objects, not just simple python primitives/containers
  3. I wanted an implementation that could refresh the cache intelligently (or be forced to invalidate only a single key)

Thus I came up with my own wrapper for the pytest cache, which you
can find below. The implementation is fully documented, but if you
need more info let me know and I’ll be happy to edit this answer 🙂

Enjoy:

from base64 import b64encode, b64decode
import hashlib
import inspect
import pickle
from typing import Any, Optional

import pytest

__all__ = ['cached']

@pytest.fixture
def cached(request):
    def _cached(func: callable, *args, _invalidate_cache: bool = False, _refresh_key: Optional[Any] = None, **kwargs):
        """Caches the result of func(*args, **kwargs) cross-testrun.
        Cache invalidation can be performed by passing _invalidate_cache=True or a _refresh_key can
        be passed for improved control on invalidation policy.

        For example, given a function that executes a side effect such as querying a database:

            result = query(sql)
        
        can be cached as follows:

            refresh_key = query(sql=fast_refresh_sql)
            result = cached(query, sql=slow_or_expensive_sql, _refresh_key=refresh_key)

        or can be directly invalidated if you are doing rapid iteration of your test:

            result = cached(query, sql=sql, _invalidate_cache=True)
        
        Args:
            func (callable): Callable that will be called
            _invalidate_cache (bool, optional): Whether or not to invalidate the cache. Defaults to False.
            _refresh_key (Optional[Any], optional): Refresh key to provide a programmatic way to invalidate cache. Defaults to None.
            *args: Positional args to pass to func
            **kwargs: Keyword args to pass to func

        Returns:
            Any: The result of func(*args, **kwargs), either freshly computed or loaded from the cache.
        """
        # get debug info
        # see https://stackoverflow.com/a/24439444/4442749
        try:
            func_name = getattr(func, '__name__', repr(func))
        except Exception:
            func_name = '<function>'
        try:
            caller = inspect.getframeinfo(inspect.stack()[1][0])
            location = '%s:%d' % (caller.filename, caller.lineno)
        except Exception:
            # fall back to a placeholder if the caller frame is unavailable
            location = '<file>:<lineno>'
        
        call_key = _create_call_key(func, None, *args, **kwargs)

        cached_value = request.config.cache.get(call_key, {"refresh_key": None, "value": None})
        value = cached_value["value"]

        current_refresh_key = str(b64encode(pickle.dumps(_refresh_key)), encoding='utf8')
        cached_refresh_key = cached_value.get("refresh_key")

        if (
            _invalidate_cache # force invalidate
            or cached_refresh_key is None # first time caching this call
            or current_refresh_key != cached_refresh_key # refresh_key has changed
        ):
            print("Cache invalidated for '%s' @ %s:%d" % (func_name, caller.filename, caller.lineno))
            result = func(*args, **kwargs)
            value = str(b64encode(pickle.dumps(result)), encoding='utf8')
            request.config.cache.set(
                key=call_key,
                value={
                    "refresh_key": current_refresh_key,
                    "value": value
                }
            )
        else:
            print("Cache hit for '%s' @ %s:%d" % (func_name, caller.filename, caller.lineno))
            result = pickle.loads(b64decode(bytes(value, encoding='utf8')))
        return result
    return _cached

_args_marker = object()
_kwargs_marker = object()

def _create_call_key(func: callable, refresh_key: Any, *args, **kwargs):
    """Produces a hex hash str of the call func(*args, **kwargs)"""
    # producing a key from func + args
    # see https://stackoverflow.com/a/10220908/4442749
    call_key = pickle.dumps(
        (func, refresh_key) +
        (_args_marker, ) +
        tuple(args) +
        (_kwargs_marker,) +
        tuple(sorted(kwargs.items()))
    )
    # create a hex digest of the key for the filename
    m = hashlib.sha256()
    m.update(call_key)  # call_key is already bytes
    return m.digest().hex()
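
Drop the fixture into a conftest.py and any test can request it. A hypothetical usage (the computation is illustrative):

def expensive_computation(n):
    return sum(i * i for i in range(n))

def test_expensive(cached):
    # the first run computes and caches; later runs load the pickled result
    result = cached(expensive_computation, 5_000_000)
    assert result > 0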
    
Answered By: Philippe Hebert