Python pickling after changing a module's directory

Question:

I’ve recently changed my program’s directory layout: before, I had all my modules inside the “main” folder. Now, I’ve moved them into a directory named after the program, and placed an __init__.py there to make a package.

Now I have a single .py file in my main directory that is used to launch my program, which is much neater.

Anyway, loading pickled files from previous versions of my program now fails. I’m getting “ImportError: No module named tools”, which I guess is because my module was previously in the main folder, and now it’s whyteboard.tools, not simply plain tools. However, the code that imports the tools module lives in the same directory as it, so I doubt there’s a need to specify a package.

So, my program directory looks something like this:

whyteboard-0.39.4/
    whyteboard.py
    README.txt
    CHANGELOG.txt
    whyteboard/
        __init__.py
        gui.py
        tools.py

whyteboard.py launches a block of code from whyteboard/gui.py, that fires up the GUI. This pickling problem definitely wasn’t happening before the directory re-organizing.

Asked By: Steven Sproat

||

Answers:

This is the normal behavior of pickle: unpickled objects need to have their defining module importable.

You should be able to change the module’s path (i.e. from tools to whyteboard.tools) by editing the pickled files, as they are normally simple text files (this holds for the ASCII protocol 0, which is the default in Python 2).
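
A minimal sketch of what such an edit could look like, assuming the files were written with protocol 0 to 3. In those protocols a class reference is stored as the opcode c followed by the module name and the class name, each terminated by a newline, so a byte-level replace is possible. The Pen class and the in-memory stand-in modules are hypothetical and exist only to make the example runnable. Protocol 4 and later store length-prefixed strings instead, so this naive replacement would corrupt such files, and it can also misfire if the same bytes happen to occur inside the pickled data.

```python
import pickle
import sys
import types


def make_tools_module(module_name):
    """Build an in-memory stand-in for tools.py under the given module name."""
    module = types.ModuleType(module_name)
    exec("class Pen(object):\n    pass\n", module.__dict__)
    return module


# Before the move: tools is a top-level module, and a pickle is written.
sys.modules["tools"] = make_tools_module("tools")
old_payload = pickle.dumps(sys.modules["tools"].Pen(), protocol=0)
del sys.modules["tools"]

# After the move: the module is only reachable as whyteboard.tools.
package = types.ModuleType("whyteboard")
package.tools = make_tools_module("whyteboard.tools")
sys.modules["whyteboard"] = package
sys.modules["whyteboard.tools"] = package.tools

# Rewrite the textual class reference from tools to whyteboard.tools.
patched_payload = old_payload.replace(b"ctools\n", b"cwhyteboard.tools\n")

pen = pickle.loads(patched_payload)
print(type(pen).__module__, type(pen).__name__)
```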

Answered By: Luper Rouch

As pickle’s docs say, in order to save and restore a class instance (actually a function, too), you must respect certain constraints:

“pickle can save and restore class instances transparently, however the class definition must be importable and live in the same module as when the object was stored”

whyteboard.tools is not “the same module as” tools (even though other modules in the same package can import it with import tools, it ends up in sys.modules as sys.modules['whyteboard.tools']: this is absolutely crucial, otherwise the same module imported by one in the same package vs one in another package would end up with multiple and possibly conflicting entries!).
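
To see that a pickle really stores the module name rather than the class itself, here is a small sketch. It assumes a hypothetical Pen class and builds a throwaway tools module in memory instead of using the real whyteboard code. Protocol 2 is chosen only because the reference then shows up as a single GLOBAL opcode that pickletools can list.

```python
import pickle
import pickletools
import sys
import types

# Stand-in for the old top-level tools.py, built in memory for the demo.
old_tools = types.ModuleType("tools")
exec("class Pen(object):\n    pass\n", old_tools.__dict__)
sys.modules["tools"] = old_tools

payload = pickle.dumps(old_tools.Pen(), protocol=2)

# Collect the "module name" pairs that the pickle refers to.
references = [
    arg
    for opcode, arg, _pos in pickletools.genops(payload)
    if opcode.name == "GLOBAL"
]
print(references)

# Simulate the reorganization: the top-level module is no longer importable.
del sys.modules["tools"]
try:
    pickle.loads(payload)
    load_error = None
except (ImportError, AttributeError) as exc:
    load_error = exc
print(type(load_error).__name__)
```

The pickle only contains the pair tools and Pen, so once nothing answers to the name tools any more, loading has nothing to resolve the reference against.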

If your pickle files are in a good/advanced format (as opposed to the old ascii format that’s the default only for compatibility reasons), migrating them once you perform such changes may in fact not be quite as trivial as “editing the file” (which is binary &c…!), despite what another answer suggests. I suggest that, instead, you make a little “pickle-migrating script”: let it patch sys.modules like this…:

import sys
from whyteboard import tools

sys.modules['tools'] = tools

and then cPickle.load each file, del sys.modules['tools'], and cPickle.dump each loaded object back to file: that temporary extra entry in sys.modules should let the pickles load successfully, then dumping them again should be using the right module-name for the instances’ classes (removing that extra entry should make sure of that).
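
The whole round trip can be sketched in memory as follows. This is an illustration under assumptions, not the actual migration script: the Pen class is hypothetical, the modules are in-memory stand-ins, pickle is used in place of cPickle, and bytes are used in place of files.

```python
import pickle
import pickletools
import sys
import types

TOOLS_SOURCE = (
    "class Pen(object):\n"
    "    def __init__(self, width):\n"
    "        self.width = width\n"
)


def make_tools_module(module_name):
    """Build an in-memory stand-in for tools.py under the given module name."""
    module = types.ModuleType(module_name)
    exec(TOOLS_SOURCE, module.__dict__)
    return module


# Before the move: tools is a top-level module, and an old pickle is written.
sys.modules["tools"] = make_tools_module("tools")
old_payload = pickle.dumps(sys.modules["tools"].Pen(3), protocol=2)
del sys.modules["tools"]

# After the move: the same code now lives at whyteboard.tools.
package = types.ModuleType("whyteboard")
package.tools = make_tools_module("whyteboard.tools")
sys.modules["whyteboard"] = package
sys.modules["whyteboard.tools"] = package.tools

# Migration: alias the old name, load, drop the alias, then dump again.
sys.modules["tools"] = package.tools
try:
    pen = pickle.loads(old_payload)
finally:
    del sys.modules["tools"]
new_payload = pickle.dumps(pen, protocol=2)

# The rewritten pickle should now refer to the new module path.
new_references = [
    arg
    for opcode, arg, _pos in pickletools.genops(new_payload)
    if opcode.name == "GLOBAL"
]
print(new_references, pen.width)
```

The re-dump picks up the new path because the loaded object’s class reports whyteboard.tools as its module; the alias is only needed while loading.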

Answered By: Alex Martelli

pickle serializes classes by reference, so if you change where the class lives, it will not unpickle because the class will not be found. If you use dill instead of pickle, then you can serialize classes by reference or directly (by directly serializing the class instead of its import path). You can simulate this pretty easily by just changing the class definition after a dump and before a load.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> 
>>> class Foo(object):
...   def bar(self):
...     return 5
... 
>>> f = Foo()
>>> 
>>> _f = dill.dumps(f)
>>> 
>>> class Foo(object):
...   def bar(self, x):
...     return x
... 
>>> g = Foo()
>>> f_ = dill.loads(_f)
>>> f_.bar()
5
>>> g.bar(4)
4

Answered By: Mike McKerns

Happened to me, solved it by adding the new location of the module to sys.path before loading pickle:

import pickle
import sys

# Make the directory that now contains tools.py importable again.
sys.path.append('path/to/whyteboard')
with open("pickled_file", "rb") as f:
    obj = pickle.load(f)

Answered By: Ranch

This can be done with a custom “unpickler” that uses find_class():

import io
import pickle


class RenameUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        renamed_module = module
        if module == "tools":
            renamed_module = "whyteboard.tools"

        return super(RenameUnpickler, self).find_class(renamed_module, name)


def renamed_load(file_obj):
    return RenameUnpickler(file_obj).load()


def renamed_loads(pickled_bytes):
    file_obj = io.BytesIO(pickled_bytes)
    return renamed_load(file_obj)

Then you’d need to use renamed_load() instead of pickle.load() and renamed_loads() instead of pickle.loads().
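
If several modules have moved, the same idea might be driven by a lookup table. The following is a sketch under assumptions: RENAMED_MODULES, MappingUnpickler and mapped_loads are hypothetical names, and the Pen class and in-memory modules exist only so that the example runs on its own (Python 3).

```python
import io
import pickle
import sys
import types

# Hypothetical table of old module paths and their new locations.
RENAMED_MODULES = {"tools": "whyteboard.tools"}


class MappingUnpickler(pickle.Unpickler):
    """Unpickler that redirects class lookups for modules that have moved."""

    def find_class(self, module, name):
        return super().find_class(RENAMED_MODULES.get(module, module), name)


def mapped_loads(pickled_bytes):
    return MappingUnpickler(io.BytesIO(pickled_bytes)).load()


def make_tools_module(module_name):
    """Build an in-memory stand-in for tools.py under the given module name."""
    module = types.ModuleType(module_name)
    exec("class Pen(object):\n    pass\n", module.__dict__)
    return module


# Before the move: a pickle is written while tools is a top-level module.
sys.modules["tools"] = make_tools_module("tools")
old_payload = pickle.dumps(sys.modules["tools"].Pen())
del sys.modules["tools"]

# After the move: only whyteboard.tools exists.
package = types.ModuleType("whyteboard")
package.tools = make_tools_module("whyteboard.tools")
sys.modules["whyteboard"] = package
sys.modules["whyteboard.tools"] = package.tools

pen = mapped_loads(old_payload)
print(type(pen).__module__)
```

Note that this only changes how the pickle is read. The stored bytes still name the old module until the loaded object is dumped again.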

Answered By: bossylobster

When you load a pickle file that contains a class reference, the module structure must match the one that was in place when the pickle was saved. If you want to use the pickle somewhere else, you have to tell Python where this class or other object lives; doing the following can save the day:

import sys
sys.path.append('path/to/folder containing the python module')

Answered By: SimoX

For people like me needing to update lots of pickle dumps, here’s a function implementing @Alex Martelli’s excellent advice:

import sys
from types import ModuleType
import pickle

# import torch

def update_module_path_in_pickled_object(
    pickle_path: str, old_module_path: str, new_module: ModuleType
) -> None:
    """Update a python module's dotted path in a pickle dump if the
    corresponding file was renamed.

    Implements the advice in https://stackoverflow.com/a/2121918.

    Args:
        pickle_path (str): Path to the pickled object.
        old_module_path (str): The old.dotted.path.to.renamed.module.
        new_module (ModuleType): from new.location import module.
    """
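    # Temporarily expose the new module under its old dotted path.
    sys.modules[old_module_path] = new_module
    try:
        with open(pickle_path, "rb") as file:
            dic = pickle.load(file)
        # dic = torch.load(pickle_path, map_location="cpu")
    finally:
        # Drop the temporary alias even if loading fails.
        del sys.modules[old_module_path]

    # Re-dump so the file refers to the new module path.
    with open(pickle_path, "wb") as file:
        pickle.dump(dic, file)
    # torch.save(dic, pickle_path)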

In my case, the dumps were PyTorch model checkpoints. Hence the commented-out torch.load/save().

Example

from new.location import new_module

for pickle_path in ('foo.pkl', 'bar.pkl'):
    update_module_path_in_pickled_object(
        pickle_path, "old.module.dotted.path", new_module
    )

Answered By: Casimir