How can I recover a corrupted, partially pickled file?

Question:

My program was killed while serializing data (a dict) to disk with dill. I cannot open the partially-written file now.

Is it possible to partially or fully recover the data? If so, how?

Here’s what I’ve tried:

>>> dill.load(open(filename, 'rb'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lib/python3.4/site-packages/dill/dill.py", line 288, in load
    obj = pik.load()
EOFError: Ran out of input
>>> 

The file is not empty:

>>> os.stat(filename).st_size
31110059

Note: all data in the dictionary consisted of Python built-in types.

Asked By: eqzx

||

Answers:

The pure-Python version of pickle.Unpickler keeps a stack around even if it encounters an error, so you can probably get at least something out of it:

import io
import pickle

# Use the pure-Python Unpickler; the C version does not expose its internal state.
# This must happen before dill is imported, because dill subclasses pickle.Unpickler.
pickle.Unpickler = pickle._Unpickler

import dill

if __name__ == '__main__':
    obj = [1, 2, {3: 4, "5": ('6',)}]
    data = dill.dumps(obj)

    handle = io.BytesIO(data[:-5])  # cut it off

    unpickler = dill.Unpickler(handle)

    try:
        unpickler.load()
    except EOFError:
        pass

    print(unpickler.stack)

I get the following output:

[3, 4, '5', ('6',)]

The pickle data format isn’t that complicated. Read through the Python module’s source code and you can probably find a way to hook all of the load_ methods to give you more information.
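One way to act on that suggestion, as a rough sketch rather than a tested recovery tool: subclass the pure-Python unpickler and wrap every entry in its dispatch table, so that each load_ handler records its name before it runs. TracingUnpickler, _traced and the trace attribute are names made up for this illustration. pickle._Unpickler and its dispatch table are private CPython details, so this assumes a CPython version where they exist in this form.

```python
import io
import pickle


class TracingUnpickler(pickle._Unpickler):
    """Pure-Python unpickler that records every opcode handler it runs."""

    def __init__(self, file, **kwargs):
        super().__init__(file, **kwargs)
        self.trace = []  # names of the load_* handlers called so far


def _traced(handler):
    """Wrap one load_* handler so it logs its name before executing."""
    def wrapper(self):
        self.trace.append(handler.__name__)
        return handler(self)
    return wrapper


# Give the subclass its own dispatch table so the stock class stays untouched.
TracingUnpickler.dispatch = {
    opcode: _traced(handler)
    for opcode, handler in pickle._Unpickler.dispatch.items()
}


if __name__ == "__main__":
    data = pickle.dumps([1, 2, {3: 4, "5": ("6",)}])
    unpickler = TracingUnpickler(io.BytesIO(data[:-5]))  # simulate truncation
    try:
        unpickler.load()
    except (EOFError, pickle.UnpicklingError):
        pass
    print(unpickler.trace)  # handlers that ran before the data ended
    print(unpickler.stack)  # whatever was on the stack at that point
```

The last entries of trace show which opcode was being processed when the data ran out, which helps when deciding how much of the stack can be trusted.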

Answered By: Blender

I can’t comment on the above answer, but to extend Blender’s answer:

unpickler.metastack worked for me with dill v0.3.5.1 (though as far as I know you could do this without dill). unpickler.stack did exist, but it was an empty list.

Also, with dill I got an UnpicklingError rather than an EOFError. This could be partly because of how my file got corrupted (the disk ran out of space).
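In recent CPython versions, the pure-Python unpickler pushes the current stack onto metastack each time it reads a MARK opcode and then starts a fresh stack, which would explain an empty stack next to a populated metastack. Below is a minimal sketch that collects both and tolerates either exception. It assumes the file holds only built-in types, as in the question, so plain pickle is used instead of dill. salvage is a hypothetical helper name, and pickle._Unpickler is a private class that may change.

```python
import io
import pickle


def salvage(data):
    """Return (error, frames) for a possibly truncated pickle byte string.

    frames lists the unpickler's pending stacks, outermost first. If the
    pickle is intact, error is None and the only frame holds the object.
    """
    unpickler = pickle._Unpickler(io.BytesIO(data))
    try:
        return None, [[unpickler.load()]]
    except Exception as exc:  # EOFError, UnpicklingError, IndexError, ...
        error = exc
    # metastack holds the outer stacks saved at each MARK opcode;
    # stack holds the items read since the most recent MARK.
    frames = list(getattr(unpickler, "metastack", []))
    frames.append(getattr(unpickler, "stack", []))
    return error, frames


if __name__ == "__main__":
    data = pickle.dumps({"a": [1, 2, 3], "b": "text"})
    error, frames = salvage(data[:-5])  # simulate a truncated file
    print(type(error).__name__)
    for depth, frame in enumerate(frames):
        print(depth, frame)
```

For a real file, read it in binary mode and pass the bytes, for example salvage(open(filename, 'rb').read()). The frames are raw building blocks (keys and values in the order they were read), so reassembling the dict from them is still manual work.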

Answered By: Steveo