How to test if a file has been created by pickle?

Question:

Is there any way of checking if a file has been created by pickle? I could just catch exceptions thrown by pickle.load but there is no specific “not a pickle file” exception.

Asked By: Erik


Answers:

There is no sure way other than to try to unpickle it, and catch exceptions.
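
A minimal sketch of that approach, assuming the data fits in memory and comes from a source you trust (unpickling can execute arbitrary code). The function name `is_unpicklable` is hypothetical. Since malformed input can raise several different exception types, not only `pickle.UnpicklingError`, the sketch catches `Exception` broadly.

```python
import pickle


def is_unpicklable(data):
    """Return True if `data` (bytes) can be loaded by pickle without error."""
    try:
        pickle.loads(data)
    except Exception:
        # UnpicklingError, EOFError, AttributeError, ImportError, IndexError
        # and others are all possible on input that is not a valid pickle.
        return False
    return True


good = pickle.dumps({"a": 1})
print(is_unpicklable(good))             # True
print(is_unpicklable(b"not a pickle"))  # False
```

For a file, read it in binary mode and pass the bytes in. Keep the `open()` call outside the `try` so that a missing file is not mistaken for "not a pickle".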

Answered By: Ned Batchelder

Pickle files don’t have a header, so there’s no standard way of identifying them short of trying to unpickle one and seeing if any exceptions are raised while doing so.

You could define your own enhanced protocol that includes some kind of header by subclassing the Pickler and Unpickler classes in the pickle module. However, in Python 2 this can’t be done with the much faster cPickle module because, in it, they’re factory functions, which can’t be subclassed [1].

A more flexible approach would be to define your own independent classes that use corresponding Pickler() and Unpickler() instances from either one of these modules in their implementation.
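
One way this might look, written as two plain functions rather than classes to keep it short. The header value `MAGIC`, the exception `NotTaggedPickleError`, and the function names are all hypothetical. The only pickle APIs used are `pickle.Pickler` and `pickle.Unpickler`.

```python
import io
import pickle

MAGIC = b"MYPKL\x00"  # hypothetical header; any fixed byte string would do


class NotTaggedPickleError(Exception):
    """Raised when the expected header is missing (hypothetical name)."""


def dump_tagged(obj, fileobj):
    """Write the header, then the ordinary pickle of `obj`."""
    fileobj.write(MAGIC)
    pickle.Pickler(fileobj).dump(obj)


def load_tagged(fileobj):
    """Check the header first and unpickle only if it matches."""
    header = fileobj.read(len(MAGIC))
    if header != MAGIC:
        raise NotTaggedPickleError("missing header, not one of our files")
    return pickle.Unpickler(fileobj).load()


buf = io.BytesIO()
dump_tagged({"a": 1}, buf)
buf.seek(0)
restored = load_tagged(buf)
```

With a header like this, a foreign file is rejected before any unpickling is attempted. The cost is that plain `pickle.load()` can no longer read these files directly.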

Update

The last byte of all pickle files should be the pickle.STOP opcode, so while there isn’t a header, there is effectively a very minimal trailer which would be a relatively simple thing to check.
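
A rough sketch of that trailer check, assuming a seekable file opened in binary mode; the name `ends_with_stop` is hypothetical. Note that this is only a necessary condition: `pickle.STOP` is the single byte `b"."`, so any file that happens to end with a period passes too.

```python
import io
import os
import pickle


def ends_with_stop(fileobj):
    """Return True if the last byte of a seekable binary file is pickle.STOP."""
    fileobj.seek(0, os.SEEK_END)
    if fileobj.tell() == 0:
        return False  # an empty file cannot be a pickle
    fileobj.seek(-1, os.SEEK_END)
    return fileobj.read(1) == pickle.STOP


buf = io.BytesIO(pickle.dumps({"a": 1}))
print(ends_with_stop(buf))  # True
```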

Depending on your exact usage, you might be able to get away with supplementing that with something more elaborate (and longer than one byte), since any data past the STOP opcode in a pickled object’s representation is ignored [2].
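
As an illustration of that idea, the sketch below appends a longer marker after the pickled bytes and checks for it later. The marker value `TRAILER` and both function names are hypothetical. It relies on `pickle.loads()` ignoring bytes after the STOP opcode.

```python
import pickle

TRAILER = b"--my-pickle-v1--"  # hypothetical marker appended after STOP


def dumps_with_trailer(obj):
    """Pickle `obj` and append the marker after the STOP opcode."""
    return pickle.dumps(obj) + TRAILER


def has_trailer(data):
    """Return True if `data` ends with the marker."""
    return data.endswith(TRAILER)


data = dumps_with_trailer({"x": 1})
print(has_trailer(data))   # True
print(pickle.loads(data))  # the trailing marker is ignored on load
```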

[1]  Footnote [2] in the Python 2 documentation.
[2]  Documentation for pickle.loads(), which also applies to pickle.load() since it’s currently implemented in terms of the former.
Answered By: martineau

I ran into this issue and found a fairly decent way of handling it. You can use the built-in pickletools module to break a pickle file down into its opcodes. With pickle protocol 2 and higher, the first opcode will be PROTO, and the last one, as @martineau mentioned, is STOP. The following code displays these two opcodes. Note that the output of genops() can be iterated over but its items cannot be accessed by index, so the opcodes are first collected into a list.

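import pickletools

with open("file.pickle", "rb") as f:
    data = f.read()  # named "data" to avoid shadowing the pickle module

# genops() yields (opcode, arg, pos) tuples lazily, so collect the
# opcodes into a list before indexing.
opcodes = [opcode for opcode, arg, pos in pickletools.genops(data)]
print(opcodes[0].name)   # PROTO for protocol 2 and higher
print(opcodes[-1].name)  # STOP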
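
The same idea could be wrapped into a reusable check along these lines. The function name `first_and_last_opcode` is hypothetical. Unlike unpickling, `pickletools.genops()` only parses the opcode stream and does not build any objects. It raises on unknown opcodes or truncated data, which the sketch treats as "not a pickle".

```python
import pickle
import pickletools


def first_and_last_opcode(data):
    """Return (first, last) opcode names, or None if `data` does not parse."""
    try:
        names = [opcode.name for opcode, arg, pos in pickletools.genops(data)]
    except Exception:
        # genops() raises on unknown opcodes, malformed arguments,
        # or data that ends before a STOP opcode is seen.
        return None
    return names[0], names[-1]


sample = pickle.dumps([1, 2, 3], protocol=2)
print(first_and_last_opcode(sample))           # ('PROTO', 'STOP')
print(first_and_last_opcode(b"not a pickle"))  # None
```

Pickles written with protocol 0 or 1 have no PROTO opcode, so for those only the trailing STOP is a usable signal.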
Answered By: TylerRajotte