Why is a pickled object with slots bigger than one without slots?

Question:

I’m working on a program that keeps dying because of the OOM killer. I was hoping for some quick wins in reducing the memory usage without a major refactor. I tried adding __slots__ to the most common classes but I noticed the pickled size went up. Why is that?

import pickle


class Class:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


class ClassSlots:
    __slots__ = ["a", "b", "c"]

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

cases = [
    Class(1, 2, 3),
    ClassSlots(1, 2, 3),
    [Class(1, 2, 3) for _ in range(1000)],
    [ClassSlots(1, 2, 3) for _ in range(1000)]
]

for case in cases:
    dump = pickle.dumps(case, protocol=5)
    print(len(dump))

With Python 3.10, this prints:

59
67
22041
25046
Asked By: parched

Answers:

So, on Python 3.11, let’s define the following:

class Foo:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


class Bar:
    __slots__ = ["a", "b", "c"]
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

Now, let’s see:

>>> import pickle
>>> import pickletools
>>> len(pickle.dumps(Foo(1,2,3))), len(pickle.dumps(Bar(1,2,3)))
(57, 60)

So there is a three-byte difference once the class names have the same length. (The longer name, ClassSlots versus Class, accounts for 5 of the 8 bytes you originally saw.)

An important point to understand is that a "pickle" is basically a series of instructions for rebuilding an object. These instructions are executed on a pickle virtual machine. We can use pickletools.dis to get a human-readable disassembly of these instructions. Now, let’s see what the disassembly shows us:

>>> pickletools.dis(pickle.dumps(Foo(1,2,3)))
    0: \x80 PROTO      4
    2: \x95 FRAME      46
   11: \x8c SHORT_BINUNICODE '__main__'
   21: \x94 MEMOIZE    (as 0)
   22: \x8c SHORT_BINUNICODE 'Foo'
   27: \x94 MEMOIZE    (as 1)
   28: \x93 STACK_GLOBAL
   29: \x94 MEMOIZE    (as 2)
   30: )    EMPTY_TUPLE
   31: \x81 NEWOBJ
   32: \x94 MEMOIZE    (as 3)
   33: }    EMPTY_DICT
   34: \x94 MEMOIZE    (as 4)
   35: (    MARK
   36: \x8c     SHORT_BINUNICODE 'a'
   39: \x94     MEMOIZE    (as 5)
   40: K        BININT1    1
   42: \x8c     SHORT_BINUNICODE 'b'
   45: \x94     MEMOIZE    (as 6)
   46: K        BININT1    2
   48: \x8c     SHORT_BINUNICODE 'c'
   51: \x94     MEMOIZE    (as 7)
   52: K        BININT1    3
   54: u        SETITEMS   (MARK at 35)
   55: b    BUILD
   56: .    STOP
highest protocol among opcodes = 4

And:

>>> pickletools.dis(pickle.dumps(Bar(1,2,3)))
    0: \x80 PROTO      4
    2: \x95 FRAME      49
   11: \x8c SHORT_BINUNICODE '__main__'
   21: \x94 MEMOIZE    (as 0)
   22: \x8c SHORT_BINUNICODE 'Bar'
   27: \x94 MEMOIZE    (as 1)
   28: \x93 STACK_GLOBAL
   29: \x94 MEMOIZE    (as 2)
   30: )    EMPTY_TUPLE
   31: \x81 NEWOBJ
   32: \x94 MEMOIZE    (as 3)
   33: N    NONE
   34: }    EMPTY_DICT
   35: \x94 MEMOIZE    (as 4)
   36: (    MARK
   37: \x8c     SHORT_BINUNICODE 'a'
   40: \x94     MEMOIZE    (as 5)
   41: K        BININT1    1
   43: \x8c     SHORT_BINUNICODE 'b'
   46: \x94     MEMOIZE    (as 6)
   47: K        BININT1    2
   49: \x8c     SHORT_BINUNICODE 'c'
   52: \x94     MEMOIZE    (as 7)
   53: K        BININT1    3
   55: u        SETITEMS   (MARK at 36)
   56: \x86 TUPLE2
   57: \x94 MEMOIZE    (as 8)
   58: b    BUILD
   59: .    STOP
highest protocol among opcodes = 4

So, the first difference is at byte offset 33, where the slotted class’s pickle has a NONE opcode that the non-slotted one lacks, i.e.:

33: }    EMPTY_DICT
34: \x94 MEMOIZE    (as 4)

Vs:

33: N    NONE
34: }    EMPTY_DICT
35: \x94 MEMOIZE    (as 4)

The rest of the instructions build the same dictionary, but then the slotted version also does:

56: \x86 TUPLE2
57: \x94 MEMOIZE    (as 8)

This creates the tuple (None, {<the dict>}) and stores it in the memo.
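
The same comparison can be made programmatically rather than by eye. The following is a minimal sketch, assuming the Foo and Bar classes defined above; the helper name opcode_names is made up for illustration. It lists the opcodes of each pickle with pickletools.genops and subtracts the counts:

```python
import pickle
import pickletools
from collections import Counter


class Foo:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


class Bar:
    __slots__ = ["a", "b", "c"]

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


def opcode_names(obj):
    """Return the opcode names in obj's pickle (hypothetical helper)."""
    data = pickle.dumps(obj, protocol=4)
    # genops yields (opcode, argument, position) triples without running the pickle.
    return [opcode.name for opcode, _arg, _pos in pickletools.genops(data)]


# Counter subtraction keeps only the opcodes that Bar's pickle has more of.
extra = Counter(opcode_names(Bar(1, 2, 3))) - Counter(opcode_names(Foo(1, 2, 3)))
print(sorted(extra.items()))  # [('MEMOIZE', 1), ('NONE', 1), ('TUPLE2', 1)]
```

Each of these three opcodes is a single byte with no argument, which matches the three-byte difference.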

I am almost certain this is related to the difference between the results of __getstate__:

>>> Foo(1,2,3).__getstate__()
{'a': 1, 'b': 2, 'c': 3}
>>> Bar(1,2,3).__getstate__()
(None, {'a': 1, 'b': 2, 'c': 3})

That behavior is described in the pickle docs for object.__getstate__:

For a class that has an instance __dict__ and no __slots__, the
default state is self.__dict__.

For a class that has __slots__ and no instance __dict__, the default
state is a tuple whose first item is None and whose second item is a
dictionary mapping slot names to slot values described in the previous
bullet.
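
Note that object.__getstate__ only exists as a default method from Python 3.11 on. On older versions, something like the following sketch should show the same state, since the third element of the tuple returned by __reduce_ex__ is the state that pickle serializes. It assumes the Foo and Bar classes from above; the helper name default_state is made up here:

```python
class Foo:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


class Bar:
    __slots__ = ["a", "b", "c"]

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


def default_state(obj, protocol=4):
    """Return the state pickle would serialize for obj (hypothetical helper)."""
    # __reduce_ex__ returns (callable, args, state, ...); index 2 is the state.
    return obj.__reduce_ex__(protocol)[2]


print(default_state(Foo(1, 2, 3)))  # {'a': 1, 'b': 2, 'c': 3}
print(default_state(Bar(1, 2, 3)))  # (None, {'a': 1, 'b': 2, 'c': 3})
```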

Answered By: juanpa.arrivillaga

5 bytes of the difference are just because you added "Slots" to the class name, and that has to be embedded in the pickle to look up the class.

The other 3 bytes are actually because of slots. Normally, the default __getstate__ just returns an object’s __dict__, or None if the dict is empty or nonexistent, and pickle sets entries in the new object’s dict with values from the unpickled state dict when unpickling an object. That doesn’t work for slots, which aren’t stored in an object’s dict.

When an object has slots, the default __getstate__ instead returns a 2-element tuple. The first element is the object’s __dict__ if it has a non-empty one, and None otherwise; objects with slots can still have a __dict__ in some cases. The second is a dict mapping the names of all populated slots to their values. Entries from the first dict will be set directly in the new object’s __dict__, while entries from the second dict will be set with an ordinary attribute-setting operation.
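
To illustrate the case where both elements of the tuple are populated, here is a small sketch. The class Mixed is hypothetical (it is not from the question); it lists "__dict__" in __slots__ so that instances have a slot and a dict at the same time:

```python
import pickle


class Mixed:
    # Listing "__dict__" in __slots__ gives instances a slot *and* a dict.
    __slots__ = ("a", "__dict__")

    def __init__(self, a):
        self.a = a


obj = Mixed(1)
obj.extra = "x"  # not a slot, so it lands in obj.__dict__

# Index 2 of the reduce tuple is the state that pickle will serialize.
state = obj.__reduce_ex__(4)[2]
print(state)  # ({'extra': 'x'}, {'a': 1})

# Round trip: both the dict entry and the slot value are restored.
restored = pickle.loads(pickle.dumps(obj))
print(restored.a, restored.extra)  # 1 x
```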

When you pickle Class(1, 2, 3), pickle has to serialize a {'a': 1, 'b': 2, 'c': 3} state, while for ClassSlots(1, 2, 3), pickle has to serialize a (None, {'a': 1, 'b': 2, 'c': 3}) state tuple. This means the pickle contains an extra NONE opcode to load None, an extra TUPLE2 opcode to pack the None and the dict into a tuple, and an extra MEMOIZE opcode to store the tuple in the pickle memo. (The MEMOIZE is actually unnecessary, since nothing ever loads the tuple from the memo, but the pickler doesn’t optimize pickles by default.)
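
Because those three opcodes are emitted once per instance, the overhead grows with the number of objects. That is consistent with the 3005-byte gap between the two 1000-element lists in the question: 5 bytes for the longer class name, plus 3 bytes per instance. A quick check, assuming two classes with same-length names (Foo and Bar are illustrative names, not from the question):

```python
import pickle


class Foo:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


class Bar:
    __slots__ = ["a", "b", "c"]

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


n = 1000
plain = pickle.dumps([Foo(1, 2, 3) for _ in range(n)], protocol=4)
slotted = pickle.dumps([Bar(1, 2, 3) for _ in range(n)], protocol=4)

# The class names have the same length, so the whole difference
# comes from the extra NONE, TUPLE2 and MEMOIZE opcodes per instance.
print((len(slotted) - len(plain)) / n)  # 3.0
```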

You can see the disassembly for the pickles with pickletools.dis, and if you want shorter pickles at the expense of extra time spent optimizing them, you can run pickletools.optimize. (Unfortunately, the optimizer is written in pure Python, so optimizing the pickle takes over 10 times as long as creating the original pickle.)

Demo:

import pickle
import pickletools

class Class:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


class ClassSlots:
    __slots__ = ["a", "b", "c"]

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

pickletools.dis(pickle.dumps(Class(1, 2, 3)))
print('-------------')
pickletools.dis(pickle.dumps(ClassSlots(1, 2, 3)))
print('-------------')
pickletools.dis(pickletools.optimize(pickle.dumps(ClassSlots(1, 2, 3))))

Output:

    0: \x80 PROTO      4
    2: \x95 FRAME      48
   11: \x8c SHORT_BINUNICODE '__main__'
   21: \x94 MEMOIZE    (as 0)
   22: \x8c SHORT_BINUNICODE 'Class'
   29: \x94 MEMOIZE    (as 1)
   30: \x93 STACK_GLOBAL
   31: \x94 MEMOIZE    (as 2)
   32: )    EMPTY_TUPLE
   33: \x81 NEWOBJ
   34: \x94 MEMOIZE    (as 3)
   35: }    EMPTY_DICT
   36: \x94 MEMOIZE    (as 4)
   37: (    MARK
   38: \x8c     SHORT_BINUNICODE 'a'
   41: \x94     MEMOIZE    (as 5)
   42: K        BININT1    1
   44: \x8c     SHORT_BINUNICODE 'b'
   47: \x94     MEMOIZE    (as 6)
   48: K        BININT1    2
   50: \x8c     SHORT_BINUNICODE 'c'
   53: \x94     MEMOIZE    (as 7)
   54: K        BININT1    3
   56: u        SETITEMS   (MARK at 37)
   57: b    BUILD
   58: .    STOP
-------------
    0: \x80 PROTO      4
    2: \x95 FRAME      56
   11: \x8c SHORT_BINUNICODE '__main__'
   21: \x94 MEMOIZE    (as 0)
   22: \x8c SHORT_BINUNICODE 'ClassSlots'
   34: \x94 MEMOIZE    (as 1)
   35: \x93 STACK_GLOBAL
   36: \x94 MEMOIZE    (as 2)
   37: )    EMPTY_TUPLE
   38: \x81 NEWOBJ
   39: \x94 MEMOIZE    (as 3)
   40: N    NONE
   41: }    EMPTY_DICT
   42: \x94 MEMOIZE    (as 4)
   43: (    MARK
   44: \x8c     SHORT_BINUNICODE 'a'
   47: \x94     MEMOIZE    (as 5)
   48: K        BININT1    1
   50: \x8c     SHORT_BINUNICODE 'b'
   53: \x94     MEMOIZE    (as 6)
   54: K        BININT1    2
   56: \x8c     SHORT_BINUNICODE 'c'
   59: \x94     MEMOIZE    (as 7)
   60: K        BININT1    3
   62: u        SETITEMS   (MARK at 43)
   63: \x86 TUPLE2
   64: \x94 MEMOIZE    (as 8)
   65: b    BUILD
   66: .    STOP
highest protocol among opcodes = 4
-------------
    0: \x80 PROTO      4
    2: \x95 FRAME      47
   11: \x8c SHORT_BINUNICODE '__main__'
   21: \x8c SHORT_BINUNICODE 'ClassSlots'
   33: \x93 STACK_GLOBAL
   34: )    EMPTY_TUPLE
   35: \x81 NEWOBJ
   36: N    NONE
   37: }    EMPTY_DICT
   38: (    MARK
   39: \x8c     SHORT_BINUNICODE 'a'
   42: K        BININT1    1
   44: \x8c     SHORT_BINUNICODE 'b'
   47: K        BININT1    2
   49: \x8c     SHORT_BINUNICODE 'c'
   52: K        BININT1    3
   54: u        SETITEMS   (MARK at 38)
   55: \x86 TUPLE2
   56: b    BUILD
   57: .    STOP
highest protocol among opcodes = 4
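
The size effect of pickletools.optimize can also be measured directly. A minimal sketch, assuming the ClassSlots class from the demo: the nine MEMOIZE opcodes in the unoptimized disassembly are never read back, so the optimized pickle should be nine bytes shorter and still load to an equivalent object.

```python
import pickle
import pickletools


class ClassSlots:
    __slots__ = ["a", "b", "c"]

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


raw = pickle.dumps(ClassSlots(1, 2, 3), protocol=4)
optimized = pickletools.optimize(raw)

# optimize() drops memo stores that no later opcode reads back.
print(len(raw) - len(optimized))  # 9

# The optimized pickle still rebuilds the same object.
restored = pickle.loads(optimized)
print(restored.a, restored.b, restored.c)  # 1 2 3
```

For a single small object every memo entry is unused. In a pickle of many instances, some entries (such as the class and the attribute names) are read back later, and optimize keeps those.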
Answered By: user2357112