Why is a pickled object with slots bigger than one without slots?
Question:
I’m working on a program that keeps getting killed by the OOM killer. I was hoping for some quick wins in reducing memory usage without a major refactor. I tried adding __slots__ to the most common classes, but I noticed the pickled size went up. Why is that?
import pickle

class Class:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

class ClassSlots:
    __slots__ = ["a", "b", "c"]

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

cases = [
    Class(1, 2, 3),
    ClassSlots(1, 2, 3),
    [Class(1, 2, 3) for _ in range(1000)],
    [ClassSlots(1, 2, 3) for _ in range(1000)],
]
for case in cases:
    dump = pickle.dumps(case, protocol=5)
    print(len(dump))
With Python 3.10, this prints:
59
67
22041
25046
Answers:
So, on Python 3.11, let’s define the following:
class Foo:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

class Bar:
    __slots__ = ["a", "b", "c"]

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
Now, let’s see:
>>> import pickle
>>> import pickletools
>>> len(pickle.dumps(Foo(1,2,3))), len(pickle.dumps(Bar(1,2,3)))
(57, 60)
So, there is a three-byte difference once the class names have the same length. The longer name accounts for 5 of the 8 bytes of difference you were originally seeing.
An important point to understand is that a "pickle" is basically a series of instructions for rebuilding an object, and these instructions are executed by a pickle virtual machine. We can use pickletools.dis to get a human-readable disassembly of these instructions. Now, let’s see what the disassembly shows us:
>>> pickletools.dis(pickle.dumps(Foo(1,2,3)))
    0: \x80 PROTO      4
    2: \x95 FRAME      46
   11: \x8c SHORT_BINUNICODE '__main__'
   21: \x94 MEMOIZE    (as 0)
   22: \x8c SHORT_BINUNICODE 'Foo'
   27: \x94 MEMOIZE    (as 1)
   28: \x93 STACK_GLOBAL
   29: \x94 MEMOIZE    (as 2)
   30: )    EMPTY_TUPLE
   31: \x81 NEWOBJ
   32: \x94 MEMOIZE    (as 3)
   33: }    EMPTY_DICT
   34: \x94 MEMOIZE    (as 4)
   35: (    MARK
   36: \x8c     SHORT_BINUNICODE 'a'
   39: \x94     MEMOIZE    (as 5)
   40: K        BININT1    1
   42: \x8c     SHORT_BINUNICODE 'b'
   45: \x94     MEMOIZE    (as 6)
   46: K        BININT1    2
   48: \x8c     SHORT_BINUNICODE 'c'
   51: \x94     MEMOIZE    (as 7)
   52: K        BININT1    3
   54: u        SETITEMS   (MARK at 35)
   55: b    BUILD
   56: .    STOP
highest protocol among opcodes = 4
And:
>>> pickletools.dis(pickle.dumps(Bar(1,2,3)))
    0: \x80 PROTO      4
    2: \x95 FRAME      49
   11: \x8c SHORT_BINUNICODE '__main__'
   21: \x94 MEMOIZE    (as 0)
   22: \x8c SHORT_BINUNICODE 'Bar'
   27: \x94 MEMOIZE    (as 1)
   28: \x93 STACK_GLOBAL
   29: \x94 MEMOIZE    (as 2)
   30: )    EMPTY_TUPLE
   31: \x81 NEWOBJ
   32: \x94 MEMOIZE    (as 3)
   33: N    NONE
   34: }    EMPTY_DICT
   35: \x94 MEMOIZE    (as 4)
   36: (    MARK
   37: \x8c     SHORT_BINUNICODE 'a'
   40: \x94     MEMOIZE    (as 5)
   41: K        BININT1    1
   43: \x8c     SHORT_BINUNICODE 'b'
   46: \x94     MEMOIZE    (as 6)
   47: K        BININT1    2
   49: \x8c     SHORT_BINUNICODE 'c'
   52: \x94     MEMOIZE    (as 7)
   53: K        BININT1    3
   55: u        SETITEMS   (MARK at 36)
   56: \x86 TUPLE2
   57: \x94 MEMOIZE    (as 8)
   58: b    BUILD
   59: .    STOP
highest protocol among opcodes = 4
So, the first difference is at byte offset 33, where the slotted class pushes a None that the non-slotted class does not, i.e.:
   33: }    EMPTY_DICT
   34: \x94 MEMOIZE    (as 4)
Vs:
   33: N    NONE
   34: }    EMPTY_DICT
   35: \x94 MEMOIZE    (as 4)
The rest of the instructions build the same dictionary, but then the slotted version also does:
   56: \x86 TUPLE2
   57: \x94 MEMOIZE    (as 8)
This creates the tuple (None, {<the dict>}) and stores it in the memo.
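If you would rather not compare the two disassemblies by eye, the difference can be computed. This is a minimal sketch that counts opcode names with pickletools.genops and subtracts the counts; the helper name opcode_names is made up for illustration, and the classes are the Foo and Bar defined above.

```python
import pickle
import pickletools
from collections import Counter

class Foo:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

class Bar:
    __slots__ = ["a", "b", "c"]

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

def opcode_names(obj):
    """Return the names of the opcodes in obj's pickle, in order (hypothetical helper)."""
    # genops yields (opcode, argument, position) triples without executing anything
    return [op.name for op, arg, pos in pickletools.genops(pickle.dumps(obj))]

plain = Counter(opcode_names(Foo(1, 2, 3)))
slotted = Counter(opcode_names(Bar(1, 2, 3)))

# Counter subtraction keeps only opcodes that occur more often in the slotted pickle
extra = slotted - plain
print(dict(extra))
```

Each of the three extra opcodes is a single byte with no argument, which matches the three-byte difference measured above.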
I am almost certain this is related to the difference between the results of __getstate__:
>>> Foo(1,2,3).__getstate__()
{'a': 1, 'b': 2, 'c': 3}
>>> Bar(1,2,3).__getstate__()
(None, {'a': 1, 'b': 2, 'c': 3})
That behavior is described in the pickle docs for object.__getstate__:
For a class that has an instance __dict__ and no __slots__, the default state is self.__dict__.
…
For a class that has __slots__ and no instance __dict__, the default state is a tuple whose first item is None and whose second item is a dictionary mapping slot names to slot values described in the previous bullet.
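Note that the default object.__getstate__ only exists from Python 3.11 onwards, so the calls above will not work on the 3.10 used in the question. As a rough sketch, the same default state can be inspected on older versions through __reduce_ex__, whose third item is the state that pickle stores. The helper name default_state is an assumption for illustration.

```python
class Foo:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

class Bar:
    __slots__ = ["a", "b", "c"]

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

def default_state(obj):
    """Return the state that pickle would store for obj (hypothetical helper)."""
    # For protocol 2 and later, __reduce_ex__ returns
    # (callable, args, state, listitems, dictitems); index 2 is the state.
    return obj.__reduce_ex__(4)[2]

foo_state = default_state(Foo(1, 2, 3))
bar_state = default_state(Bar(1, 2, 3))
print(foo_state)  # {'a': 1, 'b': 2, 'c': 3}
print(bar_state)  # (None, {'a': 1, 'b': 2, 'c': 3})
```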
5 bytes of the difference are just because you added "Slots" to the class name, and that has to be embedded in the pickle to look up the class.
The other 3 bytes are actually because of slots. Normally, the default __getstate__ just returns an object’s __dict__, or None if the dict is empty or nonexistent, and when unpickling, pickle sets entries in the new object’s dict from the unpickled state dict. That doesn’t work for slots, which aren’t stored in an object’s dict.
When an object has slots, the default __getstate__ instead returns a 2-element tuple. The first element is the object’s __dict__ if it has a non-empty one (objects with slots can still have a __dict__ in some cases), and None otherwise. The second is a dict mapping the names of all populated slots to their values. Entries from the first dict will be set directly in the new object’s __dict__, while entries from the second dict will be set with an ordinary attribute-setting operation.
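To make the two halves of that tuple concrete, here is a small sketch with a class that has both slots and a __dict__, and that leaves one slot unset. The class name Mixed and its attributes are invented for this example; the state is read through __reduce_ex__ so it also works before Python 3.11.

```python
class Mixed:
    # Listing "__dict__" in __slots__ gives instances a __dict__ in addition to the slots
    __slots__ = ("a", "b", "__dict__")

    def __init__(self):
        self.a = 1       # stored in the slot "a"
        self.extra = 2   # not a slot, so it lands in __dict__
        # the slot "b" is deliberately left unset

# Index 2 of the reduce tuple is the state that pickle would serialize
state = Mixed().__reduce_ex__(4)[2]
dict_part, slots_part = state
print(dict_part)   # {'extra': 2}
print(slots_part)  # {'a': 1}
```

The unset slot b does not appear in the second dict, which is what "populated slots" means here.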
When you pickle Class(1, 2, 3), pickle has to serialize a {'a': 1, 'b': 2, 'c': 3} state, while for ClassSlots(1, 2, 3), pickle has to serialize a (None, {'a': 1, 'b': 2, 'c': 3}) state tuple. This means the pickle contains an extra NONE opcode to load None, an extra TUPLE2 opcode to pack the None and the dict into a tuple, and an extra MEMOIZE opcode to store the tuple in the pickle memo (actually unnecessary, since nothing ever loads the tuple from the memo, but the pickler doesn’t optimize pickles by default).
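The same accounting explains the list numbers in the question: 25046 - 22041 = 3005, which is 3 bytes for each of the 1000 objects plus the 5 bytes for "Slots", since the class name is written only once and referenced from the memo afterwards. The sketch below repeats the measurement with two hypothetical classes whose names have the same length (Foo and Bar), so only the per-object overhead should remain.

```python
import pickle

class Foo:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

class Bar:
    __slots__ = ["a", "b", "c"]

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

n = 1000
plain = len(pickle.dumps([Foo(1, 2, 3) for _ in range(n)], protocol=5))
slotted = len(pickle.dumps([Bar(1, 2, 3) for _ in range(n)], protocol=5))

# Extra bytes per pickled object that are attributable to the slots state tuple
per_object = (slotted - plain) / n
print(plain, slotted, per_object)
```

So the pickled size grows by a small constant per object, which is unrelated to the in-memory layout of the instances.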
You can see the disassembly for the pickles with pickletools.dis, and if you want shorter pickles at the expense of extra time spent optimizing them, you can run pickletools.optimize. (Unfortunately, the optimizer is written in pure Python, so optimizing the pickle takes over 10 times as long as creating the original pickle.)
Demo:
import pickle
import pickletools

class Class:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

class ClassSlots:
    __slots__ = ["a", "b", "c"]

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

pickletools.dis(pickle.dumps(Class(1, 2, 3)))
print('-------------')
pickletools.dis(pickle.dumps(ClassSlots(1, 2, 3)))
print('-------------')
pickletools.dis(pickletools.optimize(pickle.dumps(ClassSlots(1, 2, 3))))
Output:
    0: \x80 PROTO      4
    2: \x95 FRAME      48
   11: \x8c SHORT_BINUNICODE '__main__'
   21: \x94 MEMOIZE    (as 0)
   22: \x8c SHORT_BINUNICODE 'Class'
   29: \x94 MEMOIZE    (as 1)
   30: \x93 STACK_GLOBAL
   31: \x94 MEMOIZE    (as 2)
   32: )    EMPTY_TUPLE
   33: \x81 NEWOBJ
   34: \x94 MEMOIZE    (as 3)
   35: }    EMPTY_DICT
   36: \x94 MEMOIZE    (as 4)
   37: (    MARK
   38: \x8c     SHORT_BINUNICODE 'a'
   41: \x94     MEMOIZE    (as 5)
   42: K        BININT1    1
   44: \x8c     SHORT_BINUNICODE 'b'
   47: \x94     MEMOIZE    (as 6)
   48: K        BININT1    2
   50: \x8c     SHORT_BINUNICODE 'c'
   53: \x94     MEMOIZE    (as 7)
   54: K        BININT1    3
   56: u        SETITEMS   (MARK at 37)
   57: b    BUILD
   58: .    STOP
highest protocol among opcodes = 4
-------------
    0: \x80 PROTO      4
    2: \x95 FRAME      56
   11: \x8c SHORT_BINUNICODE '__main__'
   21: \x94 MEMOIZE    (as 0)
   22: \x8c SHORT_BINUNICODE 'ClassSlots'
   34: \x94 MEMOIZE    (as 1)
   35: \x93 STACK_GLOBAL
   36: \x94 MEMOIZE    (as 2)
   37: )    EMPTY_TUPLE
   38: \x81 NEWOBJ
   39: \x94 MEMOIZE    (as 3)
   40: N    NONE
   41: }    EMPTY_DICT
   42: \x94 MEMOIZE    (as 4)
   43: (    MARK
   44: \x8c     SHORT_BINUNICODE 'a'
   47: \x94     MEMOIZE    (as 5)
   48: K        BININT1    1
   50: \x8c     SHORT_BINUNICODE 'b'
   53: \x94     MEMOIZE    (as 6)
   54: K        BININT1    2
   56: \x8c     SHORT_BINUNICODE 'c'
   59: \x94     MEMOIZE    (as 7)
   60: K        BININT1    3
   62: u        SETITEMS   (MARK at 43)
   63: \x86 TUPLE2
   64: \x94 MEMOIZE    (as 8)
   65: b    BUILD
   66: .    STOP
highest protocol among opcodes = 4
-------------
    0: \x80 PROTO      4
    2: \x95 FRAME      47
   11: \x8c SHORT_BINUNICODE '__main__'
   21: \x8c SHORT_BINUNICODE 'ClassSlots'
   33: \x93 STACK_GLOBAL
   34: )    EMPTY_TUPLE
   35: \x81 NEWOBJ
   36: N    NONE
   37: }    EMPTY_DICT
   38: (    MARK
   39: \x8c     SHORT_BINUNICODE 'a'
   42: K        BININT1    1
   44: \x8c     SHORT_BINUNICODE 'b'
   47: K        BININT1    2
   49: \x8c     SHORT_BINUNICODE 'c'
   52: K        BININT1    3
   54: u        SETITEMS   (MARK at 38)
   55: \x86 TUPLE2
   56: b    BUILD
   57: .    STOP
highest protocol among opcodes = 4
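As a quick sanity check on the optimized pickle, the following sketch compares the byte counts before and after pickletools.optimize and confirms that the shorter pickle still loads into an equivalent object. The variable names raw, small and restored are only for illustration.

```python
import pickle
import pickletools

class ClassSlots:
    __slots__ = ["a", "b", "c"]

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

raw = pickle.dumps(ClassSlots(1, 2, 3))
# optimize() drops memo-store opcodes whose entries are never read back
small = pickletools.optimize(raw)

# The optimized bytes are still an ordinary pickle and load the usual way
restored = pickle.loads(small)
print(len(raw), len(small))
print(restored.a, restored.b, restored.c)
```

For a single small object the saving is only the unused MEMOIZE bytes, so whether the extra optimization time is worth it depends on how many objects share one pickle.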