When using multiprocessing with spawn in Python, using self.a in __getattr__ causes infinite recursion
Question:
The following code reproduces the bug:
from multiprocessing import Process, set_start_method

class TestObject:
    def __init__(self) -> None:
        self.a = lambda *args: {}

    def __getattr__(self, item):
        return self.a

class TestProcess(Process):
    def __init__(self, textobject, **kwargs):
        super(TestProcess, self).__init__(**kwargs)
        self.testobject = textobject

    def run(self) -> None:
        print("heihei")
        print(self.testobject)

if __name__ == "__main__":
    set_start_method("spawn")
    testobject = TestObject()
    testprocess = TestProcess(testobject)
    testprocess.start()
Using 'spawn' causes infinite recursion in the method TestObject.__getattr__.
If I delete the line set_start_method('spawn'), everything works.
I would be very grateful to know why the infinite recursion happens.
Answers:
If you head over to pickle's documentation, you will find a note that says:
At unpickling time, some methods like __getattr__(), __getattribute__(), or __setattr__() may be called upon the instance. In case those methods rely on some internal invariant being true, the type should implement __new__() to establish such an invariant, as __init__() is not called when unpickling an instance.
I am unsure what exact conditions lead to a __getattribute__ call, but you can bypass the default behaviour by providing a __setstate__ method:
class TestObject:
    def __init__(self) -> None:
        self.a = lambda *args: {}

    def __getattr__(self, item):
        return self.a

    def __setstate__(self, state):
        self.__dict__ = state
If it’s present, pickle calls this method with the unpickled state and you are free to restore it however you wish.
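A round trip with this fix can be sketched as follows. This is a minimal sketch, not the code from the question: the class name Settings and its dict-valued attribute are hypothetical, chosen so that the state is easy to pickle and check.

```python
import pickle


class Settings:
    """Hypothetical class: unknown attributes are looked up in self.a."""

    def __init__(self):
        self.a = {"w": 1}

    def __getattr__(self, item):
        # Only called when normal attribute lookup fails.
        try:
            return self.a[item]
        except KeyError:
            raise AttributeError(item) from None

    def __setstate__(self, state):
        # Found by normal lookup on the class, so pickle never has to ask
        # __getattr__ for "__setstate__" on the half-built instance.
        self.__dict__ = state


restored = pickle.loads(pickle.dumps(Settings()))
print(restored.w)
```

Removing __setstate__ from this sketch should bring the recursion back, because the lookup then falls through to __getattr__ before self.a exists.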
Now let us figure out what is really happening with this bug.
Before we look into the code, we should know two things:
- When we define a __getattr__ method for a class, __getattr__ must never access an attribute that exists neither on the class nor on the instance itself; otherwise it recurses infinitely. For example:
class TestObject:
    def __getattr__(self, item):
        return self.a

if __name__ == "__main__":
    testobject = TestObject()
    print(f"print a: {testobject.a}")
The result should be like this:
Traceback (most recent call last):
  File "tmp_test.py", line 10, in <module>
    print(f"print a: {testobject.a}")
  File "tmp_test.py", line 6, in __getattr__
    return self.a
  File "tmp_test.py", line 6, in __getattr__
    return self.a
  File "tmp_test.py", line 6, in __getattr__
    return self.a
  [Previous line repeated 996 more times]
RecursionError: maximum recursion depth exceeded
Because a is not in the instance's __dict__, every failed lookup of a goes into the __getattr__ method again, which causes the infinite recursion.
- The next thing to remember is how Python's pickle module works. When pickling and unpickling a class instance, dumps and loads (likewise dump and load) look up the instance's __getstate__ (for dumps) and __setstate__ (for loads) methods. And when our class does not define these two methods, where does Python look next? In the __getattr__ method, because normal lookup has failed. (Since Python 3.11, object provides a default __getstate__, so there the fallback only applies to __setstate__.) Normally this is fine when pickling the instance, because at that point the attributes used in __getattr__ still exist on the instance. But when unpickling, things go wrong.
The pickle module documentation describes how class instances are pickled: https://docs.python.org/3/library/pickle.html#pickling-class-instances.
The thing to notice there is that, by default, unpickling creates the instance with cls.__new__(cls) and only afterwards restores its attributes with obj.__dict__.update(attributes).
This means that unpickling a class instance does not call the __init__ function to create the instance. So when unpickling, pickle's loads function checks whether the newly created instance has a __setstate__ method and, as said above, that lookup goes into the __getattr__ method. At that point the attributes the instance once owned have not been restored yet (that only happens at obj.__dict__.update(attributes)), so the infinite recursion appears.
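The steps above can be replayed by hand, without pickle. This is a rough sketch under stated assumptions: the class Delegating and the helper simulate_unpickle are hypothetical names, and the helper only imitates the default unpickling steps described here rather than reproducing pickle's actual implementation.

```python
class Delegating:
    """Hypothetical class that forwards unknown attributes to self.a."""

    def __init__(self):
        self.a = {"w": 1}

    def __getattr__(self, item):
        # Only called when normal lookup fails. If "a" itself is missing,
        # self.a re-enters __getattr__ and recurses.
        return self.a[item]


def simulate_unpickle(cls, attributes):
    """Rough imitation of the default unpickling steps (illustration only)."""
    obj = cls.__new__(cls)  # __init__ is NOT called, so obj.__dict__ is empty
    # The __setstate__ check happens before any attribute is restored.
    setstate = getattr(obj, "__setstate__", None)
    if setstate is not None:
        setstate(attributes)
    else:
        obj.__dict__.update(attributes)
    return obj


try:
    simulate_unpickle(Delegating, {"a": {"w": 1}})
    outcome = "ok"
except RecursionError:
    outcome = "RecursionError"

print(outcome)
```

The getattr default does not help here, since it only swallows AttributeError, not RecursionError.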
To reproduce the exact bug, you can run this code:
import pickle

class TestClass:
    def __init__(self):
        self.w = 1

class Test:
    def __init__(self):
        self.a = TestClass()

    def __getattr__(self, item):
        print(f"{item} begin.")
        print(self.a)
        print(f"{item} end.")
        try:
            return self.a.__getattribute__(item)
        except AttributeError as e:
            raise e

    # def __getstate__(self):
    #     return self.__dict__
    #
    # def __setstate__(self, state):
    #     self.__dict__ = state

if __name__ == "__main__":
    test = Test()
    print(test.w)
    test_data = pickle.dumps(test)
    new_test = pickle.loads(test_data)
    print(new_test.w)
You should get the infinite recursion when the __getstate__ and __setstate__ methods are not defined, and uncommenting them fixes it. You can also check the printed output to see that the recursion starts at __getattr__('__setstate__').
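Another way to avoid the recursion, in line with the first point above, is to make __getattr__ itself safe on an instance that has no attributes yet. This is a minimal sketch, assuming the hypothetical names Inner and SafeTest; it reads the delegate straight from __dict__ and raises AttributeError when it is missing, so pickle's lookup of __setstate__ simply fails and the default restore takes over.

```python
import pickle


class Inner:
    def __init__(self):
        self.w = 1


class SafeTest:
    """Hypothetical variant of Test whose __getattr__ cannot recurse."""

    def __init__(self):
        self.a = Inner()

    def __getattr__(self, item):
        # Read the delegate straight from __dict__ so that a missing "a"
        # never re-enters __getattr__.
        try:
            delegate = self.__dict__["a"]
        except KeyError:
            raise AttributeError(item) from None
        return getattr(delegate, item)


restored = pickle.loads(pickle.dumps(SafeTest()))
print(restored.w)
```

Whether to prefer this guard or an explicit __setstate__ is a design choice; the guard also protects other code paths that touch a half-built instance.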
The connection between this pickle bug and the multiprocessing bug at the beginning is that, with the spawn start method, the parent process pickles the Process object (including its testobject attribute) and the child process unpickles it. So now everything makes sense.