python multiprocessing – Sharing a dictionary of classes between processes with subsequent writes from the process reflected to the shared memory
Question:
Problem
I need to share a dictionary between processes whose values are instances of a class. A dictionary created with dict() from multiprocessing’s Manager class can store the values, but subsequent writes that update those values aren’t reflected in the shared memory.
What I’ve tried
To attempt to solve this problem, I know I have to use a dict() created by a manager from Python’s multiprocessing library so that it can be shared between processes. This works with simple values like integers and strings. However, I had hoped that the created dictionary would handle deeper levels of synchronization for me, so I could just create a class instance inside of the dictionary and any change to it would be reflected, but it seems multiprocessing is much more complicated than that.
Example
Below I have provided an example program that doesn’t work as intended. The printed values aren’t what they were set to be inside of the worker function f().
Note: I am using python3 for this example
from multiprocessing import Manager
import multiprocessing as mp
import random

class ExampleClass:
    def __init__(self, stringVar):
        # these variables aren't saved across processes?
        self.stringVar = stringVar
        self.count = 0

class ProcessContainer(object):
    processes = []

    def __init__(self, *args, **kwargs):
        manager = Manager()
        self.dict = manager.dict()

    def f(self, dict):
        # generate a random index to add the class to
        index = str(random.randint(0, 100))
        # create a new class at that index
        dict[index] = ExampleClass(str(random.randint(100, 200)))
        # this is the problem, it doesn't share the updated variables in the dictionary between the processes <----------------------
        # attempt to change the created variables
        dict[index].count += 1
        dict[index].stringVar = "yeAH"
        # print what's inside
        for x in dict.values():
            print(x.count, x.stringVar)

    def Run(self):
        # create the processes
        for str in range(3):
            p = mp.Process(target=self.f, args=(self.dict,))
            self.processes.append(p)
        # start the processes
        [proc.start() for proc in self.processes]
        # wait for the processes to finish
        [proc.join() for proc in self.processes]

if __name__ == '__main__':
    test = ProcessContainer()
    test.Run()
Answers:
This is a "gotcha" that holds a lot of surprises for the uninitiated. The problem is that a managed dictionary only propagates an update when you set or delete a key through the dictionary itself. Here you never assign to a key after the initial insert: dict[index] hands back a copy of the stored ExampleClass instance, and you are only changing attributes of that copy, which the manager never sees. Bizarre, I know. This is the modified method f that you need:
def f(self, dict):
    # generate a random index to add the class to
    index = str(random.randint(0, 100))
    # create a new class at that index
    dict[index] = ExampleClass(str(random.randint(100, 200)))
    # this is the problem, it doesn't share the updated variables in the dictionary between the processes <----------------------
    # attempt to change the created variables
    ec = dict[index]
    ec.count += 1
    ec.stringVar = "yeAH"
    dict[index] = ec # show new reference
    # print what's inside
    for x in dict.values():
        print(x.count, x.stringVar)
Note:
Had you used the following code to set the value for the key, the last line would actually print False:
ec = ExampleClass(str(random.randint(100, 200)))
dict[index] = ec
print(dict[index] is ec)
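Reading dict[index] returns a copy of the stored object rather than the object itself. This is why, in the modified method f, the statement dict[index] = ec is needed: to the managed dictionary it looks like a new reference being set as the value, so the update is propagated.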
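The copy behaviour described above can be checked in a single process. The following is only a minimal sketch: the function name demo and the key "key" are made up for illustration, and it assumes the stored values are picklable.

```python
from multiprocessing import Manager


class ExampleClass:
    def __init__(self, stringVar):
        self.stringVar = stringVar
        self.count = 0


def demo():
    """Return three observations about how a managed dict stores objects."""
    with Manager() as manager:
        shared = manager.dict()
        ec = ExampleClass("original")
        shared["key"] = ec  # a pickled copy is sent to the manager process

        # every read returns a fresh copy, never the local object
        same_object = shared["key"] is ec

        # this changes a temporary copy only; the stored value is untouched
        shared["key"].count += 1
        count_after_in_place = shared["key"].count

        # fetch a copy, change it, then write it back through the proxy
        tmp = shared["key"]
        tmp.count += 1
        shared["key"] = tmp
        count_after_reassign = shared["key"].count

    return same_object, count_after_in_place, count_after_reassign


if __name__ == "__main__":
    print(demo())  # prints (False, 0, 1)
```

The in-place increment is lost, while the fetch, modify and reassign sequence is the one that reaches the manager.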
Also, you should consider not using dict, a builtin data type, as a variable name.
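Putting both suggestions together, a complete program might look roughly like the sketch below. It is not the original poster's code: the names worker, run and shared are made up for illustration, the class wrapper is dropped for brevity, and each worker gets its own fixed key instead of a random index so that no two workers write to the same entry.

```python
import multiprocessing as mp
import random


class ExampleClass:
    def __init__(self, stringVar):
        self.stringVar = stringVar
        self.count = 0


def worker(shared, index):
    # store a new instance under this worker's own key
    shared[index] = ExampleClass(str(random.randint(100, 200)))
    # fetch a copy, change it, and assign it back so the manager sees it
    ec = shared[index]
    ec.count += 1
    ec.stringVar = "yeAH"
    shared[index] = ec


def run(num_workers=3):
    with mp.Manager() as manager:
        shared = manager.dict()
        processes = [
            mp.Process(target=worker, args=(shared, str(i)))
            for i in range(num_workers)
        ]
        for proc in processes:
            proc.start()
        for proc in processes:
            proc.join()
        # copy plain values out before the manager shuts down
        return {key: (value.count, value.stringVar) for key, value in shared.items()}


if __name__ == "__main__":
    print(run())
```

Every entry in the returned dictionary should be (1, 'yeAH'). Note that the fetch, modify and reassign sequence is not atomic, so if several processes could update the same key you would also need a lock (for example one created with manager.Lock()) around it.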