Python multiprocessing – problem manipulating data in a multiprocessing Array shared between parent and spawned class
Question:
I want to implement a way to share a table of information between a parent function and the instances of classes it will be spawning. According to what I read, I need to use a table of ctypes.c_char_p of a given length.
I have managed to initialize that table from the parent function, which I then pass to the spawned class. From the __init__() of the class I can access its contents. Then I try to manipulate them (reverse the name in this example). I confirm, from within the class, that the shared array is updated as expected, but when I try to view the contents from the parent process, I get garbage.
My code is below:
#!/usr/bin/env python3
import multiprocessing
import ctypes
import random
import json

class employee(multiprocessing.Process):
    def __init__(self, employee_data, shared_array_fields):
        self.employee_data = employee_data
        self.shared_array_fields = shared_array_fields
        # self.lock = lock
        print("**" * 100)
        print("IN class:\n", employee_data[:])
        self.run()

    def run(self):
        for ii in range(self.shared_array_fields):
            employee_string = self.employee_data[ii].decode("utf-8")
            new_name_json = json.loads(employee_string)
            new_name = new_name_json["name"][::-1]
            self.employee_data[ii] = bytes('{ "name": "' + str(new_name) + '" }', "utf-8")
        print("**" * 100)
        print("IN class AFTER manipulation:\n", self.employee_data[:])

shared_array_fields = 5

def main():
    global shared_array_fields
    lock = multiprocessing.Lock()
    employee_data = multiprocessing.Array(ctypes.c_char_p, shared_array_fields)
    for ii in range(shared_array_fields):
        name = ''.join(random.choice(['a', 'b', 'c', 'd', 'e']) for i in range(10)) + "_" + str(ii)
        employee_data[ii] = bytes('{ "name": "' + str(name) + '" }', "utf-8")
    print("**" * 100)
    print("BEFORE class:\n", employee_data[:])
    proc1 = multiprocessing.Process(target=employee, args=(employee_data, shared_array_fields))
    proc1.start()
    proc1.join()
    # time.sleep(1)
    print("**" * 100)
    print("AFTER class:\n", employee_data[:])

if __name__ == "__main__":
    main()
Result:
[http_offline@greenhat-32 tmp]$ ./temp.py
********************************************************************************************************************************************************************************************************
BEFORE class:
[b'{ "name": "abbbabeadc_0" }', b'{ "name": "daeebeeabc_1" }', b'{ "name": "dbbceedece_2" }', b'{ "name": "caccdcbeae_3" }', b'{ "name": "ccdcbdabdb_4" }']
********************************************************************************************************************************************************************************************************
IN class:
[b'{ "name": "abbbabeadc_0" }', b'{ "name": "daeebeeabc_1" }', b'{ "name": "dbbceedece_2" }', b'{ "name": "caccdcbeae_3" }', b'{ "name": "ccdcbdabdb_4" }']
********************************************************************************************************************************************************************************************************
IN class AFTER manipulation:
[b'{ "name": "0_cdaebabbba" }', b'{ "name": "1_cbaeebeead" }', b'{ "name": "2_ecedeecbbd" }', b'{ "name": "3_eaebcdccac" }', b'{ "name": "4_bdbadbcdcc" }']
********************************************************************************************************************************************************************************************************
AFTER class:
[b'', b'', b'', b'', b'{ "name": "daeebeeabc_1" }']
[http_offline@greenhat-32 tmp]$
Answers:
Using an Array of ctypes.c_char_p is very cumbersome, since it is difficult to assign a whole new value to an element. Worse, each c_char_p element stores a pointer, and when the child process assigns a new bytes value, that pointer refers to memory in the child's address space; once the child exits, the parent dereferences pointers that mean nothing to it, which is why you see garbage. Also, why go through the trouble of converting from a dictionary to a string and back just to use an Array? Finally, any structure you wish to share between processes should be created by a "manager" instance obtained from a call to multiprocessing.Manager(), unless you want to manage the synchronization yourself.
The easiest way to accomplish your goal is to have the manager create two Queue objects: an input queue (for the input to your process) and an output queue to hold the result. In this particular case you could use the same queue object for both, but two queues is cleaner and is generally what you would use when you had multiple inputs and outputs being processed simultaneously by a pool of processes and you weren't using a standard library module such as multiprocessing.pool or concurrent.futures.
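That two-queue pattern might look something like this (a minimal sketch; the reverse_names worker function and the None sentinel convention are illustrative choices, not from the question):

```python
import multiprocessing

def reverse_names(in_q, out_q):
    # Pull records until the sentinel, reverse each name, push the result.
    while True:
        record = in_q.get()
        if record is None:  # sentinel: no more input
            break
        record["name"] = record["name"][::-1]
        out_q.put(record)

def main():
    in_q = multiprocessing.Queue()
    out_q = multiprocessing.Queue()
    for ii in range(5):
        in_q.put({"name": "employee_" + str(ii)})
    in_q.put(None)  # tell the worker there is no more input
    proc = multiprocessing.Process(target=reverse_names, args=(in_q, out_q))
    proc.start()
    # Drain the output queue before joining, so the worker is never
    # blocked trying to put results into a full queue.
    results = [out_q.get() for _ in range(5)]
    proc.join()
    print(results)

if __name__ == "__main__":
    main()
```

Because each record crosses the queue as a pickled copy, there is no shared mutable state to synchronize at all.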
Finally, your Process subclass requires a bit of tweaking: your constructor needs to call the base class's constructor, and it should not call run (start arranges for run to be called in the child process). It's also usual to name your classes with a capital letter, although I left the name unchanged. I also think it's more usual not to subclass Process at all; generally one just writes a function and passes it to the Process constructor via the target, args, and/or kwargs parameters.
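For example, the same renaming task can be written as a plain function handed to Process via target and args (a sketch; reverse_names is a made-up name):

```python
import multiprocessing

def reverse_names(employees):
    # Reverse each name in a managed list of dicts. Note that the element
    # must be reassigned by index for the change to reach the manager.
    for i, record in enumerate(employees):
        record["name"] = record["name"][::-1]
        employees[i] = record

def main():
    manager = multiprocessing.Manager()
    employees = manager.list([{"name": "name_" + str(ii)} for ii in range(5)])
    proc = multiprocessing.Process(target=reverse_names, args=(employees,))
    proc.start()
    proc.join()
    print(list(employees))

if __name__ == "__main__":
    main()
```

Manager proxies are picklable, so passing the list through args works exactly like passing it to a subclass's constructor.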
Update to Use a Managed List of Dictionaries
#!/usr/bin/env python3
import multiprocessing
import random

class employee(multiprocessing.Process):
    def __init__(self, employees):
        super().__init__()  # init the base class !!!
        self.employees = employees
        print("**" * 100)
        print("IN class:\n", self.employees[:])

    def run(self):
        employees = self.employees
        for i, employee in enumerate(employees):
            new_name = employee["name"][::-1]
            employee["name"] = new_name
            employees[i] = employee  # must be rewritten to show it has changed. Yuck!
        #self.employees = employees
        print("**" * 100)
        print("IN class AFTER manipulation:\n", employees[:])

def main():
    manager = multiprocessing.Manager()
    employees = manager.list()
    for ii in range(5):
        name = ''.join(random.choice(['a', 'b', 'c', 'd', 'e']) for i in range(10)) + "_" + str(ii)
        employees.append({"name": name})
    print("**" * 100)
    print("BEFORE class:\n", employees[:])
    proc1 = employee(employees)
    proc1.start()
    proc1.join()
    # time.sleep(1)
    print("**" * 100)
    print("AFTER class:\n", employees[:])

if __name__ == "__main__":
    main()
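One subtlety worth spelling out about the "must be rewritten" line: indexing a manager list returns a local copy of the stored dict, so mutating that copy in place is invisible to other processes until the element is reassigned. A small sketch of the pitfall (names are illustrative):

```python
import multiprocessing

def main():
    manager = multiprocessing.Manager()
    employees = manager.list([{"name": "alice"}])

    # Indexing the proxy returns a local copy of the plain dict, so
    # mutating it does NOT propagate back to the manager process.
    record = employees[0]
    record["name"] = "bob"
    print(employees[0]["name"])  # still "alice"

    # Reassigning the element sends the modified copy back to the manager.
    employees[0] = record
    print(employees[0]["name"])  # now "bob"

if __name__ == "__main__":
    main()
```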