Share a variable that changes after import between the main process and child processes

Question:

I have a gRPC project where main.py spawns gRPC servers as subprocesses.

The project also contains a settings.py with some configuration, like:

some_config = {"foo": "bar"}

In some files (used by different processes) I have:

import settings
...
the value of settings.some_config is read

In the main process I have a listener that updates some_config on demand, for example:

settings.some_config = new_value

I noticed that when settings.some_config is changed in the main process, the change is not reflected in the subprocess I checked; it keeps the old value.

I want all subprocesses to always have the most up-to-date value of settings.some_config.

One solution I considered is passing a queue or a pipe to each subprocess; when some_config changes in the main process, I can send the new data through the queue/pipe to each subprocess.

But how can I get each subprocess to assign the new value to settings.some_config? Should I run a listener in each subprocess so that when a notification arrives it does:

settings.some_config = new_value

Would this work? The end goal is to have the most up-to-date value of settings.some_config across all modules/processes without restarting the server. I’m also not sure whether it would work, since each module might keep the value of settings.some_config it imported earlier cached in memory.
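For the within-process part of that worry, a quick self-contained check (using a hypothetical in-memory stand-in for settings.py) shows that `import settings` gives every module the same module object, so attribute reads always see the latest value; only `from settings import some_config` snapshots the old binding:

```python
import sys
import types

# Hypothetical in-memory stand-in for settings.py, so the sketch is self-contained.
settings = types.ModuleType("settings")
settings.some_config = {"foo": "bar"}
sys.modules["settings"] = settings

# `from settings import some_config` copies the current binding...
from settings import some_config

# ...so a later reassignment on the module is not seen through that name:
settings.some_config = {"new": "value"}

print(some_config)           # -> {'foo': 'bar'}  (stale snapshot)
print(settings.some_config)  # -> {'new': 'value'} (live attribute read)
```

So within one process the `import settings` + attribute-access pattern always reads the current value; the problem is purely that each subprocess has its own copy of the module.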


UPDATE

I adopted Charchit’s solution and adjusted it to my requirements, so we have:

from multiprocessing.managers import BaseManager, NamespaceProxy
from multiprocessing import Process
import settings
import time

def get_settings():
    return settings

def run(proxy_settings):
    settings = proxy_settings # So the module settings becomes the proxy object

if __name__ == '__main__':

    BaseManager.register('get_settings', get_settings, proxytype=NamespaceProxy)
    manager = BaseManager()
    manager.start()

    settings = manager.get_settings()
    p = Process(target=run, args=(settings, ))
    p.start()

A few questions:

Should an entire module (settings) be the target of a proxy object? Is it standard to do so?

There is a lot of magic here. Is the simple explanation of how it works that the settings module is now a shared proxy object, so when a subprocess reads settings.some_config it actually reads the value from the manager?

Are there any side effects I should be aware of?

Should I be using locks when I change any value in settings in the main process?

Asked By: nscode


Answers:

The easiest way to do this is to share the module with a manager:

from multiprocessing.managers import BaseManager, NamespaceProxy
from multiprocessing import Process
import settings
import time

def get_settings():
    return settings

def run(settings):
    for _ in range(2):
        print("Inside subprocess, the value is", settings.some_config)
        time.sleep(3)

if __name__ == '__main__':

    BaseManager.register('get_settings', get_settings, proxytype=NamespaceProxy)
    manager = BaseManager()
    manager.start()

    settings = manager.get_settings()
    p = Process(target=run, args=(settings, ))
    p.start()

    time.sleep(1)
    settings.some_config = {'changed': 'value'}
    p.join()

Doing so means you don’t have to inform subprocesses of a change in the value; they simply see it, because they receive the value from the manager process, which handles this automatically.

Output

Inside subprocess, the value is {'foo': 'bar'}
Inside subprocess, the value is {'changed': 'value'}

Some things to keep in mind

Firstly, remember that settings.some_config needs to be set explicitly. This means you can do settings.some_config = {}, but you cannot do settings.some_config['foo'] = "bar". If you want to modify a single key, then get the latest config, update it, and explicitly set it back, like below:

temp = settings.some_config
temp['foo'] = 'bar'
settings.some_config = temp

Secondly, to keep the changes to your codebase to an absolute minimum, you are reassigning the settings variable (initially bound to the settings.py module object) to the proxy. In the above code, this happens inside the __main__ block, so settings is changed globally in the main process. Therefore, any changes made to settings from the main process are automatically reflected in the other processes accessing the proxy. This is only partially replicated inside the child process running the function run: accessing settings from inside run means accessing the proxy. However, if run calls some other function (say run2) which does not take settings as an argument and tries to access settings, it will access the imported module instead of the proxy. Example:

def run2():
    print("Inside subprocess run2, the value is", settings.some_config)

def run(settings):
    for _ in range(2):
        print("Inside subprocess run, the value is", settings.some_config)
        time.sleep(3)
    run2()

Output

Inside subprocess run, the value is {'foo': 'bar'}
Inside subprocess run, the value is {'changed': 'value'}
Inside subprocess run2, the value is {'foo': 'bar'}

If you do not want this, then you simply need to assign the argument as the value of the global variable settings:

def run2():
    print("Inside subprocess run2, the value is", settings.some_config)

def run(shared_settings):
    global settings
    settings = shared_settings
    for _ in range(2):
        print("Inside subprocess run, the value is", settings.some_config)
        time.sleep(3)
    run2()

Any function (inside the subprocess) now accessing settings would access the proxy.

Output

Inside subprocess run, the value is {'foo': 'bar'}
Inside subprocess run, the value is {'changed': 'value'}
Inside subprocess run2, the value is {'changed': 'value'}

Lastly, if you have many subprocesses running, this might become slow (more connections to the manager means less speed). If this bothers you, then I recommend doing it the way you stated in the question, i.e., passing a queue or a pipe to each subprocess. To make sure the child process updates its value as soon as possible after you put a value on the queue, you can spawn a thread inside the subprocess which constantly polls the queue and, when a value arrives, updates the process’s settings value to the one provided. Just make sure to run the thread as a daemon, or explicitly agree on an exit condition.
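The queue-plus-daemon-thread approach could be sketched like this (names such as config_queue and apply_updates are illustrative, and a SimpleNamespace stands in for the settings module so the example is self-contained):

```python
import threading
import time
from multiprocessing import Process, Queue
from types import SimpleNamespace

# Stand-in for the settings module, so the sketch is self-contained.
settings = SimpleNamespace(some_config={"foo": "bar"})

def apply_updates(config_queue):
    # Daemon thread body: block on the queue and apply each new config
    # value to this process's copy of settings.
    while True:
        settings.some_config = config_queue.get()

def run(config_queue, result_queue):
    # daemon=True so the polling thread dies together with the subprocess.
    threading.Thread(target=apply_updates, args=(config_queue,), daemon=True).start()
    # Wait (up to 5 s) for the update pushed by the main process to arrive.
    deadline = time.monotonic() + 5
    while settings.some_config == {"foo": "bar"} and time.monotonic() < deadline:
        time.sleep(0.05)
    result_queue.put(settings.some_config)

if __name__ == "__main__":
    config_queue, result_queue = Queue(), Queue()
    p = Process(target=run, args=(config_queue, result_queue))
    p.start()
    config_queue.put({"changed": "value"})  # main process publishes a new config
    print(result_queue.get())               # -> {'changed': 'value'}
    p.join()
```

Unlike the manager approach, each subprocess here holds a plain local copy of the config, so reads are fast; the queue only carries the occasional update.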

Update

Should an entire module (settings) be the target of a proxy object? Is it standard to do so?

If your question is whether it is safe to do so, then yes, it is; just keep in mind the things outlined in this answer. At the end of the day, a module is just another object, and sharing it here makes sense.

There is a lot of magic here. Is the simple explanation of how it works that the settings module is now a shared proxy object, so when a subprocess reads settings.some_config it actually reads the value from the manager?

You need to add a couple of lines to the run function for that to be the case; check the second point in the previous section.

Are there any side effects I should be aware of?

Check the previous section.

Should I be using locks when I change any value in settings in the main process?

Not necessary here. A plain attribute assignment goes to the manager as a single call, so it is effectively atomic; you would only want a lock around read-modify-write sequences performed from multiple processes.
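For the one case where a lock does matter, the read-modify-write pattern shown earlier (get a copy, mutate it, set it back), a manager lock keeps two processes from reading the same snapshot and losing an update. A minimal sketch (the counter key and the bump function are made up for illustration):

```python
from multiprocessing import Manager, Process

def bump(shared, lock):
    # Read-modify-write: without the lock, two processes could read the
    # same snapshot and one of the increments would be lost.
    for _ in range(100):
        with lock:
            config = shared["some_config"]   # read a copy
            config["counter"] += 1           # modify it locally
            shared["some_config"] = config   # write it back explicitly

if __name__ == "__main__":
    manager = Manager()
    shared = manager.dict({"some_config": {"counter": 0}})
    lock = manager.Lock()
    workers = [Process(target=bump, args=(shared, lock)) for _ in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(shared["some_config"]["counter"])  # -> 200, no lost updates
```

Without the `with lock:` block, the final count would usually land somewhere below 200 because concurrent increments overwrite each other.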

Answered By: Charchit Agarwal

Charchit’s solution of creating a specialized managed object is more complicated than it needs to be. If the configuration is stored as a dictionary, then just use the multiprocessing.managers.DictProxy instance returned by the multiprocessing.Manager().dict method. This also lets you update individual keys rather than having to set a completely new dictionary value:

from multiprocessing import Process, Manager
import time

def get_settings(manager):
    return manager.dict({'foo': 'bar', 'x': 17})

def run(settings):
    for _ in range(2):
        print("Inside subprocess, the value is", settings)
        time.sleep(3)

if __name__ == '__main__':

    manager = Manager()

    settings = get_settings(manager)
    p = Process(target=run, args=(settings, ))
    p.start()

    time.sleep(1)
    settings['foo'] = 'changed bar'
    p.join()

Prints:

Inside subprocess, the value is {'foo': 'bar', 'x': 17}
Inside subprocess, the value is {'foo': 'changed bar', 'x': 17}
Answered By: Booboo