Python Multiprocessing Locks

Question:

This multiprocessing code works as expected: it creates a pool of 4 worker processes and uses them to print the numbers 0 through 39, with a one-second delay after each print.

import multiprocessing
import time

def job(num):
  print num
  time.sleep(1)

pool = multiprocessing.Pool(4)

lst = range(40)
for i in lst:
  pool.apply_async(job, [i])

pool.close()
pool.join()

However, when I try to use a multiprocessing.Lock to prevent multiple processes from printing to standard out, the program just exits immediately without any output.

import multiprocessing
import time

def job(lock, num):
  lock.acquire()
  print num
  lock.release()
  time.sleep(1)

pool = multiprocessing.Pool(4)
l = multiprocessing.Lock()

lst = range(40)
for i in lst:
  pool.apply_async(job, [l, i])

pool.close()
pool.join()

Why does the introduction of a multiprocessing.Lock make this code not work?

Update: It works when the lock is declared globally (where I ran a few non-definitive tests to check that the lock actually works), but not when it is passed as an argument as in the code above, even though Python’s multiprocessing documentation shows locks being passed as arguments. The code below declares the lock globally instead of passing it in:

import multiprocessing
import time

l = multiprocessing.Lock()

def job(num):
  l.acquire()
  print num
  l.release()
  time.sleep(1)

pool = multiprocessing.Pool(4)

lst = range(40)
for i in lst:
  pool.apply_async(job, [i])

pool.close()
pool.join()
Asked By: dannyadam


Answers:

I think the reason is that the multiprocessing pool uses pickle to transfer objects between the processes. However, a Lock cannot be pickled:

>>> import multiprocessing
>>> import pickle
>>> lock = multiprocessing.Lock()
>>> lp = pickle.dumps(lock)
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    lp = pickle.dumps(lock)
...
RuntimeError: Lock objects should only be shared between processes through inheritance
>>> 

See the “Picklability” and “Better to inherit than pickle/unpickle” sections of https://docs.python.org/2/library/multiprocessing.html#all-platforms
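
One way to follow the “better to inherit than pickle/unpickle” advice while still using a Pool is its initializer/initargs parameters: the lock is handed to each worker once, while the worker process is being created, instead of being pickled with every task. A minimal sketch (the init_worker name and the module-level global are my own, not from the docs):

import multiprocessing
import time

def init_worker(shared_lock):
    # Runs once in each worker process as it starts; stash the
    # inherited lock in a module-level global for job() to use.
    global lock
    lock = shared_lock

def job(num):
    with lock:
        print(num)
    time.sleep(1)

if __name__ == '__main__':
    l = multiprocessing.Lock()
    # The lock travels to each worker at startup, not with each task.
    pool = multiprocessing.Pool(4, initializer=init_worker, initargs=(l,))
    for i in range(40):
        pool.apply_async(job, [i])
    pool.close()
    pool.join()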

Answered By: Tom Dalton

If you change pool.apply_async to pool.apply, you get this exception:

Traceback (most recent call last):
  File "p.py", line 15, in <module>
    pool.apply(job, [l, i])
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 244, in apply
    return self.apply_async(func, args, kwds).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
RuntimeError: Lock objects should only be shared between processes through inheritance

pool.apply_async is just hiding it. I hate to say this, but using a global variable is probably the simplest way for your example. Let’s just hope the velociraptors don’t get you.
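
Alternatively, keep the AsyncResult objects that apply_async returns and call .get() on them; the hidden exception is re-raised in the parent at that point. A minimal sketch of that diagnostic (my variant of the question’s code; it surfaces the error rather than fixing it):

import multiprocessing
import time

def job(lock, num):
    with lock:
        print(num)
    time.sleep(1)

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    l = multiprocessing.Lock()
    # Keep the AsyncResult objects instead of discarding them.
    results = [pool.apply_async(job, [l, i]) for i in range(40)]
    pool.close()
    for r in results:
        r.get()  # re-raises the RuntimeError from the failed dispatch
    pool.join()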

Answered By: matsjoyce

Other answers have already explained that apply_async fails silently unless an appropriate error_callback argument is provided. I still found the OP’s other point valid: the official docs do indeed show multiprocessing.Lock being passed around as a function argument. In fact, the sub-section titled “Explicitly pass resources to child processes” in the Programming guidelines recommends passing a multiprocessing.Lock object as a function argument instead of a global variable. And I have written a lot of code in which I pass a multiprocessing.Lock as an argument to a child process, and it all works as expected.
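
As an illustration of that first point, here is a minimal sketch that surfaces the failure via error_callback (available on Pool.apply_async in Python 3 only; the report_error name is mine):

import multiprocessing
import time

def job(lock, num):
    with lock:
        print(num)
    time.sleep(1)

def report_error(exc):
    # Runs in the parent process whenever a submitted task fails.
    print('task failed:', exc)

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    l = multiprocessing.Lock()
    for i in range(40):
        pool.apply_async(job, [l, i], error_callback=report_error)
    pool.close()
    pool.join()
    # Prints "task failed: Lock objects should only be shared between
    # processes through inheritance" once per submitted task.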

So, what gives?

I first investigated whether multiprocessing.Lock is picklable or not. In Python 3 (CPython on macOS), trying to pickle a multiprocessing.Lock produces the familiar RuntimeError encountered by others.

>>> pickle.dumps(multiprocessing.Lock())
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-66dfe1355652> in <module>
----> 1 pickle.dumps(multiprocessing.Lock())

/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/synchronize.py in __getstate__(self)
     99
    100     def __getstate__(self):
--> 101         context.assert_spawning(self)
    102         sl = self._semlock
    103         if sys.platform == 'win32':

/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/context.py in assert_spawning(obj)
    354         raise RuntimeError(
    355             '%s objects should only be shared between processes'
--> 356             ' through inheritance' % type(obj).__name__
    357             )

RuntimeError: Lock objects should only be shared between processes through inheritance

To me, this confirms that multiprocessing.Lock is indeed not picklable.

Aside begins

But the same lock still needs to be shared across two or more Python processes, which have their own, potentially different address spaces (such as when we use “spawn” or “forkserver” as the start method). multiprocessing must be doing something special to send a Lock across processes. Another StackOverflow post indicates that on Unix systems, multiprocessing.Lock may be implemented via named semaphores supported by the OS itself (outside Python). Two or more Python processes can then link to the same lock, which effectively resides in one location outside both of them. There may be a shared-memory implementation as well.

Aside ends

Can we pass multiprocessing.Lock object as an argument or not?

After a few more experiments and more reading, it appears that the difference is between multiprocessing.Pool and multiprocessing.Process.

multiprocessing.Process lets you pass multiprocessing.Lock as an argument but multiprocessing.Pool doesn’t. Here is an example that works:

import multiprocessing
import time
from multiprocessing import Process, Lock


def task(n: int, lock):
    with lock:
        print(f'n={n}')
    time.sleep(0.25)


if __name__ == '__main__':
    multiprocessing.set_start_method('forkserver')
    lock = Lock()
    processes = [Process(target=task, args=(i, lock)) for i in range(20)]
    for process in processes:
        process.start()
    for process in processes:
        process.join()

Note that the if __name__ == '__main__': guard is essential, as mentioned in the “Safe importing of main module” sub-section of the Programming guidelines.

multiprocessing.Pool puts each task on an internal multiprocessing.SimpleQueue, and that is where the pickling happens. The traceback above suggests why the two APIs differ: a Lock’s __getstate__ calls context.assert_spawning, which permits pickling only while a child process is actively being spawned. multiprocessing.Process pickles its arguments during exactly such a spawn, so the lock’s underlying handle can be transferred to the new child, whereas a Pool pickles tasks after its workers already exist, at which point the check fails.
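
As a rough check of that claim (my own experiment, not from the docs): even multiprocessing’s internal pickler refuses a Lock when no child process is currently being spawned, which is exactly the state the Pool’s task queue is in when a task is submitted:

import multiprocessing
from multiprocessing.reduction import ForkingPickler

lock = multiprocessing.Lock()
try:
    # The same pickler multiprocessing uses on its internal queues.
    ForkingPickler.dumps(lock)
except RuntimeError as exc:
    print(exc)  # Lock objects should only be shared ... through inheritance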

Answered By: Ankur

As mentioned in this StackOverflow post, Manager.Lock() might be appropriate here. It can be passed to the Pool because what actually gets pickled is a proxy object for a lock that lives in the manager’s server process.

import multiprocessing
import time

def job(lock, num):
  lock.acquire()
  print num
  lock.release()
  time.sleep(1)

pool = multiprocessing.Pool(4)
m = multiprocessing.Manager()
l = m.Lock()

lst = range(40)
for i in lst:
  pool.apply_async(job, [l, i])

pool.close()
pool.join()
Answered By: Karol Skalski