Concurrent writing to the same file using threads and processes

Question:

What is the correct way to make sure the file will never be corrupted when many threads and processes write to it?

A version for threads, which handles opening errors:

lock = threading.RLock()
with lock:
    try:
        f = open(file, 'a')
        try:
            f.write('sth')
        finally:
            f.close()  # close in any circumstances if open succeeded
    except OSError:
        pass  # when open failed

For processes, I guess I must use multiprocessing.Lock.

But what if I want 2 processes, and the first process owns 2 threads (each one using the file)?

I want to know how to mix synchronization with threads and processes.
Do threads "inherit" it from the process, so that only synchronization between processes is required?

Also, I’m not sure whether the above code needs the nested try for the case where the write operation fails and we want to close the opened file (what if it remains open after the lock is released?).
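For what it’s worth, the nested try can be avoided entirely with a with statement, which closes the file even if write() raises. A sketch of the same append (the helper name append_safely is illustrative, not from the question):

```python
import threading

lock = threading.RLock()

def append_safely(path, data):
    with lock:
        try:
            # The with statement closes the file even if write() fails,
            # so no explicit finally/close is needed.
            with open(path, 'a') as f:
                f.write(data)
        except OSError:
            pass  # open (or write) failed
```

This keeps the same behavior as the original snippet: the lock is released when the block exits, and the file cannot remain open past that point.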

Asked By: Sławomir Lenart


Answers:

While this isn’t entirely clear from the docs, multiprocessing synchronization primitives do in fact synchronize threads as well.

For example, if you run this code:

import multiprocessing
import sys
import threading
import time

lock = multiprocessing.Lock()

def f(i):
    with lock:
        for _ in range(10):
            sys.stderr.write(i)
            time.sleep(1)

t1 = threading.Thread(target=f, args=['1'])
t2 = threading.Thread(target=f, args=['2'])
t1.start()
t2.start()
t1.join()
t2.join()

… the output will always be 11111111112222222222 or 22222222221111111111, not a mixture of the two.

The locks are implemented on top of Win32 kernel sync objects on Windows, and POSIX semaphores on platforms that support them; on other platforms they are not implemented at all. (You can test this with import multiprocessing.synchronize, which will raise an ImportError on platforms without working POSIX semaphores, as explained in the docs.)
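That check can be written as a one-off probe (on most desktop platforms the import succeeds):

```python
# multiprocessing's lock machinery lives in multiprocessing.synchronize,
# which raises ImportError on platforms without working POSIX semaphores.
try:
    import multiprocessing.synchronize  # noqa: F401
    have_mp_locks = True
except ImportError:
    have_mp_locks = False

print("cross-process locks available" if have_mp_locks else "not available here")
```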


That being said, it’s certainly safe to have two levels of locks, as long as you always use them in the right order—that is, never grab the threading.Lock unless you can guarantee that your process has the multiprocessing.Lock.

If you do this cleverly enough, it can have performance benefits. (Cross-process locks on Windows, and on some POSIX platforms, can be orders of magnitude slower than intra-process locks.)
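A minimal sketch of that two-level pattern, following the ordering rule above (the names process_lock, thread_lock, and write_record are illustrative, not from the answer):

```python
import multiprocessing
import threading

process_lock = multiprocessing.Lock()  # cross-process lock (outer)
thread_lock = threading.Lock()         # intra-process lock (inner)

def write_record(path, data):
    # Lock order: always take the process lock before the thread lock,
    # never the other way around, so the two levels cannot deadlock.
    with process_lock:
        with thread_lock:
            with open(path, 'a') as f:
                f.write(data)
```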

If you just do it in the obvious way (only do with threadlock: inside with processlock: blocks), it obviously won’t help performance; in fact it will slow things down a bit (although quite possibly not enough to measure), and it won’t add any direct benefits. Of course your readers will know your code is correct even if they don’t know that multiprocessing locks work between threads, and in some cases debugging intra-process deadlocks can be a lot easier than debugging inter-process deadlocks… but I don’t think either of those is a good enough reason for the extra complexity in most cases.

Answered By: abarnert