Why is multiprocessing running code outside of the target function?

Question:

I have this code:

import multiprocessing
with open('pairs.txt') as f:
    pairs = f.read().splitlines()

print(pairs)


def worker(pairtxt):
    print(pairtxt)

if __name__ == '__main__':
    jobs = []
    for i in pairs:
        p = multiprocessing.Process(target=worker, args=(i,))
        jobs.append(p)
        p.start()

When I run this, it prints the pairs variable (not pairtxt) three times: once by itself, I'm guessing, and twice by the multiprocessing. But why is it even running code outside the target function?

The output I get, with pairs.txt containing the two lines "1" and "2":

['1', '2']
['1', '2']
1
['1', '2']
2
Asked By: gregz11818

Answers:

Try moving the with open and print(pairs) statements into your if __name__ == '__main__' block.

I suspect that Python re-runs the full script every time a subprocess starts: under the spawn start method (the default on Windows, and on macOS since Python 3.8), each child process imports your main module afresh so that all of the dependencies (imports and such) are in place for the function you hand it. Any code in your script that sits outside the main block therefore runs again every time a subprocess is launched.

import multiprocessing

def worker(pairtxt):
    print(pairtxt)

if __name__ == '__main__':
    # Read the input inside the __main__ guard so that a child process
    # re-importing this module does not repeat it.
    with open('pairs.txt') as f:
        pairs = f.read().splitlines()

    print(pairs)
    jobs = []
    for i in pairs:
        p = multiprocessing.Process(target=worker, args=(i,))
        jobs.append(p)
        p.start()
Answered By: AetherUnbound

Because Python is a dynamic language. When a module is imported, all top-level statements in that module are executed. The reason you usually don't notice is that many modules only declare things, or at least check whether they are the main module before performing actions.

In your example, you have a completely unconditional section loading a file and printing its contents before you even define the worker function. This may or may not execute once per process with multiprocessing; it wouldn't on Unix-likes where fork is available, because they can simply clone the existing process instead of importing the module anew.

This is documented under "Safe importing of main module" in the multiprocessing docs, and is a frequent gotcha of multiprocessing.
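
Here is a minimal sketch of that difference (assuming a system where both start methods are available): with 'spawn', the top-level print runs again in the child because the child imports the module anew; with 'fork', it runs only once.

import multiprocessing
import os

# Top-level statement: runs on import, so it also runs when a 'spawn'
# child re-imports this module.
print('top level, pid', os.getpid())

def worker():
    print('worker, pid', os.getpid())

if __name__ == '__main__':
    # Switch this to 'fork' (Unix only) and the top-level line prints
    # just once, because the child is cloned, not re-imported.
    multiprocessing.set_start_method('spawn')
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()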

Answered By: Yann Vernier

To avoid the issue, just add one line of code inside the if __name__ == '__main__': block:

multiprocessing.set_start_method('fork')

Modified code:

if __name__ == '__main__':
    # Must be set before any Process is started; 'fork' clones the
    # parent, so the top-level code is not re-executed in the children.
    multiprocessing.set_start_method('fork')
    jobs = []
    for i in pairs:
        p = multiprocessing.Process(target=worker, args=(i,))
        jobs.append(p)
        p.start()

Output:

['1', '2']
1
2

What does multiprocessing.set_start_method('fork') do?
The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.
Available on Unix, including macOS, but not on Windows. Historically the default on Unix (macOS switched its default to 'spawn' in Python 3.8).
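
Since fork is not available on every platform, here is a hedged sketch that checks the available start methods before opting in (nothing here is specific to the question's code):

import multiprocessing

if __name__ == '__main__':
    # 'fork' is only offered on POSIX systems; fall back to the
    # platform default (e.g. 'spawn' on Windows) when it is missing.
    if 'fork' in multiprocessing.get_all_start_methods():
        multiprocessing.set_start_method('fork')
    print(multiprocessing.get_start_method())

On platforms without fork, this fallback does not stop the re-import, so the robust fix is still moving the top-level code inside the __main__ guard, as in the first answer.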

Ref: multiprocessing

Answered By: Swap