How do Python pipes still work across spawned processes?

Question:

I’m trying to understand why the following code works:

import multiprocessing

def send_message(conn):
    # Send a message through the pipe
    conn.send("Hello, world!")


if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')
    # Create a pipe
    parent_conn, child_conn = multiprocessing.Pipe()

    # Create a child process
    p = multiprocessing.Process(target=send_message, args=(child_conn,))
    p.start()

    # Wait for the child process to finish
    p.join()

    # Read the message from the pipe
    message = parent_conn.recv()
    print(message)

As I understand it, Python pipes are just regular OS pipes, which are file descriptors.
When a new process is created via spawn, we should lose all the file descriptors (unlike with a regular fork).

In that case, how is it possible that the Python pipe is still "connected" to its parent process?

Asked By: lezebulon

Answers:

The documentation does not say that the child will lose all the file descriptors – only that "unnecessary file descriptors and handles from the parent process will not be inherited". To figure out exactly how this is achieved in CPython, we first need to see what happens when p.start() is called in the example code.
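As a rough illustration of "only the necessary descriptors survive", the sketch below uses the public subprocess API rather than multiprocessing itself. It assumes a POSIX system, and the names CHILD_CODE and send_through_passed_fd are made up for this example. Passing pass_fds keeps the listed descriptors open in the new interpreter while everything else (apart from stdin, stdout and stderr) is closed.

```python
import os
import subprocess
import sys

# Code run by the fresh interpreter. It only knows the fd number it was
# given on the command line (hypothetical helper for this sketch).
CHILD_CODE = """
import os, sys
fd = int(sys.argv[1])
os.write(fd, b"Hello, world!")
os.close(fd)
"""


def send_through_passed_fd():
    """Start a brand-new interpreter and let it write into our pipe."""
    read_fd, write_fd = os.pipe()
    try:
        proc = subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE, str(write_fd)],
            # Keep exactly this descriptor open in the child (POSIX only).
            pass_fds=(write_fd,),
        )
    finally:
        # The parent no longer needs the write end; closing it lets the
        # read below see EOF once the child has finished writing.
        os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()
    proc.wait()
    return data


if __name__ == "__main__":
    print(send_through_passed_fd())
```

The child here is a completely new process that shares no memory with the parent, yet the pipe still works because its descriptor was explicitly kept open across the exec.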

At some point while starting the process, the Process instance’s underlying Popen helper is used; for 'spawn' this is the version provided by popen_spawn_posix. As part of the startup sequence, it gathers the data required to start the process, including which function to call and its arguments, and serializes that data with a specific pickler.

The Connection object (which Pipe is built upon) defines a hook for that pickler which ensures the relevant file descriptor is marked for duplication. The hook ultimately calls back into the 'spawn' version of Popen.duplicate_for_child, so any connection objects passed (in your case, args=(child_conn,)) have their file descriptors handed to the actual start function, spawnv_passfds, and the child process has access to them.
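The idea of a pickling hook that reports descriptors to the launcher can be modelled in a few lines. This is a toy sketch, not CPython's actual code: ToyConnection, ToyLauncher and dumps_for_child are hypothetical names, and the pickled payload simply carries the raw fd number, whereas the real implementation rebuilds a Connection on the other side.

```python
import copyreg
import io
import pickle


class ToyConnection:
    """Stand-in for a connection object that wraps a file descriptor."""

    def __init__(self, fd):
        self.fd = fd


class ToyLauncher:
    """Stand-in for the Popen helper that collects fds to keep open."""

    def __init__(self):
        self.fds_to_pass = []

    def duplicate_for_child(self, fd):
        # Remember that the child will need this descriptor.
        self.fds_to_pass.append(fd)
        return fd


def dumps_for_child(obj, launcher):
    """Pickle obj, registering every ToyConnection's fd with launcher."""

    def reduce_connection(conn):
        # Called by the pickler whenever it meets a ToyConnection.
        # Simplification: the child side gets back a plain int.
        return int, (launcher.duplicate_for_child(conn.fd),)

    buffer = io.BytesIO()
    pickler = pickle.Pickler(buffer)
    # Per-pickler reducer table, so the hook only applies to this dump.
    pickler.dispatch_table = copyreg.dispatch_table.copy()
    pickler.dispatch_table[ToyConnection] = reduce_connection
    pickler.dump(obj)
    return buffer.getvalue()


if __name__ == "__main__":
    launcher = ToyLauncher()
    payload = dumps_for_child((ToyConnection(7),), launcher)
    print(launcher.fds_to_pass, pickle.loads(payload))
```

The point of the design is that the launcher never has to inspect the arguments itself: serializing them is enough to discover which descriptors must stay open in the child.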

I will note that I have glossed over various other details, but if you wish to you can always attach a debugger and trace through the startup sequence, which is what I did to derive this answer.

Answered By: metatoaster