Spawning a new process with an asyncio loop from within the asyncio loop running in the current process

Question:

I’m a little confused about the interaction between multiprocessing and asyncio. My goal is to be able to spawn async processes from other async processes. Here is a small example:

import asyncio
from multiprocessing import Process


async def sleep_n(n):
    await asyncio.sleep(n)


def async_sleep(n):
    # This does not work:
    # loop = asyncio.get_event_loop()
    # loop.run_until_complete(sleep_n(n))

    # This works
    asyncio.run(sleep_n(n))


async def spawn_another():
    await asyncio.sleep(0.2)
    p = Process(target=async_sleep, args=(5,))
    p.start()
    p.join()


def spawn():
    # This does not work
    # loop = asyncio.get_event_loop()
    # loop.run_until_complete(spawn_another())

    # This works
    asyncio.run(spawn_another())


def doit():
    p = Process(target=spawn)
    p.start()
    p.join()


if __name__ == '__main__':
    doit()

If I replace asyncio.run with get_event_loop().run_until_complete, I get the following error: "RuntimeError: This event loop is already running". It is raised from loop.run_until_complete(sleep_n(n)). What’s the difference between these two?

(NB: the reason I care about this, if it makes a difference to the proposed remedy, is that in my actual code the thing I’m running async is a grpc.aio client, which apparently requires me to use run_until_complete; otherwise I get an error about a Future that’s attached to a different event loop. That said, this is just an aside and not really material to the question above.)

Asked By: danben


Answers:

I think I’ve pinned it down. It’s an issue with how multiprocessing works on Linux versus Windows/macOS.

From the docs:

Contexts and start methods

Depending on the platform, multiprocessing supports three ways to
start a process. These start methods are

  • spawn

    The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run
    the process object’s run() method. In particular, unnecessary file
    descriptors and handles from the parent process will not be inherited.
    Starting a process using this method is rather slow compared to using
    fork or forkserver.

    Available on Unix and Windows. The default on Windows and macOS.

  • fork

    The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively
    identical to the parent process. All resources of the parent are
    inherited by the child process. Note that safely forking a
    multithreaded process is problematic.

    Available on Unix only. The default on Unix.

  • forkserver

    When the program starts and selects the forkserver start method, a server process is started. From then on, whenever a new
    process is needed, the parent process connects to the server and
    requests that it fork a new process. The fork server process is single
    threaded so it is safe for it to use os.fork(). No unnecessary
    resources are inherited.

    Available on Unix platforms which support passing file descriptors over Unix pipes.

So this works on macOS and Windows because the default there is spawn, but fails on Linux where the default is fork. Under fork, the child is a copy of the parent’s entire memory, which means it inherits the already-instantiated local event loop. That’s why the event loop reports that it is already running (and asyncio is by design not re-entrant).
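One quick way to confirm which default applies on a given platform is to query it directly (a minimal sketch; the value depends on the OS and Python version):

```python
import multiprocessing as mp

# 'fork' on Linux, 'spawn' on Windows and macOS.
print(mp.get_start_method())
```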

To get around this you can set the start method to spawn manually in the main block. With this method the interpreter is freshly invoked for each child, so there is no inherited event loop to conflict with.

if __name__ == '__main__':
    import multiprocessing as mp
    # Must be called before any Process is started.
    mp.set_start_method('spawn')
    doit()
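An alternative that avoids mutating process-wide state is multiprocessing.get_context(), which scopes the start method to a single context object. A sketch of the example restructured that way (the sleep duration is shortened just for illustration):

```python
import asyncio
import multiprocessing as mp


async def sleep_n(n):
    await asyncio.sleep(n)


def async_sleep(n):
    # Fresh interpreter under 'spawn', so asyncio.run() gets a clean loop.
    asyncio.run(sleep_n(n))


def doit():
    # get_context() scopes the start method to this context object,
    # leaving the program-wide default untouched.
    ctx = mp.get_context('spawn')
    p = ctx.Process(target=async_sleep, args=(0.1,))
    p.start()
    p.join()
    return p.exitcode


if __name__ == '__main__':
    print(doit())
```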
Answered By: flakes