Running external command in worker process and capturing output to a single file

Question:

This is my naive approach to calling external commands in worker processes and appending all of their output to a single file. Here is some example code.

from concurrent.futures import ProcessPoolExecutor
from functools import partial
import multiprocessing
import subprocess

def worker_process_write_output(fh, lock, mylist):
    output = subprocess.run("dir /b", shell=True, stdout=subprocess.PIPE, universal_newlines=True).stdout
    with lock:  # Need lock to prevent multiple processes writing to the file simultaneously
        fh.write(mylist)
        fh.writelines(output)    

if __name__ == '__main__':
    with open("outfile.txt", "a") as fh:  # I am opening the file in the main process to avoid the overhead of opening & closing it multiple times in the worker processes
        mylist = [1, 2, 3, 4]
        with ProcessPoolExecutor() as executor:
            lock = multiprocessing.Manager().Lock()
            executor.map(partial(worker_process_write_output, fh, lock), mylist)

This code hangs when run. What are the mistakes and corrections? Some of my guesses:

1. A file handle can't be passed to a worker process, so the file needs to be opened and closed in each worker. I'm not sure of the reason.
2. subprocess.run can't be used in a worker process; os.popen("dir /b").read() or something else is needed instead.
3. I'm not sure if a lock is necessary, and if it is, whether this is the right lock.

Asked By: ontherocks


Answers:

Open file handles can't be pickled, and ProcessPoolExecutor pickles every argument it sends to a worker, which is most likely why passing fh to the workers fails. Having said that, I'm assuming you're doing a lot of work in your real worker function, so the overhead of opening/closing the file once per task shouldn't be terribly significant. If it's not a lot of work that's being done, multiprocessing is probably not the best choice to begin with anyway, since it involves serious overhead.
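
To see the failure directly, here's a minimal sketch (ProcessPoolExecutor serializes worker arguments with pickle, so whatever pickle rejects, the executor can't send):

import pickle

with open("outfile.txt", "a") as fh:
    try:
        pickle.dumps(fh)  # same serialization the executor performs on arguments
    except TypeError as e:
        print(e)  # e.g. "cannot pickle '_io.TextIOWrapper' object"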

Additionally, fh.write(mylist) raises TypeError: write() argument must be str, not int, because map passes each element of the list (an int) to the worker, so we need to convert it with fh.write(str(mylist)).

Here’s the workaround:

import multiprocessing
import subprocess
from concurrent.futures import ProcessPoolExecutor
from functools import partial

def worker_process_write_output(lock, mylist):
    # "dir /b" is a Windows shell command; substitute e.g. "ls" on POSIX systems
    output = subprocess.run("dir /b", shell=True, stdout=subprocess.PIPE,
                            universal_newlines=True).stdout

    with lock:  # serialize writes so output from different workers doesn't interleave
        with open("outfile.txt", "a") as fh:
            fh.write(str(mylist))
            fh.writelines(output)


if __name__ == '__main__':
    mylist = [1, 2, 3, 4]

    with ProcessPoolExecutor() as executor:
        lock = multiprocessing.Manager().Lock()
        executor.map(partial(worker_process_write_output, lock), mylist)
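
One caveat: executor.map returns its results lazily, and an exception raised in a worker (or while pickling its arguments) only surfaces when the corresponding result is consumed from the iterator. To avoid silently swallowing errors, consider iterating over the results, e.g. by replacing the last line above with:

        results = executor.map(partial(worker_process_write_output, lock), mylist)
        for _ in results:  # re-raises any exception that occurred in a worker
            pass
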
Answered By: ggorlen