Multiprocessing with a Pool in Python?

Question:

I’m writing a small application with a Tkinter GUI to interact with an existing executable that does not have a GUI itself. The executable can export Solid Edge files to different formats (to PDF, for example); see Solid Edge Translation Services on the web. The goal is to batch-export files to PDF.

The part of the code that calls the executable is shown below. I need multiprocessing because running the executable takes a while, and that would make my app unresponsive.

    from multiprocessing import Process

    for cmd in commands:
        print(f'running cmd {cmd}')
        p = Process(target=exportSingleFile, args=(cmd,))
        p.start()

(commands is a list of command strings, each containing arguments for the input file, the output file, and the output file type (pdf).) Something like this:

"C:/Program Files/Solid Edge ST9/Program/SolidEdgeTranslationServices.exe" -i="input file" -o="output file" -t=pdf"

But when I try to replace it with the code below, my app becomes unresponsive and nothing really happens. I assume it’s better to use a pool when exporting potentially dozens of files.

    exportResult = []
    with Pool() as pool:
        exportResult = pool.imap_unordered(exportSingleFile,commands)
    for r in exportResult: 
        print (r)

This is what exportSingleFile does:

    import subprocess

    def exportSingleFile(cmd):
        # subprocess.run blocks until the command completes.
        return subprocess.run(cmd, shell=True)
Asked By: Rian


Answers:

The multiprocessing module is mostly for running multiple parallel Python processes. Since your commands are already running as separate processes, it’s redundant to use multiprocessing on top of that.

Instead, consider using the subprocess.Popen constructor directly, which starts a subprocess but does not wait for it to complete. Store these process objects in a list. You can then regularly poll() every process in the list to see if it has completed. To schedule such a poll, use Tkinter’s after method.

Here is a rough sketch of such an implementation; you will need to adapt it to your situation, and I didn’t test it:

    import subprocess

    class ParallelCommands:
        def __init__(self, commands, num_parallel):
            # Reversed so that pop() below hands out commands in their original order.
            self.commands = commands[::-1]
            self.num_parallel = num_parallel
            self.processes = []
            self.poll()

        def poll(self):
            # Poll processes for completion, and raise on errors.
            for process in self.processes:
                process.poll()
                if process.returncode is not None and process.returncode != 0:
                    raise RuntimeError("Process finished with nonzero exit code")

            # Remove completed processes.
            self.processes = [
                p for p in self.processes
                if p.returncode is None
            ]

            # Start new processes up to the maximum amount.
            while self.commands and len(self.processes) < self.num_parallel:
                command = self.commands.pop()
                process = subprocess.Popen(command, shell=True)
                self.processes.append(process)

        def is_done(self):
            return not self.processes and not self.commands

To start a bunch of commands, running at most 10 at the same time:

    commands = ParallelCommands(["ls /bin", "ls /lib"], 10)

To wait for completion synchronously, blocking the UI (just for demonstration purposes):

    import time

    while not commands.is_done():
        commands.poll()
        time.sleep(0.1)
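
In the actual Tkinter app you would schedule poll() on the event loop instead of blocking, using after as mentioned above. A minimal, untested sketch of that wiring (check_progress is a hypothetical helper name; error handling is omitted):

    import tkinter as tk

    root = tk.Tk()
    commands = ParallelCommands(["ls /bin", "ls /lib"], 10)

    def check_progress():
        commands.poll()
        if not commands.is_done():
            # Check again in 100 ms without blocking the event loop.
            root.after(100, check_progress)

    check_progress()
    root.mainloop()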
Answered By: Thomas