Running multiple processes simultaneously

Question:

I am attempting to create a program in Python that runs 15 instances of a function simultaneously over different processors. I have been researching this, and have set up the program below using the Process class from multiprocessing.

Unfortunately, the program executes each instance of the function sequentially (it seems to wait for one to finish before moving onto the next part of the loop).

from __future__ import print_function
from multiprocessing import Process
import sys
import os
import re

for i in range(1,16):
    exec("path%d = 0" % (i))
    exec("file%d = open('%d-path','a', 1)" % (i, i))

def stat(first, last):
    for j in range(1,40000):
        input_string = "water" + str(j) + ".xyz.geocard"
        if os.path.exists('./%s' % input_string) == True:
            exec("out%d = open('output%d', 'a', 1)" % (first, first))
            exec('print("Processing file %s...", file=out%d)' % (input_string, first))
            with open('./%s' % input_string,'r') as file:
                for line in file:
                    for i in range(first,last):
                        search_string = " " + str(i) + " path:"
                        for result in re.finditer(r'%s' % search_string, line):
                            exec("path%d += 1" % i)

            for i in range(first,last):
                exec("print(path%d, file=file%d)" % (i, i))  

processes = []

for m in range(1,16):
    n = m + 1
    p = Process(target=stat, args=(m, n))
    p.start()
    processes.append(p)

for p in processes:
    p.join()

I am reasonably new to programming, and have no experience with parallelization – any help would be greatly appreciated.

I have included the entire program above, replacing "Some Function" with the actual function, to demonstrate that this is not a timing issue. The program can take days to cycle through all 40,000 files (each of which is quite large).

Asked By: user3470516

Answers:

I think what is happening is that you are not doing enough in some_function to observe work happening in parallel. It spawns a process, and it completes before the next one gets spawned. If you introduce a random sleep time into some_function, you’ll see that they are in fact running in parallel.

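from __future__ import print_function  # print(a, b) then behaves the same on Python 2 and 3
from multiprocessing import Process
import random
import time

def some_function(first, last):
    # Sleep 1 to 3 seconds so the processes visibly overlap
    time.sleep(random.randint(1, 3))
    print(first, last)

if __name__ == "__main__":  # required on Windows, harmless elsewhere
    processes = []

    # Start all 15 processes before waiting on any of them
    for m in range(1, 16):
        n = m + 1
        p = Process(target=some_function, args=(m, n))
        p.start()
        processes.append(p)

    # Wait for every process to finish
    for p in processes:
        p.join()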

Output

2 3
3 4
5 6
12 13
13 14
14 15
15 16
1 2
4 5
6 7
9 10
8 9
7 8
11 12
10 11
Answered By: mdadm

Are you sure? I just tried it and it worked for me; the results are out of order on every execution, so they’re being executed concurrently.

Have a look at your function. It takes “first” and “last”, so is its execution time smaller for lower values? If so, the processes with the smaller arguments would finish first and the output would appear in order, as if they had run one after another, even though they ran in parallel.

ps ux | grep python | grep -v grep | wc -l
> 16

If you execute the code repeatedly (e.g. using a bash script) you can see that every process is starting up. If you want to confirm this, import os and have the function print out os.getpid() so you can see that each call runs under a different process ID.
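A minimal sketch of that check, assuming Python 3. The names `report_pid` and `collect_pids` are hypothetical, and the `Queue` is only there so the parent can gather the IDs in one place; printing `os.getpid()` directly from the function works just as well.

```python
import os
from multiprocessing import Process, Queue


def report_pid(label, results):
    # Runs in a worker: record which process handled this label.
    results.put((label, os.getpid()))


def collect_pids(count):
    # Start `count` workers and return {label: pid} as reported by each one.
    results = Queue()
    workers = [Process(target=report_pid, args=(label, results))
               for label in range(count)]
    for worker in workers:
        worker.start()
    # Read one result per worker before joining, so no worker is left
    # waiting to flush its queue data.
    pids = dict(results.get() for _ in workers)
    for worker in workers:
        worker.join()
    return pids


if __name__ == "__main__":
    print("parent pid:", os.getpid())
    for label, pid in sorted(collect_pids(4).items()):
        print("worker", label, "pid", pid)
```

If each worker reports a PID that differs from the parent's and from the other workers', each call should have run in its own process.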

So yeah, double check your results because it seems to me like you’ve written it concurrently just fine!
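One way to double check is to time the same CPU-bound work run back to back and then in separate processes. This is only a sketch under assumptions: `burn`, `run_sequential` and `run_parallel` are made-up names, the loop stands in for the real file processing, and any speedup depends on how many cores are free.

```python
import time
from multiprocessing import Process


def burn(iterations):
    # Stand-in for real work: a pure-Python loop that keeps one core busy.
    total = 0
    for k in range(iterations):
        total += k * k
    return total


def run_sequential(count, iterations):
    # Run the work `count` times in this process and return elapsed seconds.
    start = time.perf_counter()
    for _ in range(count):
        burn(iterations)
    return time.perf_counter() - start


def run_parallel(count, iterations):
    # Run the work in `count` separate processes and return elapsed seconds.
    start = time.perf_counter()
    workers = [Process(target=burn, args=(iterations,)) for _ in range(count)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("sequential: %.2f s" % run_sequential(4, 2000000))
    print("parallel:   %.2f s" % run_parallel(4, 2000000))
```

On a machine with several idle cores, the parallel figure would be expected to come out noticeably lower than the sequential one; if the two are about the same, the processes are probably not overlapping.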

Answered By: ruscur

The code below runs 10 processes in parallel, each printing the numbers from 0 to 99.

*The if __name__ == "__main__": guard is needed to run processes on Windows:

from multiprocessing import Process

def test():
    for i in range(0, 100):
        print(i)

if __name__ == "__main__": # Here
    process_list = []

    for _ in range(0, 10):
        process = Process(target=test)
        process_list.append(process)

    for process in process_list:
        process.start()

    for process in process_list:
        process.join()

The code below is a shorter version of the same program that uses list comprehensions in place of the for loops. It also runs 10 processes in parallel, each printing the numbers from 0 to 99:

from multiprocessing import Process

def test():
    [print(i) for i in range(0, 100)]

if __name__ == "__main__":
    process_list = [Process(target=test) for _ in range(0, 10)]

    [process.start() for process in process_list]

    [process.join() for process in process_list]

Part of the result is shown below:

...
99
79
67
71
67
89
81
99
80
68
...
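
On the `if __name__ == "__main__":` guard used above: under the spawn start method (the only one available on Windows, and the default on macOS since Python 3.8), each child process imports the main module again, so process-starting code left at module level would be reached again in every child. A small sketch, assuming Python 3.4 or later, that reports which start method the current platform uses; `describe_start_method` is a hypothetical helper name.

```python
import multiprocessing


def describe_start_method():
    # "fork", "spawn" or "forkserver", depending on platform and Python version.
    method = multiprocessing.get_start_method()
    # Only "fork" copies the parent without importing the main module again.
    reimports_main = method != "fork"
    return method, reimports_main


if __name__ == "__main__":
    method, reimports_main = describe_start_method()
    print("start method:", method)
    print("children import the main module again:", reimports_main)
```

Keeping the guard in place costs nothing under fork and keeps the script portable to platforms that use spawn.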
Answered By: Kai – Kazuya Ito