running multiple processes simultaneously
Question:
I am attempting to create a program in Python that runs multiple instances (15) of a function simultaneously on different processors. I have been researching this, and have set up the program below using the Process class from multiprocessing.
Unfortunately, the program appears to execute each instance of the function sequentially (it seems to wait for one to finish before moving on to the next iteration of the loop).
    from __future__ import print_function
    from multiprocessing import Process
    import sys
    import os
    import re

    for i in range(1,16):
        exec("path%d = 0" % (i))
        exec("file%d = open('%d-path','a', 1)" % (i, i))

    def stat(first, last):
        for j in range(1,40000):
            input_string = "water" + str(j) + ".xyz.geocard"
            if os.path.exists('./%s' % input_string) == True:
                exec("out%d = open('output%d', 'a', 1)" % (first, first))
                exec('print("Processing file %s...", file=out%d)' % (input_string, first))
                with open('./%s' % input_string,'r') as file:
                    for line in file:
                        for i in range(first,last):
                            search_string = " " + str(i) + " path:"
                            for result in re.finditer(r'%s' % search_string, line):
                                exec("path%d += 1" % i)
                for i in range(first,last):
                    exec("print(path%d, file=file%d)" % (i, i))

    processes = []
    for m in range(1,16):
        n = m + 1
        p = Process(target=stat, args=(m, n))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
I am reasonably new to programming, and have no experience with parallelization – any help would be greatly appreciated.
I have included the entire program above, replacing "Some Function" with the actual function, to demonstrate that this is not a timing issue. The program can take days to cycle through all 40,000 files (each of which is quite large).
Answers:
I think what is happening is that you are not doing enough in some_function to observe work happening in parallel. It spawns a process, and that process completes before the next one gets spawned. If you introduce a random sleep time into some_function, you’ll see that they are in fact running in parallel.
    from multiprocessing import Process
    import random
    import time

    def some_function(first, last):
        # Sleep 1 to 3 seconds so the processes finish in a random order.
        time.sleep(random.randint(1, 3))
        print(first, last)

    if __name__ == "__main__":  # needed where child processes re-import this module
        processes = []
        for m in range(1,16):
            n = m + 1
            p = Process(target=some_function, args=(m, n))
            p.start()
            processes.append(p)
        for p in processes:
            p.join()
Output
2 3
3 4
5 6
12 13
13 14
14 15
15 16
1 2
4 5
6 7
9 10
8 9
7 8
11 12
10 11
Are you sure? I just tried it and it worked for me; the results are out of order on every execution, so they’re being executed concurrently.
Have a look at your function. It takes “first” and “last”, so is its execution time smaller for lower values? If so, you could expect the calls with smaller arguments to finish first, so the output would come out in order and the processes would appear to run sequentially even though they are running in parallel.
ps ux | grep python | grep -v grep | wc -l
> 16
If you execute the code repeatedly (e.g. using a bash script) you can see that every process is starting up. If you want to confirm this, import os and have the function print out os.getpid() so you can see that each call has a different process ID.
So yeah, double check your results because it seems to me like you’ve written it concurrently just fine!
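For what it's worth, here is a minimal sketch of that os.getpid() check. The names report_pid and run_workers are made up for illustration, and it assumes you are happy to collect the IDs through a multiprocessing Queue rather than printing them from each child.

```python
import os
from multiprocessing import Process, Queue


def report_pid(first, last, results):
    """Put this worker's arguments and its process ID on the results queue."""
    results.put((first, last, os.getpid()))


def run_workers(count):
    """Start `count` processes and return the (first, last, pid) tuples they report."""
    results = Queue()
    processes = [
        Process(target=report_pid, args=(m, m + 1, results))
        for m in range(1, count + 1)
    ]
    for p in processes:
        p.start()
    # Read one report per process before joining, so no child is left
    # waiting to flush its item into the queue.
    reports = [results.get() for _ in processes]
    for p in processes:
        p.join()
    return reports


if __name__ == "__main__":
    reports = run_workers(15)
    for first, last, pid in sorted(reports):
        print(first, last, pid)
    # Each call ran in its own process if all 15 reported IDs are distinct.
    print("distinct PIDs:", len({pid for _, _, pid in reports}))
```

If the last line reports 15 distinct PIDs, each call ran in its own process, which is what the ps check above is counting.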
The code below runs 10 processes in parallel, each printing the numbers from 0 to 99. Note that the if __name__ == "__main__": guard is needed to run processes on Windows:
    from multiprocessing import Process

    def test():
        for i in range(0, 100):
            print(i)

    if __name__ == "__main__": # Here
        process_list = []
        for _ in range(0, 10):
            process = Process(target=test)
            process_list.append(process)
        for process in process_list:
            process.start()
        for process in process_list:
            process.join()
The code below is a shorthand version of the above that replaces the for loops with list comprehensions. It also runs 10 processes in parallel, each printing the numbers from 0 to 99:
    from multiprocessing import Process

    def test():
        [print(i) for i in range(0, 100)]

    if __name__ == "__main__":
        process_list = [Process(target=test) for _ in range(0, 10)]
        [process.start() for process in process_list]
        [process.join() for process in process_list]
Part of the result looks like this:
...
99
79
67
71
67
89
81
99
80
68
...
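Coming back to the function in the question, the same start-then-join pattern might look something like the sketch below. It uses a local counter instead of the exec-generated variables. This is only a sketch under assumptions: count_path_mentions is a made-up helper, stat here takes an index and a list of file names rather than (first, last), and it writes a single total per index at the end instead of the question's running output.

```python
import re
from multiprocessing import Process


def count_path_mentions(lines, index):
    """Count occurrences of ' <index> path:' across an iterable of lines."""
    pattern = re.compile(re.escape(" %d path:" % index))
    return sum(len(pattern.findall(line)) for line in lines)


def stat(index, filenames):
    """Tally one path index over all files and append the total to '<index>-path'."""
    total = 0
    for name in filenames:
        try:
            with open(name) as handle:
                total += count_path_mentions(handle, index)
        except FileNotFoundError:
            continue  # like the question's code, skip files that do not exist
    with open("%d-path" % index, "a") as out:
        print(total, file=out)


if __name__ == "__main__":
    # File names follow the pattern used in the question.
    filenames = ["water%d.xyz.geocard" % j for j in range(1, 40000)]
    processes = [Process(target=stat, args=(i, filenames)) for i in range(1, 16)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```

Because each process keeps its own total in a local variable, nothing depends on module-level names created with exec, and each process only ever touches its own output file.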