Python multiprocessing doesn't seem to use more than one core

Question:

I want to use Python multiprocessing to run grid search for a predictive model.
When I look at core usage, it always seems to be using only one core. Any idea what I’m doing wrong?

import multiprocessing
from sklearn import svm
import itertools
from operator import itemgetter

#first read some data
#X will be my feature Numpy 2D array
#y will be my 1D Numpy array of labels

#define the grid        
C = [0.1, 1]
gamma = [0.0]
params = [C, gamma]
grid = list(itertools.product(*params))
GRID_hx = []

def worker(par, grid_list):
    #define a sklearn model
    clf = svm.SVC(C=par[0], gamma=par[1], probability=True, random_state=SEED)
    #run a cross validation function: returns error
    ll = my_cross_validation_function(X, y, model=clf, n=1, test_size=0.2)
    print(par, ll)
    grid_list.append((par, ll))


if __name__ == '__main__':
   manager = multiprocessing.Manager()
   GRID_hx = manager.list()
   jobs = []
   for g in grid:
      p = multiprocessing.Process(target=worker, args=(g,GRID_hx))
      jobs.append(p)
      p.start()
      p.join()

   print("\n-------------------")
   print("SORTED LIST")
   print("-------------------")
   L = sorted(GRID_hx, key=itemgetter(1))
   for l in L[:5]:
      print(l)
Asked By: ADJ

||

Answers:

I’d say:

jobs = []
for g in grid:
    p = multiprocessing.Process(target=worker, args=(g, GRID_hx))
    jobs.append(p)
    p.start()
for p in jobs:
    p.join()

Currently you’re spawning a job, then waiting for it to finish, then going on to the next one.

Answered By: Calvin1602

Your problem is that you join each job immediately after you started it:

for g in grid:
    p = multiprocessing.Process(target=worker, args=(g,GRID_hx))
    jobs.append(p)
    p.start()
    p.join()

join blocks until the respective process has finished working. This means that your code starts only one process at once, waits until it is finished and then starts the next one.

In order for all processes to run in parallel, you need to first start them all and then join them all:

jobs = []
for g in grid:
    p = multiprocessing.Process(target=worker, args=(g,GRID_hx))
    jobs.append(p)
    p.start()

for j in jobs:
    j.join()
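
The difference between the two loops can be made visible with a small timing sketch. It is not the original grid search: `fake_fit` is a hypothetical stand-in that just sleeps for half a second instead of running cross-validation, and the grid values and the `run` helper are made up for illustration.

```python
import multiprocessing
import time


def fake_fit(par):
    """Hypothetical stand-in for the cross-validation call: sleeps, then returns a score."""
    C, gamma = par
    time.sleep(0.5)
    return C + gamma


def worker(par, results):
    # Append (parameters, score) to the shared list, as in the question.
    results.append((par, fake_fit(par)))


def run(grid, results, join_immediately):
    """Start one process per grid point and return the elapsed wall-clock time."""
    start = time.perf_counter()
    jobs = []
    for par in grid:
        p = multiprocessing.Process(target=worker, args=(par, results))
        jobs.append(p)
        p.start()
        if join_immediately:
            p.join()  # blocks here, so the next process cannot start yet
    for p in jobs:
        p.join()  # returns at once for processes that have already finished
    return time.perf_counter() - start


if __name__ == "__main__":
    grid = [(0.1, 0.0), (1, 0.0), (10, 0.0), (100, 0.0)]
    with multiprocessing.Manager() as manager:
        for join_immediately in (True, False):
            results = manager.list()
            elapsed = run(grid, results, join_immediately)
            print(f"join_immediately={join_immediately}: "
                  f"{elapsed:.2f}s, {len(results)} results")
```

With `join_immediately=True` the elapsed time should be roughly the sum of all task times; with `False` it should be close to the time of a single task plus process start-up overhead.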

Documentation: see Process.join() in the Python multiprocessing documentation.

Answered By: helmbert

According to the documentation, join() blocks the calling process until the process it is called on has finished. So you are starting each process in the for loop and then waiting for it to finish BEFORE you proceed to the next iteration.

I would suggest moving the joins outside the loop!
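
If managing the joins by hand feels error-prone, one alternative (a different technique from the Process-based code in the question) is `multiprocessing.Pool`, which starts the workers, distributes the grid points, and collects the return values. The sketch below assumes a hypothetical `evaluate` function with a made-up error formula; the real cross-validation call would go in its place.

```python
import itertools
import multiprocessing
from operator import itemgetter


def evaluate(par):
    """Hypothetical scoring function; replace the body with the real cross-validation."""
    C, gamma = par
    return par, (C - 1.0) ** 2 + gamma


def sort_by_error(results):
    # Sort (parameters, error) pairs by error, lowest first.
    return sorted(results, key=itemgetter(1))


if __name__ == "__main__":
    grid = list(itertools.product([0.1, 1, 10], [0.0]))
    # Pool() with no argument sizes itself from the CPU count the machine reports.
    with multiprocessing.Pool() as pool:
        results = pool.map(evaluate, grid)  # blocks until every grid point is done
    for par, err in sort_by_error(results)[:5]:
        print(par, err)
```

Because `pool.map` returns the results directly, no `Manager` list is needed in this variant.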

Answered By: Robin Nabel