Managing the sequence of print output after multiprocessing

Question:

I have the following section of code, which uses multiprocessing to run chi2(i) and then prints out the full output:

import cmath, csv, sys, math, re
import numpy as np
import multiprocessing as mp


x1 = np.zeros(npt ,dtype=float)
x2 = np.zeros(npt ,dtype=float)

def chi2(i):
    print("wavelength", i+1," of ", npt)
    # ... some calculations that generate x1[(i)], x2[(i)] and x[(1,i)] ...

    print("\t", i+1,"x1:",x1[(i)])
    print("\t", i+1,"x2:",x2[(i)])
    x[(1,i)] = x1[(i)] * x2[(i)]
    print("\t", i+1,"x:",x[(1,i)])

    return x[(1,i)]

#-----------single process--------------
#for i in range (npt):
#   chi2(i)

#------------parallel processes-------------
pool = mp.Pool(cpu)
x[1] = pool.map(chi2,[i for i in range (npt)])
pool.close() 

#general output
print("x: \n",x.T)

If I run the script using a single process (commented section in script), the output is in the form I desire:

wavelength 1  of  221
         1 x1: -0.3253846181978943
         1 x2: -0.012596285460978723
         1 x: 0.004098637535432249
wavelength 2  of  221
         2 x1: -0.35587046869939154
         2 x2: -0.014209153301058522
         2 x: 0.005056618045069202
...
x:
 [[3.30000000e+02 4.09863754e-03]
 [3.40000000e+02 5.05661805e-03]
 [3.50000000e+02 6.20083938e-03]
...

However, if I run the script with parallel processes, the "wavelength i of npt" output is printed after the output of print("x: \n",x.T), even though it appears first in the script:

x:
 [[3.30000000e+02 4.09863754e-03]
 [3.40000000e+02 5.05661805e-03]
 [3.50000000e+02 6.20083938e-03]
...
wavelength 1  of  221
         1 x1: -0.3253846181978943
         1 x2: -0.012596285460978723
         1 x: 0.004098637535432249
wavelength 2  of  221
         2 x1: -0.35587046869939154
         2 x2: -0.014209153301058522
         2 x: 0.005056618045069202
...

I suspect this has something to do with the processing time of mp.Pool, which takes longer to generate its output after pool.close() than the simpler print("x: \n",x.T). How can I correct the sequence so that running the script with parallel processes gives the same output order as running it with a single process?

Asked By: Jacek

||

Answers:

The point of multiprocessing is to run multiple processes simultaneously rather than sequentially. Since the processes are independent of each other, they print to the console independently, so the order of printing may change from execution to execution.
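If the per-item lines must always come out in index order, one option (not part of the original answer) is to let each worker return its messages and have the main process print them. The sketch below assumes placeholder arithmetic in place of the real calculation; chi2_logged and NPT are hypothetical names introduced only for illustration.

```python
import multiprocessing as mp

NPT = 4  # hypothetical number of points, standing in for npt


def chi2_logged(i):
    # Placeholder arithmetic standing in for the real calculation.
    x1 = -0.1 * (i + 1)
    x2 = -0.01 * (i + 1)
    x = x1 * x2
    # Collect the messages instead of printing them inside the worker.
    lines = [
        f"wavelength {i + 1} of {NPT}",
        f"\t{i + 1} x1: {x1}",
        f"\t{i + 1} x2: {x2}",
        f"\t{i + 1} x: {x}",
    ]
    return x, lines


if __name__ == "__main__":
    with mp.Pool(2) as pool:
        # pool.map returns results in the same order as its inputs.
        results = pool.map(chi2_logged, range(NPT))
    # Only the main process prints, so the sequence is deterministic.
    for _, lines in results:
        print("\n".join(lines))
    print("x:", [x for x, _ in results])
```

Because pool.map preserves input order, the printed sequence here matches a single-process run regardless of which worker finished first.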

When you call pool.close(), the pool stops accepting new tasks, but its worker processes continue to run. The main process, on the other hand, carries on and prints to the console.

If you want to print only after the pool's processes have finished executing, add pool.join() after pool.close(); it waits for the worker processes to exit before the main process proceeds.
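A minimal sketch of the close-then-join pattern follows. The arithmetic inside chi2 is a placeholder, and run_parallel and workers are hypothetical names. The flush=True argument is an extra assumption beyond the answer: since pool.map already blocks until every result is back, late lines are most likely output still held in the workers' stdout buffers, and flushing each print avoids that.

```python
import multiprocessing as mp


def chi2(i):
    # Placeholder arithmetic standing in for the real calculation.
    x1 = -0.1 * (i + 1)
    x2 = -0.01 * (i + 1)
    # flush=True pushes the line out of the worker's buffer immediately.
    print("wavelength", i + 1, flush=True)
    return x1 * x2


def run_parallel(npt, workers=2):
    pool = mp.Pool(workers)
    results = pool.map(chi2, range(npt))  # blocks until all results are back
    pool.close()  # no further tasks will be submitted
    pool.join()   # wait for the worker processes to exit
    return results


if __name__ == "__main__":
    x = run_parallel(5)
    # Runs only after the workers have exited.
    print("x:", x)
```

Lines from different workers can still interleave with one another; join() only ensures that the final print comes after all of them.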

Answered By: ViggyPiggy