Avoiding I/O delay in a loop using multiprocessing

Question:

I am running predictions with a trained TensorFlow model on images coming from a simulator. The issue is that I also need to save the image for each prediction I make, and that disk write creates a delay in the loop which sometimes causes problems in the simulator. Is there any way to use Python's multiprocessing module to build a producer-consumer architecture that keeps the I/O cost out of the loop?

import base64
from io import BytesIO

import numpy as np
from PIL import Image

for data in data_arr:
    speed = float(data['speed'])
    image = Image.open(BytesIO(base64.b64decode(data['image'])))

    image = np.asarray(image)
    img_c = image.copy()  # unmodified copy kept for saving
    image = img_preprocess(image)
    image = np.array([image])

    steering_angle = float(model_steer.predict(image))
    # throttle = float(model_thr.predict(image))
    throttle = 1.0 - speed / speed_limit

    save_image(img_c, steering_angle)  # the disk write that delays the loop
    print('{} {} {}'.format(steering_angle, throttle, speed))

    send_control(steering_angle, throttle)

I tried to apply a similar concept to converting images from color to grayscale, but instead of decreasing, the total time increased from 0.1 s to 17 s.

import ctypes
import os
import time
from multiprocessing import Pool, RawArray

import cv2
import numpy as np


files_path = os.listdir('./imgs/')
files_path = list(map(lambda x: './imgs/' + x, files_path))

temp_img = np.zeros((160, 320))


var_dict = {}

def init_worker(X, h, w):
    # Using a dictionary is not strictly necessary. You can also
    # use global variables.
    var_dict['X'] = X
    var_dict['h'] = h
    var_dict['w'] = w

def worker_func(idx):
    # Rebuild the image from the shared buffer and write it to disk.
    X_np = np.frombuffer(var_dict['X'], dtype=np.uint8)
    X_np = X_np.reshape(var_dict['h'], var_dict['w'])
    cv2.imwrite('./out/' + str(idx) + '.jpg', X_np)


if __name__ == '__main__':
    start_time = time.time()
    for idx, filepath in enumerate(files_path):
        img = cv2.imread(filepath)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        h, w = img.shape[:2]
        mulproc_array = RawArray(ctypes.c_uint8, 160 * 320)
        X_np = np.frombuffer(mulproc_array, dtype=np.uint8).reshape(160, 320)
        np.copyto(X_np, img)
        # cv2.imwrite('./out/' + str(idx) + '.jpg', img)
        with Pool(processes=1, initializer=init_worker, initargs=(mulproc_array, h, w)) as pool:
            pool.map(worker_func, [idx])

    end_time = time.time()

    print('Time taken=', (end_time - start_time))
Asked By: Vikas Kumar Ojha


Answers:

  1. There is no reason to use RawArray here: multiprocessing already uses pickle to transfer objects, and a pickled numpy array is approximately the same size as the raw array, so the shared buffer saves you nothing (see the first check in the sketch after this list). RawArray is intended for a different use case than yours.
  2. You don't need to wait for the saving function to finish; you can run it asynchronously.
  3. You shouldn't be closing the pool until you are done with everything, as creating a worker takes a very long time, on the order of 10-100 ms (see the second check in the sketch after this list).
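
A quick way to verify points 1 and 3 (a minimal sketch, assuming the 160x320 grayscale images from the question):

import pickle
import time
from multiprocessing import Pool

import numpy as np

if __name__ == '__main__':
    # Point 1: a pickled uint8 image is barely larger than its raw bytes.
    img = np.zeros((160, 320), dtype=np.uint8)
    print(img.nbytes)              # 51200 bytes of raw pixel data
    print(len(pickle.dumps(img)))  # roughly the same, plus a small header

    # Point 3: merely creating and destroying a one-worker pool is expensive,
    # which is why the per-image pool in the question was so slow.
    t0 = time.time()
    with Pool(processes=1) as pool:
        pass
    print('pool startup/teardown: {:.3f}s'.format(time.time() - t0))

With those points applied, the loop becomes: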
import os
import time
from multiprocessing import Pool

import cv2
import numpy as np

files_path = list(map(lambda x: './imgs/' + x, os.listdir('./imgs/')))

def worker_func(img, idx):
    cv2.imwrite('./out/' + str(idx) + '.jpg', img)


if __name__ == '__main__':
    start_time = time.time()
    with Pool(processes=1) as pool:  # one pool, created once and reused
        results = []
        for idx, filepath in enumerate(files_path):
            img = cv2.imread(filepath)
            img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # do other work here
            # next line converts the image to uint8 before sending it, to reduce its size
            results.append(pool.apply_async(worker_func, args=(img.astype(np.uint8), idx)))
        end_time = time.time()  # technically the transfer is done at this line
        for res in results:
            res.get()  # call this before closing the pool to make sure all images are saved
        print('Time taken=', (end_time - start_time))

You might want to experiment with threading instead of multiprocessing to avoid the data copy altogether, since writing to disk releases the GIL, but the results are not guaranteed to be faster.
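
A minimal sketch of that threading variant (assuming the same ./imgs/ input and ./out/ output layout as above), using a queue.Queue as the producer-consumer channel so the main loop only hands each image off and never blocks on the disk write:

import os
import queue
import threading
import time

import cv2

files_path = list(map(lambda x: './imgs/' + x, os.listdir('./imgs/')))
save_queue = queue.Queue()

def saver():
    # Consumer: pull (img, idx) pairs off the queue and write them to disk.
    # No pickling and no copy: the thread shares memory with the main loop,
    # and cv2.imwrite releases the GIL during the actual write.
    while True:
        item = save_queue.get()
        if item is None:  # sentinel: no more work coming
            break
        img, idx = item
        cv2.imwrite('./out/' + str(idx) + '.jpg', img)

if __name__ == '__main__':
    t = threading.Thread(target=saver)
    t.start()
    start_time = time.time()
    for idx, filepath in enumerate(files_path):
        img = cv2.imread(filepath)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        save_queue.put((img, idx))  # producer: enqueue and move on
    save_queue.put(None)  # tell the consumer to stop
    t.join()              # wait until every image is saved
    print('Time taken=', time.time() - start_time)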

Answered By: Ahmed AEK