Multiprocessing nested loops to optimize SageMaker instance usage?

Question:

Hi, I am trying to understand and implement multiprocessing for my nested loop below.

I am currently using SageMaker Studio, and I am trying to optimise my instance usage.
I have about 500,000 customers, and each customer is an independent calculation.
If I use an instance with 96 vCPUs, does that mean I can run around 5,200 customers per vCPU (500,000 / 96)?
Also, how can I add multiprocessing to my nested loop below? Any advice or help would be appreciated.

# Flatten End to simplify access to the end date values,
# e.g. array([30, 31, 30, 31, 31, 28, 31, 30, 31, 30])
end_dates = End.reshape(-1)
results = {}
for cust_id, state, amount, start, group, loan_rate in data1.itertuples(name=None, index=False):
    res = [amount * matrix_data[start-1, state, :]]
    for year in range(start+1, len(matrix_data)+1):
        res.append(lookup1.loc[year].iat[0] * np.array(res[-1]))
        res.append(res[-1] * loan_rate * end_dates[year-1]/365)  # year - 1 here
        res.append(res[-1] + 100)
        res.append(np.linalg.multi_dot([res[-1], matrix_data[year-1]]))
    results[cust_id] = res

My previous question describes the preprocessing steps:
How to add another iterator to nested loop in python without additional loop?

Asked By: user1000x

||

Answers:

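Please correct me if I'm wrong, but one approach is to move the body of the outer loop into a function and let a process pool map it over the customers: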

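from multiprocessing import Pool

import numpy as np

# matrix_data, lookup1, end_dates and data1 are assumed to be defined at module
# level (as in the question) so that every worker process can see them.

def compute_result(cust_id, state, amount, start, group, loan_rate):
    # Same per-customer calculation as the body of the original outer loop.
    res = [amount * matrix_data[start-1, state, :]]
    for year in range(start+1, len(matrix_data)+1):
        res.append(lookup1.loc[year].iat[0] * np.array(res[-1]))
        res.append(res[-1] * loan_rate * end_dates[year-1]/365)  # year - 1 here
        res.append(res[-1] + 100)
        res.append(np.linalg.multi_dot([res[-1], matrix_data[year-1]]))
    return (cust_id, res)

if __name__ == '__main__':
    # By default, Pool() starts one worker process per available CPU.
    with Pool() as p:
        # starmap unpacks each row tuple into compute_result's arguments.
        results = dict(p.starmap(compute_result, data1.itertuples(name=None, index=False)))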
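
On the "customers per vCPU" part: splitting 500,000 rows evenly over 96 workers gives about 5,209 rows per worker, but I would treat that as a best case rather than a guarantee. vCPUs are often hyperthreads rather than full cores, and NumPy may already use several threads internally. Also note that starmap batches rows automatically, and its chunksize argument lets you control the batch size if the default does not suit your workload. Here is a minimal, self-contained sketch of that pattern. toy_result is a hypothetical stand-in for compute_result, and rows_per_worker, pick_chunksize, run_parallel and the factor of 4 chunks per worker are names and values I made up for illustration, not anything from the question.

```python
import math
import os
from multiprocessing import Pool


def rows_per_worker(n_rows, n_workers):
    """Upper bound on rows each worker handles if the work is split evenly."""
    return math.ceil(n_rows / n_workers)


def pick_chunksize(n_rows, n_workers, chunks_per_worker=4):
    """Batch rows so each worker receives a few large chunks (assumed heuristic)."""
    return max(1, math.ceil(n_rows / (n_workers * chunks_per_worker)))


def toy_result(cust_id, amount, rate):
    """Hypothetical stand-in for compute_result: three years of compounding."""
    values = [amount]
    for _ in range(3):
        values.append(values[-1] * (1 + rate))
    return cust_id, values


def run_parallel(func, rows, n_workers=None):
    """Map func over rows (tuples of arguments) and collect a dict of results."""
    rows = list(rows)
    n_workers = n_workers or os.cpu_count() or 1
    chunksize = pick_chunksize(len(rows), n_workers)
    with Pool(processes=n_workers) as pool:
        return dict(pool.starmap(func, rows, chunksize=chunksize))


if __name__ == "__main__":
    print(rows_per_worker(500_000, 96))  # 5209
    rows = [(i, 1000.0 + i, 0.05) for i in range(1_000)]
    results = run_parallel(toy_result, rows, n_workers=4)
    print(len(results))  # 1000
```

Timing a few thousand customers at several worker counts before running the full set is probably the most reliable way to decide whether the 96 vCPU instance is worth it.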

Answered By: user1000x