Multiprocessing nested loops to optimize sagemaker instance usage?
Question:
Hi, I am trying to understand and implement multiprocessing for my nested loop below.
I am currently using SageMaker Studio, and I am trying to optimise my instance usage.
I have about 500 000 customers, and each customer is an independent calculation.
If I use an instance with 96 vCPUs, does that mean each vCPU would handle roughly 500 000 / 96 ≈ 5 200 customers?
Also, how can I add multiprocessing to my nested loop below? Any advice or help will be appreciated.
end_dates = End.reshape(-1)  # array([30, 31, 30, 31, 31, 28, 31, 30, 31, 30]); just to simplify access to the end date values
results = {}
for cust_id, state, amount, start, group, loan_rate in data1.itertuples(name=None, index=False):
    res = [amount * matrix_data[start-1, state, :]]
    for year in range(start+1, len(matrix_data)+1):
        res.append(lookup1.loc[year].iat[0] * np.array(res[-1]))
        res.append(res[-1] * loan_rate * end_dates[year-1] / 365)  # year - 1 here
        res.append(res[-1] + 100)
        res.append(np.linalg.multi_dot([res[-1], matrix_data[year-1]]))
    results[cust_id] = res
My previous question covers the preprocessing steps:
How to add another iterator to nested loop in python without additional loop?
Answers:
Please correct me if I'm wrong.
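On the per-vCPU question: you do not assign customers to vCPUs yourself; a Pool hands tasks out to its workers dynamically. A perfectly even split over 96 workers would be about 500 000 / 96 ≈ 5 209 customers each. A quick sanity check (the 96 here is the instance size from your question, not something measured):

```python
import multiprocessing as mp

n_customers = 500_000

# Pool() defaults to this many workers -- on EC2/SageMaker it reports vCPUs
print(mp.cpu_count())

# ceiling division: customers per worker under a perfectly even split
per_worker = -(-n_customers // 96)
print(per_worker)
```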
from multiprocessing import Pool

import numpy as np

def compute_result(cust_id, state, amount, start, group, loan_rate):
    # matrix_data, lookup1 and end_dates are module-level globals; on Linux
    # (SageMaker) the default fork start method makes them visible in the workers
    res = [amount * matrix_data[start-1, state, :]]
    for year in range(start+1, len(matrix_data)+1):
        res.append(lookup1.loc[year].iat[0] * np.array(res[-1]))
        res.append(res[-1] * loan_rate * end_dates[year-1] / 365)  # year - 1 here
        res.append(res[-1] + 100)
        res.append(np.linalg.multi_dot([res[-1], matrix_data[year-1]]))
    return (cust_id, res)

if __name__ == '__main__':
    with Pool() as p:  # defaults to os.cpu_count() workers, i.e. one per vCPU
        results = dict(p.starmap(compute_result, data1.itertuples(name=None, index=False)))