Pandas – split one row value and merge with multiple rows

Question:

I have two dataframes like as below

proj_df = pd.DataFrame({'reg_id':[1,2,3,4,5,6,7],
                              'partner': ['ABC_123','ABC_123','ABC_123','ABC_123','ABC_123','ABC_123','ABC_123'],
                              'part_no':['P123','P123','P123','P123','P123','P123','P123'],
                              'cust_info':['Apple','Apple','Apple','Apple','Apple','Apple','Tesla'],
                              'qty_1st_year':[100,100,600,150,50,0,10]})

order_df = pd.DataFrame({'partner': ['ABC_123','ABC_123','JKL_123','MNO_123'],
                         'part_no':['P123','P123','Q123','P567'],
                         'cust_info':['Apple','Hyundai','REON','Renault'],
                         'order_qty':[1000,600,50,0]})

I would like to do the below

a) Merge two dataframes based on partner,part_no,cust_info

b) split the order_qty column from order_df and assign the appropriate portion to a new column called assigned_qty

c) appropriate portion is determined by the percentage distribution of qty_1st_year. Meaning, you divide individual qty_1st_year value by the total sum of Qty_1st_year for each group of partner,part_no and cust_info.

So, I tried the below

sum_df = proj_df.groupby(['partner','part_no','cust_info'])['qty_1st_year'].sum().reset_index()
sum_df.columns = ['partner','part_no','cust_info','total_qty_all_project']

t1=proj_df.merge(order_df,on=['partner','part_no','cust_info'],how='left')
t2 = t1.merge(sum_df,on=['partner','part_no','cust_info'],how='left')
t2['pct_value'] = (t2['qty_1st_year']/t2['total_qty_all_project'])*100
proj_df['assigned_value'] = (t2['order_qty']*t2['pct_value'])/100

While this seems to work fine, I would like to know is there any other better and elegant way to do this task.

I expect my output to be like as below

enter image description here

Asked By: The Great

||

Answers:

Use GroupBy.transform with sum for new column for avoid double merge, last if need remove columns after multiplication and division add DataFrame.pop, last if necessary divide by 10 for expected ouput:

proj_df['total_qty'] = (proj_df.groupby(['partner','part_no','cust_info'])['qty_1st_year']
                               .transform('sum'))

df=proj_df.merge(order_df,on=['partner','part_no','cust_info'],how='left')

df['assigned_value'] = (df.pop('order_qty')*
                       (df['qty_1st_year']/
                       df.pop('total_qty'))).div(10)
print (df)
   reg_id  partner part_no cust_info  qty_1st_year  assigned_value
0       1  ABC_123    P123     Apple           100            10.0
1       2  ABC_123    P123     Apple           100            10.0
2       3  ABC_123    P123     Apple           600            60.0
3       4  ABC_123    P123     Apple           150            15.0
4       5  ABC_123    P123     Apple            50             5.0
5       6  ABC_123    P123     Apple             0             0.0
6       7  ABC_123    P123     Tesla            10             NaN
Answered By: jezrael