How to create a lambda function to sum dataframe values based on criteria and presence in a list

Question:

I have one dataframe containing a daily employee list, and another containing a series of sales.

daily_employee_df:

| EE_ID| Date      |
| -----| ----------|
|   101| 20220904  |
|   102| 20220904  |
|   106| 20220904  |
|   102| 20220905  |
|   103| 20220905  |
|   104| 20220905  |

all_sales_df:

| Sale_ID | Date    | Sale_Amt| EEs_Present    |
| ------- | --------|---------|----------------|
|     0001| 20220904|   100.04| [101, 102, 106]|
|     0002| 20220905|   998.06| [102, 103, 104]|

What is an efficient way to sum the Sale_Amt values each employee was present for on each day and add that sum to daily_employee_df? I’m dealing with thousands of sales each day.

I was able to get the number of sales for each employee and day using the following:

daily_employee_df['EE_Sales']  = daily_employee_df.apply(lambda x: len(all_sales_df[(all_sales_df['Date'] == x['Date']) & ([str(x['EE_ID']) in c for c in list(all_sales_df['EEs_Present'])])]), axis = 1)

But I have not been able to sum the sale total in a similar way. I tried wrapping it with sum, but the syntax didn’t seem to work.

Thanks for any help!

Asked By: Not Dave

||

Answers:

Very close – you can use sum() and add the column you’re summing at the end with [‘Sale_Amt’]

Count of sales (already done in the question):

daily_employee_df['EE_Sales_Count']  = daily_employee_df.apply(lambda x: len(all_sales_df[(all_sales_df['Date'] == x['Date']) & ([str(x['EE_ID']) in c for c in list(all_sales_df['EEs_Present'])])]), axis = 1)

Sum of sales:

daily_employee_df['EE_Sales_Sum']  = daily_employee_df.apply(lambda x: sum(all_sales_df[(all_sales_df['Date'] == x['Date']) & ([str(x['EE_ID']) in c for c in list(all_sales_df['EEs_Present'])])]['Sale_Amt']), axis = 1)
Answered By: dmacp
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.