Pandas groupby columns and multiply two other columns in the aggregate function
Question:
I have what is hopefully an easy problem for the Stack Overflow helpers! I have a dataframe:
df = pd.DataFrame({'Quantity': [2, 3, 4, 1, 2, 1, 4, 5],
                   'User': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'Price': [5, 3, 2, 6, 2, 3, 4, 5],
                   'Shop': ['X', 'X', 'X', 'Y', 'Z', 'Z', 'X', 'Y'],
                   'Day': ['M', 'T', 'W', 'W', 'M', 'T', 'M', 'W']})
   Quantity User  Price Shop Day
0         2    A      5    X   M
1         3    A      3    X   T
2         4    B      2    X   W
3         1    B      6    Y   W
4         2    B      2    Z   M
5         1    C      3    Z   T
6         4    C      4    X   M
7         5    C      5    Y   W
My trouble comes when I try to aggregate it by shop and day. I want, for each shop and day, the number of users and the average amount spent in that shop on that day. In SQL it would be: AVG(Quantity*Price)
I have the first part:
df.groupby(by=['Shop','Day']).agg({'User': 'count'})
But my only solution for the other aggregation is to first create a column and then aggregate it.
df['Spend'] = df['Price'] * df['Quantity']
df.groupby(by=['Shop', 'Day']).agg({'User': 'count', 'Spend': 'mean'})
Is there a better method I am missing? Ideally I want the aggregation to happen alongside the count aggregate without needing to create a new column.
Answers:
You can achieve that without creating a new column, like so:
result = df.groupby(by=['Shop', 'Day']).agg(
User=('User', 'count'),
Spend=('Price', lambda x: (x * df.loc[x.index, 'Quantity']).mean())
)
Output is the same as yours:
          User  Spend
Shop Day
X    M       2   13.0
     T       1    9.0
     W       1    8.0
Y    W       2   15.5
Z    M       1    4.0
     T       1    3.0
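As a side note (my addition, not part of the original answer), there is one more column-free variant along the same lines: since Price * Quantity produces a Series that keeps df's index, you can group that Series directly by the Shop and Day columns and never touch df itself.

```python
import pandas as pd

df = pd.DataFrame({'Quantity': [2, 3, 4, 1, 2, 1, 4, 5],
                   'User': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'Price': [5, 3, 2, 6, 2, 3, 4, 5],
                   'Shop': ['X', 'X', 'X', 'Y', 'Z', 'Z', 'X', 'Y'],
                   'Day': ['M', 'T', 'W', 'W', 'M', 'T', 'M', 'W']})

# Price * Quantity is a Series aligned with df's index, so it can be
# grouped by columns of the same frame without being stored anywhere.
spend = (df['Price'] * df['Quantity']).groupby([df['Shop'], df['Day']]).mean()
users = df.groupby(['Shop', 'Day'])['User'].count()

# Combine the two aggregates; the dict keys become the column names.
result = pd.concat({'User': users, 'Spend': spend}, axis=1)
print(result)
```

This avoids the lambda's indirect lookup via df.loc[x.index, ...], at the cost of two groupby passes.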
Your method is probably the right one. You can also use assign to chain your transformations:
>>> (df.assign(Spend=df['Price']*df['Quantity'])
.groupby(by=['Shop','Day'], as_index=False)
    .agg({'User': 'count', 'Spend': 'mean'}))
  Shop Day  User  Spend
0    X   M     2   13.0
1    X   T     1    9.0
2    X   W     1    8.0
3    Y   W     2   15.5
4    Z   M     1    4.0
5    Z   T     1    3.0
Another solution: use .apply with a lambda that returns a pd.Series:
df = df.groupby(["Shop", "Day"]).apply(
lambda x: pd.Series(
{"User": x["User"].count(), "Spend": (x["Price"] * x["Quantity"]).mean()}
)
)
print(df)
Prints:
          User  Spend
Shop Day
X    M     2.0   13.0
     T     1.0    9.0
     W     1.0    8.0
Y    W     2.0   15.5
Z    M     1.0    4.0
     T     1.0    3.0