Pandas groupby columns and multiply two other columns in the aggregate function
Question:
I have what is hopefully an easy problem for the Stack Overflow helpers! I have a dataframe:
df = pd.DataFrame({'Quantity': [2, 3, 4, 1, 2, 1, 4, 5],
                   'User': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'Price': [5, 3, 2, 6, 2, 3, 4, 5],
                   'Shop': ['X', 'X', 'X', 'Y', 'Z', 'Z', 'X', 'Y'],
                   'Day': ['M', 'T', 'W', 'W', 'M', 'T', 'M', 'W']})
   Quantity User  Price Shop Day
0         2    A      5    X   M
1         3    A      3    X   T
2         4    B      2    X   W
3         1    B      6    Y   W
4         2    B      2    Z   M
5         1    C      3    Z   T
6         4    C      4    X   M
7         5    C      5    Y   W
My trouble comes when I try to aggregate it by shop and day. I want, for each shop and day, the number of users and the average amount spent in that shop on that day. In SQL it would be: AVG(Quantity*Price)
I have the first part:
df.groupby(by=['Shop','Day']).agg({'User': 'count'})
But my only solution for the other aggregation is to first create a column and then aggregate it.
df['Spend'] = df['Price'] * df['Quantity']
df.groupby(by=['Shop', 'Day']).agg({'User': 'count', 'Spend': 'mean'})
Is there a better method I am missing? Ideally I want the aggregation to happen alongside the count aggregate without needing to create a new column.
Answers:
You can achieve that without creating a new column, like so:
result = df.groupby(by=['Shop', 'Day']).agg(
User=('User', 'count'),
Spend=('Price', lambda x: (x * df.loc[x.index, 'Quantity']).mean())
)
Output is the same as yours:
          User  Spend
Shop Day
X    M       2   13.0
     T       1    9.0
     W       1    8.0
Y    W       2   15.5
Z    M       1    4.0
     T       1    3.0
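As a side note (my addition, not part of the original answer), there is one more column-free variant along the same lines: since Price * Quantity produces a Series that keeps df's index, you can group that Series directly by the Shop and Day columns and never touch df itself.

```python
import pandas as pd

df = pd.DataFrame({'Quantity': [2, 3, 4, 1, 2, 1, 4, 5],
                   'User': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'Price': [5, 3, 2, 6, 2, 3, 4, 5],
                   'Shop': ['X', 'X', 'X', 'Y', 'Z', 'Z', 'X', 'Y'],
                   'Day': ['M', 'T', 'W', 'W', 'M', 'T', 'M', 'W']})

# Price * Quantity is a Series aligned with df's index, so it can be
# grouped by columns of the same frame without being stored anywhere.
spend = (df['Price'] * df['Quantity']).groupby([df['Shop'], df['Day']]).mean()
users = df.groupby(['Shop', 'Day'])['User'].count()

# Combine the two aggregates; the dict keys become the column names.
result = pd.concat({'User': users, 'Spend': spend}, axis=1)
print(result)
```

This avoids the lambda's indirect lookup via df.loc[x.index, ...], at the cost of two groupby passes.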
Your method is probably the right one. You can also use assign to chain your transformations:
>>> (df.assign(Spend=df['Price']*df['Quantity'])
.groupby(by=['Shop','Day'], as_index=False)
    .agg({'User': 'count', 'Spend': 'mean'}))
  Shop Day  User  Spend
0    X   M     2   13.0
1    X   T     1    9.0
2    X   W     1    8.0
3    Y   W     2   15.5
4    Z   M     1    4.0
5    Z   T     1    3.0
Another solution: use .apply with a lambda that returns a pd.Series:
df = df.groupby(["Shop", "Day"]).apply(
lambda x: pd.Series(
{"User": x["User"].count(), "Spend": (x["Price"] * x["Quantity"]).mean()}
)
)
print(df)
Prints:
          User  Spend
Shop Day
X    M     2.0   13.0
     T     1.0    9.0
     W     1.0    8.0
Y    W     2.0   15.5
Z    M     1.0    4.0
     T     1.0    3.0