groupby two columns take highest sum (Pandas)
Question:
Looking to see which client has the highest price sum for the month of February. 2/1/2022
Data
client box Price date
charles AA 5,000 2/1/2022
charles AA 5,050 2/1/2022
charles AA 5,075 2/1/2022
cara BB 25,116 2/1/2022
cara BB 5,154 2/1/2022
lu CC 0 2/1/2022
max DD 10,000 3/1/2022
Desired
client box Price date
cara BB 30,270 2/1/2022
Doing
df.groupby(['client','date'])
.agg({'Price':'sum'}).reset_index()
Any suggestion is helpful.
Answers:
idxmax
will return the index of the maximum value—which you can then use to look up the row you want. Be aware that in the case that two customers are tied for the highest price, it will return only the first occurence.
Example:
(
df
# Fist filter to only Februrary
.loc[lambda df_: df_["date"].dt.month == 2]
.groupby(["client", "box"])["Price"]
.sum()
.reset_index()
# Then select where price is max
.loc[lambda df_: df_["Price"].idxmax()]
)
import pandas as pd
df = pd.DataFrame({"client": ["charles", "charles", "charles", "cara", "cara", "lu", "max"],
"box": ["AA", "AA", "AA", "BB", "BB", "CC", "DD"],
"price": [5000, 5050, 5075, 25116, 5154, 0, 10000],
"date": ["2/1/2022", "2/1/2022", "2/1/2022", "2/1/2022", "2/1/2022", "2/1/2022", "3/1/2022"]})
# print(df)
print(df.groupby(by= ["client", "box", "date"]).price.aggregate('sum'))
Based on the OP’s desired output, here is a suggestion:
gdf = df.groupby(['client','box','date']).agg({'Price':'sum'}).reset_index()
gdf.loc[gdf.Price.idxmax()]
If box is always the same per client.
df = df.groupby(["client", "date"]).agg({"box": "first", "Price": "sum"})
df = df[df["Price"].eq(df["Price"].max())].reset_index()
If box could be different per client.
df = df.groupby(["client", "date", "box"]).agg({"Price": "sum"})
df = df[df["Price"].eq(df["Price"].max())].reset_index()
Looking to see which client has the highest price sum for the month of February. 2/1/2022
Data
client box Price date
charles AA 5,000 2/1/2022
charles AA 5,050 2/1/2022
charles AA 5,075 2/1/2022
cara BB 25,116 2/1/2022
cara BB 5,154 2/1/2022
lu CC 0 2/1/2022
max DD 10,000 3/1/2022
Desired
client box Price date
cara BB 30,270 2/1/2022
Doing
df.groupby(['client','date'])
.agg({'Price':'sum'}).reset_index()
Any suggestion is helpful.
idxmax
will return the index of the maximum value—which you can then use to look up the row you want. Be aware that in the case that two customers are tied for the highest price, it will return only the first occurence.
Example:
(
df
# Fist filter to only Februrary
.loc[lambda df_: df_["date"].dt.month == 2]
.groupby(["client", "box"])["Price"]
.sum()
.reset_index()
# Then select where price is max
.loc[lambda df_: df_["Price"].idxmax()]
)
import pandas as pd
df = pd.DataFrame({"client": ["charles", "charles", "charles", "cara", "cara", "lu", "max"],
"box": ["AA", "AA", "AA", "BB", "BB", "CC", "DD"],
"price": [5000, 5050, 5075, 25116, 5154, 0, 10000],
"date": ["2/1/2022", "2/1/2022", "2/1/2022", "2/1/2022", "2/1/2022", "2/1/2022", "3/1/2022"]})
# print(df)
print(df.groupby(by= ["client", "box", "date"]).price.aggregate('sum'))
Based on the OP’s desired output, here is a suggestion:
gdf = df.groupby(['client','box','date']).agg({'Price':'sum'}).reset_index()
gdf.loc[gdf.Price.idxmax()]
If box is always the same per client.
df = df.groupby(["client", "date"]).agg({"box": "first", "Price": "sum"})
df = df[df["Price"].eq(df["Price"].max())].reset_index()
If box could be different per client.
df = df.groupby(["client", "date", "box"]).agg({"Price": "sum"})
df = df[df["Price"].eq(df["Price"].max())].reset_index()