How to groupby().transform() to value_counts() in pandas?
Question:
I am processing a pandas dataframe df1 with prices of items.
Item Price Minimum Most_Common_Price
0 Coffee 1 1 2
1 Coffee 2 1 2
2 Coffee 2 1 2
3 Tea 3 3 4
4 Tea 4 3 4
5 Tea 4 3 4
I create Minimum using:
df1["Minimum"] = df1.groupby(["Item"])['Price'].transform(min)
How do I create Most_Common_Price?
df1["Most_Common_Price"] = df1.groupby(["Item"])['Price'].transform(value_counts()) # Doesn't work
At the moment, I use a multi-step approach:
for item in df1.Item.unique().tolist():
    subset = df1[df1.Item == item]  # rows for this item only
    most_common = subset.Price.value_counts().idxmax()  # most frequent price
which is overkill. There must be a simpler way, ideally in one line.
Answers:
You could use groupby + transform with value_counts and idxmax.
df['Most_Common_Price'] = (
df.groupby('Item')['Price'].transform(lambda x: x.value_counts().idxmax()))
df
Item Price Minimum Most_Common_Price
0 Coffee 1 1 2
1 Coffee 2 1 2
2 Coffee 2 1 2
3 Tea 3 3 4
4 Tea 4 3 4
5 Tea 4 3 4
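For anyone who wants to run this end to end, here is a minimal self-contained sketch. The construction of df is my assumption (the post only shows the printed table); the one-liner itself is the same as above.

```python
import pandas as pd

# Rebuild the example data from the question (construction assumed for illustration)
df = pd.DataFrame({
    "Item": ["Coffee", "Coffee", "Coffee", "Tea", "Tea", "Tea"],
    "Price": [1, 2, 2, 3, 4, 4],
})

# value_counts() counts each price within the group; idxmax() returns the
# price (the index label) that has the highest count
df["Most_Common_Price"] = (
    df.groupby("Item")["Price"].transform(lambda x: x.value_counts().idxmax())
)

print(df)
```

Because the lambda returns a single value per group, transform repeats it on every row of that group, which is what lets it be assigned straight back as a column.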
An improvement involves the use of pd.Series.map, which computes one value per group and then looks it up for each row:
# Thanks, Vaishali!
df['Most_Common_Price'] = df['Item'].map(
    df.groupby('Item')['Price'].agg(lambda x: x.value_counts().idxmax()))
df
Item Price Minimum Most_Common_Price
0 Coffee 1 1 2
1 Coffee 2 1 2
2 Coffee 2 1 2
3 Tea 3 3 4
4 Tea 4 3 4
5 Tea 4 3 4
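To see why the map version works, it can help to look at the intermediate result. The sketch below rebuilds the example frame by hand (an assumption, since the post only shows printed output) and uses per_item as a made-up name for the per-group Series.

```python
import pandas as pd

# Example data rebuilt for illustration
df = pd.DataFrame({
    "Item": ["Coffee", "Coffee", "Coffee", "Tea", "Tea", "Tea"],
    "Price": [1, 2, 2, 3, 4, 4],
})

# One entry per group: the index is Item, the value is the most frequent Price
per_item = df.groupby("Item")["Price"].agg(lambda x: x.value_counts().idxmax())

# map() looks up each row's Item in per_item, giving a full-length column
df["Most_Common_Price"] = df["Item"].map(per_item)

print(per_item)
print(df)
```

The lambda runs once per group either way; the difference is that agg keeps the result as a small lookup table, and map does the broadcasting.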
A nice way is to use pd.Series.mode, if you want the most common element (i.e. the mode).
In [32]: df
Out[32]:
Item Price Minimum
0 Coffee 1 1
1 Coffee 2 1
2 Coffee 2 1
3 Tea 3 3
4 Tea 4 3
5 Tea 4 3
In [33]: df['Most_Common_Price'] = df.groupby(["Item"])['Price'].transform(pd.Series.mode)
In [34]: df
Out[34]:
Item Price Minimum Most_Common_Price
0 Coffee 1 1 2
1 Coffee 2 1 2
2 Coffee 2 1 2
3 Tea 3 3 4
4 Tea 4 3 4
5 Tea 4 3 4
As @Wen noted, pd.Series.mode can return a pd.Series of several values when there is a tie, so just grab the first:
In [67]: df
Out[67]:
Item Price Minimum
0 Coffee 1 1
1 Coffee 2 1
2 Coffee 2 1
3 Tea 3 3
4 Tea 4 3
5 Tea 4 3
6 Tea 3 3
In [68]: df[df.Item =='Tea'].Price.mode()
Out[68]:
0 3
1 4
dtype: int64
In [69]: df['Most_Common_Price'] = df.groupby(["Item"])['Price'].transform(lambda S: S.mode()[0])
In [70]: df
Out[70]:
Item Price Minimum Most_Common_Price
0 Coffee 1 1 2
1 Coffee 2 1 2
2 Coffee 2 1 2
3 Tea 3 3 3
4 Tea 4 3 3
5 Tea 4 3 3
6 Tea 3 3 3
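Here is a small runnable sketch of the tie case, assuming the seven-row frame shown above (rebuilt by hand). Series.mode returns tied values in sorted order, so taking the first one picks the smallest tied price. I use .iloc[0] instead of [0] to make the positional lookup explicit; that is my substitution, not the code from the answer.

```python
import pandas as pd

# Seven-row example where Tea has a tie between 3 and 4 (rebuilt for illustration)
df = pd.DataFrame({
    "Item": ["Coffee", "Coffee", "Coffee", "Tea", "Tea", "Tea", "Tea"],
    "Price": [1, 2, 2, 3, 4, 4, 3],
})

# mode() on the Tea rows returns both tied values, sorted ascending
tea_modes = df.loc[df["Item"] == "Tea", "Price"].mode()
print(tea_modes.tolist())

# Take the first (smallest) mode per group and repeat it on every row
df["Most_Common_Price"] = (
    df.groupby("Item")["Price"].transform(lambda s: s.mode().iloc[0])
)
print(df)
```

If you need a different tie-break (say, the largest tied price), .iloc[-1] would be the analogous choice.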
import numpy as np
import pandas as pd

# Initial dataframe; Milk has a NaN price to reproduce the case where a group has no valid values
data_stack_try = [['Coffee',1],['Coffee',2],['Coffee',2],['Tea',3],['Tea',4],['Tea',4],['Milk', np.nan]]
df_stack_try = pd.DataFrame(data_stack_try, columns=["Item","Price"])
print("---Before Min---")
print(df_stack_try)
# Create the Minimum column with transform and 'min'
df_stack_try["Minimum"] = df_stack_try.groupby(["Item"])['Price'].transform('min')
print("---After Min----")
print(df_stack_try)
# Function that handles groups with only null values (the Milk item is np.nan)
def mode_group(grp):
    try:
        # Return the mode of the group; transform repeats it on every row
        return grp.mode()[0]
    except (KeyError, IndexError):
        # Raised when the group has no mode: mode() drops NaN, so the
        # all-NaN Milk group yields an empty Series
        print("Exception!!!")
        return np.nan
df_stack_try["Most_Common_Price"] = df_stack_try.groupby('Item')['Price'].transform(mode_group)
print("---After Mode----")
print(df_stack_try)
---Before Min---
Item Price
0 Coffee 1.0
1 Coffee 2.0
2 Coffee 2.0
3 Tea 3.0
4 Tea 4.0
5 Tea 4.0
6 Milk NaN
---After Min----
Item Price Minimum
0 Coffee 1.0 1.0
1 Coffee 2.0 1.0
2 Coffee 2.0 1.0
3 Tea 3.0 3.0
4 Tea 4.0 3.0
5 Tea 4.0 3.0
6 Milk NaN NaN
Exception!!!
---After Mode----
Item Price Minimum Most_Common_Price
0 Coffee 1.0 1.0 2.0
1 Coffee 2.0 1.0 2.0
2 Coffee 2.0 1.0 2.0
3 Tea 3.0 3.0 4.0
4 Tea 4.0 3.0 4.0
5 Tea 4.0 3.0 4.0
6 Milk NaN NaN NaN
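If you would rather avoid the try/except, one option is to check whether mode() came back empty. This is only a sketch on the same data as above; first_mode is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd

# Same data as above, with a NaN price for Milk
df = pd.DataFrame(
    [["Coffee", 1], ["Coffee", 2], ["Coffee", 2],
     ["Tea", 3], ["Tea", 4], ["Tea", 4], ["Milk", np.nan]],
    columns=["Item", "Price"],
)

def first_mode(s):
    """Return the first mode of s, or NaN when s has no non-null values."""
    modes = s.mode()  # drops NaN by default, so an all-NaN group gives an empty Series
    return modes.iloc[0] if not modes.empty else np.nan

df["Most_Common_Price"] = df.groupby("Item")["Price"].transform(first_mode)
print(df)
```

The explicit emptiness check keeps unrelated errors from being swallowed, which a broad except clause would hide.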