How to groupby().transform() to find mode in dataframe?
Question:
I have a dataframe like:
lst = [["High", "A"], ["High", "A"], ["High", "B"],["Medium", "A"], ["Medium", "B"], ["Medium", "C"]]
df = pd.DataFrame(lst, columns =["Class", "Grade"])
I need to get the mode (majority vote) of "Grade" in each "Class".
If it’s a tie vote, assign "x".
Below is what I expect to get:
Class
Grade
Majority_vote
High
A
A
High
A
A
High
B
A
Medium
A
x
Medium
B
x
Medium
C
x
This is my code:
df['majority_vote'] = df.groupby(['Class'])['Grade'].transform(lambda x: x.mode()[0])
I think the code will return ‘nan’ if it’s a tie vote. Then, I will change ‘nan’ to ‘x’ later.
However, what I get is below:
Class
Grade
Majority_vote
High
A
A
High
A
A
High
B
A
Medium
A
A
Medium
B
A
Medium
C
A
At class "Medium", the code returns the 1st element ("A") instead of ‘nan’.
Any other method is appreciated.
Could you please help me? Thank you in advance.
Answers:
The issue with using x.mode()[0]
is that pd.Series(['A', 'B', 'C']).mode()
evaluates to ['A', 'B', 'C']
. Meanwhile, pd.Series(['A', 'A', 'B']).mode()
evaluates to ['A']
.
Here is a function that will return the mode (if there is only one) and "x" if there is a tie (i.e., multiple modes).
import pandas as pd
lst = [["High", "A"], ["High", "A"], ["High", "B"],["Medium", "A"], ["Medium", "B"], ["Medium", "C"]]
df = pd.DataFrame(lst, columns=["Class", "Grade"])
def get_mode_or_x(series):
mode = series.mode()
if mode.size == 1:
return mode[0]
return "x"
df.loc[:, "majority_vote"] = df.groupby("Class")["Grade"].transform(get_mode_or_x)
index
Class
Grade
majority_vote
0
High
A
A
1
High
A
A
2
High
B
A
3
Medium
A
x
4
Medium
B
x
5
Medium
C
x
I have a dataframe like:
lst = [["High", "A"], ["High", "A"], ["High", "B"],["Medium", "A"], ["Medium", "B"], ["Medium", "C"]]
df = pd.DataFrame(lst, columns =["Class", "Grade"])
I need to get the mode (majority vote) of "Grade" in each "Class".
If it’s a tie vote, assign "x".
Below is what I expect to get:
Class | Grade | Majority_vote |
---|---|---|
High | A | A |
High | A | A |
High | B | A |
Medium | A | x |
Medium | B | x |
Medium | C | x |
This is my code:
df['majority_vote'] = df.groupby(['Class'])['Grade'].transform(lambda x: x.mode()[0])
I think the code will return ‘nan’ if it’s a tie vote. Then, I will change ‘nan’ to ‘x’ later.
However, what I get is below:
Class | Grade | Majority_vote |
---|---|---|
High | A | A |
High | A | A |
High | B | A |
Medium | A | A |
Medium | B | A |
Medium | C | A |
At class "Medium", the code returns the 1st element ("A") instead of ‘nan’.
Any other method is appreciated.
Could you please help me? Thank you in advance.
The issue with using x.mode()[0]
is that pd.Series(['A', 'B', 'C']).mode()
evaluates to ['A', 'B', 'C']
. Meanwhile, pd.Series(['A', 'A', 'B']).mode()
evaluates to ['A']
.
Here is a function that will return the mode (if there is only one) and "x" if there is a tie (i.e., multiple modes).
import pandas as pd
lst = [["High", "A"], ["High", "A"], ["High", "B"],["Medium", "A"], ["Medium", "B"], ["Medium", "C"]]
df = pd.DataFrame(lst, columns=["Class", "Grade"])
def get_mode_or_x(series):
mode = series.mode()
if mode.size == 1:
return mode[0]
return "x"
df.loc[:, "majority_vote"] = df.groupby("Class")["Grade"].transform(get_mode_or_x)
index | Class | Grade | majority_vote |
---|---|---|---|
0 | High | A | A |
1 | High | A | A |
2 | High | B | A |
3 | Medium | A | x |
4 | Medium | B | x |
5 | Medium | C | x |