How to generate a new table from an existing table by grouping and aggregating with the mode
Question:
I have a DataFrame in pandas, and I'm trying to generate a new table from it by grouping the rows by country and aggregating each group with its mode (most frequent value).
Here is df:
country   scores  attempts
india         11         6
india         12         3
india         12         3
india         12         7
india         10         3
india         12         3
pakistan      10         4
Pakistan      14         4
pakistan      14         5
srilanka      23         5
srilanka      21         5
srilanka      21         6
srilanka      23         5
srilanka      23         6
srilanka      23         5
The result should look like this:
    country  scores  attempts
0     India      12         3
1  Pakistan      14         4
2  srilanka      23         5
Please help me solve this issue.
Answers:
First use GroupBy.size to count each (country, scores, attempts) combination, then use SeriesGroupBy.idxmax to get, per country, the index label of the row with the maximal count, i.e. the first mode:
print(df)
     country  scores  attempts
0      india      11         6
1      india      12         3
2      india      12         3
3      india      12         7
4      india      10         3
5      india      12         3
6   pakistan      10         4
7   pakistan      14         4   <- lowercased (the question has "Pakistan")
8   pakistan      14         4   <- corrected data (the question has 14 5)
9   srilanka      23         5
10  srilanka      21         5
11  srilanka      21         6
12  srilanka      23         5
13  srilanka      23         6
14  srilanka      23         5
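The answer prints df but does not show how it was built. A minimal sketch for reproducing the corrected frame, assuming you paste the printed table (with the "<-" annotations removed) into pandas.read_csv. Because the header row has one field fewer than each data row, read_csv treats the leading number as the row index. The name `text` is just an illustrative variable.

```python
import io

import pandas as pd

# The corrected table as printed above, without the "<-" annotations
text = """country scores attempts
0 india 11 6
1 india 12 3
2 india 12 3
3 india 12 7
4 india 10 3
5 india 12 3
6 pakistan 10 4
7 pakistan 14 4
8 pakistan 14 4
9 srilanka 23 5
10 srilanka 21 5
11 srilanka 21 6
12 srilanka 23 5
13 srilanka 23 6
14 srilanka 23 5
"""

# sep=r'\s+' splits on runs of whitespace; the header has 3 fields and the
# data rows have 4, so the first column becomes the index
df = pd.read_csv(io.StringIO(text), sep=r'\s+')
print(df)
```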
# count how often each (country, scores, attempts) combination occurs
df1 = df.groupby(['country', 'scores', 'attempts']).size().reset_index(name='count')
print(df1)
    country  scores  attempts  count
0     india      10         3      1
1     india      11         6      1
2     india      12         3      3
3     india      12         7      1
4  pakistan      10         4      1
5  pakistan      14         4      2
6  srilanka      21         5      1
7  srilanka      21         6      1
8  srilanka      23         5      3
9  srilanka      23         6      1
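Before the final step, it can help to look at the idxmax result on its own: it is a Series mapping each country to the df1 row label with the highest count. A small sketch, rebuilding the corrected data with the DataFrame constructor; the name `best` is only illustrative. Since groupby sorts its keys by default, df1 is ordered by (scores, attempts) within each country, so on a tie idxmax picks the smallest pair.

```python
import pandas as pd

# Corrected sample data from the answer (row 8 is 14 4, not 14 5)
df = pd.DataFrame({
    'country': ['india'] * 6 + ['pakistan'] * 3 + ['srilanka'] * 6,
    'scores': [11, 12, 12, 12, 10, 12, 10, 14, 14, 23, 21, 21, 23, 23, 23],
    'attempts': [6, 3, 3, 7, 3, 3, 4, 4, 4, 5, 5, 6, 5, 6, 5],
})

# Count each (country, scores, attempts) combination
df1 = df.groupby(['country', 'scores', 'attempts']).size().reset_index(name='count')

# Row label of the most frequent pair per country (first one on ties)
best = df1.groupby('country')['count'].idxmax()
print(best)

# Selecting those labels gives the mode pair for each country
print(df1.loc[best, ['country', 'scores', 'attempts']])
```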
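# keep the row with the highest count per country (idxmax returns the first on ties),
# then drop the helper column and renumber the rows
df2 = (df1.loc[df1.groupby('country')['count'].idxmax()]
          .drop('count', axis=1)
          .reset_index(drop=True))
print(df2)
    country  scores  attempts
0     india      12         3
1  pakistan      14         4
2  srilanka      23         5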
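A different technique, not the one used in the answer above: take the mode of each column separately. With the question's original data (no rows changed), this already gives the expected result. This is a sketch that assumes country names should be compared case-insensitively (hence str.lower) and that ties should resolve to the smallest value, since Series.mode returns its values in sorted order. One caveat: per-column modes can come from different rows, so the resulting (scores, attempts) pair is not guaranteed to exist in the data, whereas the size/idxmax approach always returns a pair that does.

```python
import pandas as pd

# The question's original data, including "Pakistan" and the 14 5 row
df = pd.DataFrame({
    'country': ['india'] * 6 + ['pakistan', 'Pakistan', 'pakistan'] + ['srilanka'] * 6,
    'scores': [11, 12, 12, 12, 10, 12, 10, 14, 14, 23, 21, 21, 23, 23, 23],
    'attempts': [6, 3, 3, 7, 3, 3, 4, 4, 5, 5, 5, 6, 5, 6, 5],
})

# Assumption: "Pakistan" and "pakistan" are the same country
df['country'] = df['country'].str.lower()

# Mode of each column within each country; .iat[0] takes the smallest mode on ties
out = (df.groupby('country')
         .agg(lambda s: s.mode().iat[0])
         .reset_index())
print(out)
```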