How to group a dataframe with 2 columns and get highest count?

Question

I have the following dataframe (df). Col A is a category and Col B is an item in that category.

Col A	Col B
red	car
red	car
red	truck
red	ball
blue	bus
blue	bus
blue	bus
blue	truck
blue	car

I want to get another dataframe (df2) showing the total count of distinct categories in Col A, followed by the count highest occurring item in Col B corresponding to Col A as below:

Col A	Count A	Col B	Count B
red	4	car	2
blue	5	bus	2

Any idea on how to generate this dataframe?

I have tried this command:

df2 = df.groupby(‘Col A’)[‘Col B’].apply(lambda x: x.value_counts().index[0]).reset_index()

and I get the following result:

Col A	Col B
red	car
blue	bus

I don’t know how to get the two counts column. Any ideas?

Asked By: MueezK

||

Source

Answer 1

You can use Counter from Collections:

from collections import Counter
final = df.groupby(['Col A']).agg({'Col A':'count','Col B':list})

'''
       Col A                        Col B
Col A                                    
blue       5  [bus, bus, bus, truck, car]
red        4      [car, car, truck, ball]
'''

final['Col_b'] = final['Col B'].apply(lambda x: Counter(x).most_common(1)[0][0]) #get most common value
final['Count_b'] = final['Col B'].apply(lambda x: Counter(x).most_common(1)[0][1]) #get count of most common value
final=final.drop('Col B',axis=1).rename(columns={'Col A':'Count A'}).reset_index()

Output:

|    | Col A   |   Count A | Col_b   |   Count_b |
|---:|:--------|----------:|:--------|----------:|
|  0 | blue    |         5 | bus     |         3 |
|  1 | red     |         4 | car     |         2 |

Answered By: Bushmaster

How to group a dataframe with 2 columns and get highest count?

Question:

Answers: