Python Pandas to group columns only

Question

I have a simple DataFrame (built below) and want to reduce it to one row per unique Department and name combination:

I use:

import pandas as pd

data = {'name': ['Jason', 'Molly', 'Tina', 'Jason', 'Amy', 'Jason', 'River', 'Kate', 'David', 'Jack', 'David'], 
'Department' : ['Sales', 'Operation', 'Operation', 'Sales', 'Operation', 'Sales', 'Operation', 'Sales', 'Finance', 'Finance', 'Finance'],
'Weight lost': [4, 4, 1, 4, 4, 4, 7, 2, 8, 1, 8],
'Point earned': [2, 2, 1, 2, 2, 2, 4, 1, 4, 1, 4]}

df = pd.DataFrame(data)

final = df.pivot_table(index=['Department','name'], values='Weight lost', aggfunc='count', fill_value=0).stack(dropna=False).reset_index(name='Weight_lost_count')

del final['level_2']
del final['Weight_lost_count']

print(final)

The ‘final’ line seems to contain unnecessary steps.

What would be a better way to write it?

Asked By: Mark K

Answer 1

Isn’t this just drop_duplicates?

df[['Department','name']].drop_duplicates()

Output:

  Department   name
0      Sales  Jason
1  Operation  Molly
2  Operation   Tina
4  Operation    Amy
6  Operation  River
7      Sales   Kate
8    Finance  David
9    Finance   Jack

And to match final exactly (sorted by Department, then name):

(df[['Department','name']].drop_duplicates()
   .sort_values(by=['Department','name'])
)

Output:

  Department   name
8    Finance  David
9    Finance   Jack
4  Operation    Amy
1  Operation  Molly
6  Operation  River
2  Operation   Tina
0      Sales  Jason
7      Sales   Kate
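
Both outputs above keep the original row labels (8, 9, 4, ...). If a clean 0-based index is wanted, as in the question's final, one option is to chain reset_index(drop=True). The following is a minimal, self-contained sketch of that idea; the variable name pairs is only illustrative and does not come from the answer.

```python
import pandas as pd

data = {
    'name': ['Jason', 'Molly', 'Tina', 'Jason', 'Amy', 'Jason',
             'River', 'Kate', 'David', 'Jack', 'David'],
    'Department': ['Sales', 'Operation', 'Operation', 'Sales', 'Operation',
                   'Sales', 'Operation', 'Sales', 'Finance', 'Finance', 'Finance'],
    'Weight lost': [4, 4, 1, 4, 4, 4, 7, 2, 8, 1, 8],
    'Point earned': [2, 2, 1, 2, 2, 2, 4, 1, 4, 1, 4],
}
df = pd.DataFrame(data)

# Keep only the two columns of interest, drop repeated pairs,
# sort them, and renumber the rows from 0.
pairs = (
    df[['Department', 'name']]
    .drop_duplicates()
    .sort_values(by=['Department', 'name'])
    .reset_index(drop=True)
)

print(pairs)
```

Selecting the two columns before calling drop_duplicates matters here: on the full frame, duplicates would be judged across all four columns rather than on Department and name alone.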

Answered By: Quang Hoang

Answer 2

Try groupby with head(1), which keeps the first row of each (Department, name) group:

out = df.groupby(['Department','name']).head(1)
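
One thing to be aware of with this approach: head(1) returns the first row of each group with every original column and the original row order, so the result still includes 'Weight lost' and 'Point earned'. A small sketch, assuming only the two grouping columns are wanted in the end (the name pairs is illustrative):

```python
import pandas as pd

data = {
    'name': ['Jason', 'Molly', 'Tina', 'Jason', 'Amy', 'Jason',
             'River', 'Kate', 'David', 'Jack', 'David'],
    'Department': ['Sales', 'Operation', 'Operation', 'Sales', 'Operation',
                   'Sales', 'Operation', 'Sales', 'Finance', 'Finance', 'Finance'],
    'Weight lost': [4, 4, 1, 4, 4, 4, 7, 2, 8, 1, 8],
    'Point earned': [2, 2, 1, 2, 2, 2, 4, 1, 4, 1, 4],
}
df = pd.DataFrame(data)

# First row of every (Department, name) group; all columns are retained
# and rows stay in their original order with their original labels.
out = df.groupby(['Department', 'name']).head(1)
print(out)

# Narrow the result to the grouping columns if nothing else is needed.
pairs = out[['Department', 'name']]
print(pairs)
```

Unlike the sorted drop_duplicates variant, this keeps first-appearance order, so add sort_values if the ordering of the question's final matters.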

Answered By: BENY
