Python Pandas to group columns only
Question:
I have a simple DataFrame (built in the code below), and I want to reduce it to the unique Department/name pairs.
I use:
import pandas as pd
data = {'name': ['Jason', 'Molly', 'Tina', 'Jason', 'Amy', 'Jason', 'River', 'Kate', 'David', 'Jack', 'David'],
'Department' : ['Sales', 'Operation', 'Operation', 'Sales', 'Operation', 'Sales', 'Operation', 'Sales', 'Finance', 'Finance', 'Finance'],
'Weight lost': [4, 4, 1, 4, 4, 4, 7, 2, 8, 1, 8],
'Point earned': [2, 2, 1, 2, 2, 2, 4, 1, 4, 1, 4]}
df = pd.DataFrame(data)
final = df.pivot_table(index=['Department','name'], values='Weight lost', aggfunc='count', fill_value=0).stack(dropna=False).reset_index(name='Weight_lost_count')
del final['level_2']
del final['Weight_lost_count']
print(final)
The line that builds final seems to contain unnecessary steps.
What would be a better way to write it?
Answers:
Isn’t this just drop_duplicates?
df[['Department','name']].drop_duplicates()
Output:
Department name
0 Sales Jason
1 Operation Molly
2 Operation Tina
4 Operation Amy
6 Operation River
7 Sales Kate
8 Finance David
9 Finance Jack
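The gaps in the index above (3, 5 and 10 are missing) come from drop_duplicates keeping the first occurrence of each pair by default (keep='first'). The following minimal sketch, assuming the same df as in the question, checks this and compares it with keep='last':

```python
import pandas as pd

data = {'name': ['Jason', 'Molly', 'Tina', 'Jason', 'Amy', 'Jason', 'River', 'Kate', 'David', 'Jack', 'David'],
        'Department': ['Sales', 'Operation', 'Operation', 'Sales', 'Operation', 'Sales',
                       'Operation', 'Sales', 'Finance', 'Finance', 'Finance'],
        'Weight lost': [4, 4, 1, 4, 4, 4, 7, 2, 8, 1, 8],
        'Point earned': [2, 2, 1, 2, 2, 2, 4, 1, 4, 1, 4]}
df = pd.DataFrame(data)

pairs = df[['Department', 'name']]

# Default keep='first': each unique pair keeps the row label of its first occurrence
first = pairs.drop_duplicates()
# keep='last': each unique pair keeps the row label of its last occurrence instead
last = pairs.drop_duplicates(keep='last')

print(first.index.tolist())  # [0, 1, 2, 4, 6, 7, 8, 9]
print(last.index.tolist())   # [1, 2, 4, 5, 6, 7, 9, 10]
```

Either way the same eight pairs survive; only the row labels (and therefore the original row order) differ.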
And to match the row order of final exactly, sort by both columns:
(df[['Department','name']].drop_duplicates()
.sort_values(by=['Department','name'])
)
Output:
Department name
8 Finance David
9 Finance Jack
4 Operation Amy
1 Operation Molly
6 Operation River
2 Operation Tina
0 Sales Jason
7 Sales Kate
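The sorted result still carries the original row labels (8, 9, 4, ...), whereas final from the question has a fresh 0..7 index because it was built with reset_index. A sketch, assuming the question's df, that also resets the index so the result should match final row for row:

```python
import pandas as pd

data = {'name': ['Jason', 'Molly', 'Tina', 'Jason', 'Amy', 'Jason', 'River', 'Kate', 'David', 'Jack', 'David'],
        'Department': ['Sales', 'Operation', 'Operation', 'Sales', 'Operation', 'Sales',
                       'Operation', 'Sales', 'Finance', 'Finance', 'Finance'],
        'Weight lost': [4, 4, 1, 4, 4, 4, 7, 2, 8, 1, 8],
        'Point earned': [2, 2, 1, 2, 2, 2, 4, 1, 4, 1, 4]}
df = pd.DataFrame(data)

final = (df[['Department', 'name']]
         .drop_duplicates()                       # one row per Department/name pair
         .sort_values(by=['Department', 'name'])  # same order as the pivot_table result
         .reset_index(drop=True))                 # discard old labels, number rows 0..n-1
print(final)
```

reset_index(drop=True) throws the old labels away instead of inserting them as an extra column, so no del is needed afterwards.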
Try groupby with head:
out = df.groupby(['Department','name']).head(1)
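Note that groupby(...).head(1) returns the whole first row of each group, so Weight lost and Point earned are kept too, in the original row order. If only the grouped columns are wanted, one option (a sketch, assuming the question's df) is to take the group keys themselves. groupby sorts its keys by default, so the order matches the pivot_table result:

```python
import pandas as pd

data = {'name': ['Jason', 'Molly', 'Tina', 'Jason', 'Amy', 'Jason', 'River', 'Kate', 'David', 'Jack', 'David'],
        'Department': ['Sales', 'Operation', 'Operation', 'Sales', 'Operation', 'Sales',
                       'Operation', 'Sales', 'Finance', 'Finance', 'Finance'],
        'Weight lost': [4, 4, 1, 4, 4, 4, 7, 2, 8, 1, 8],
        'Point earned': [2, 2, 1, 2, 2, 2, 4, 1, 4, 1, 4]}
df = pd.DataFrame(data)

# head(1): first full row of every group, all four columns, original labels
first_rows = df.groupby(['Department', 'name']).head(1)

# Group keys only: size() yields one entry per sorted Department/name pair,
# and MultiIndex.to_frame(index=False) turns those pairs into two columns
# with a fresh 0..n-1 index
keys = df.groupby(['Department', 'name']).size().index.to_frame(index=False)
print(keys)
```

The size() values are discarded here; they are just a cheap way to get one row per group.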