dataframe: column statistics after grouping
Question:
I know this question has been asked before, but I couldn't get the existing answers to work for me, so I'm asking again.
Having this dataframe:
import pandas as pd

data = pd.DataFrame({'id': [9,9,9,2,2,7,7,7,5,8,8,4,4,3,3,3,
                            1,1,1,1,1,6,6,6,6,6,10,11,11,11,11],
                     'signal': [1,3,5,7,9,13,17,27,5,1,5,5,11,3,7,11,6,
                                8,12,14,18,1,3,5,111,115,57,9,21,45,51],
                     'mode': [0,0,0,0,0,0,0,0,1,1,1,1,1,2,2,2,2,2,2,
                              2,2,3,3,3,3,3,3,3,3,3,3]
                     })
data.head()
id signal mode
0 9 1 0
1 9 3 0
2 9 5 0
3 2 7 0
4 2 9 0
I want to get the distribution of signal by mode.
I know I need to do the grouping this way:
data.groupby('signal')['mode'].value_counts()
But then I don’t know how to proceed, to arrive at:
mode total
0 8
1 5
2 8
3 10
Answers:
Simply use value_counts on the mode column (df here is the question's data):
>>> df['mode'].value_counts()
3 10
0 8
2 8
1 5
Name: mode, dtype: int64
# Full output
>>> (df['mode'].value_counts().rename('total').rename_axis('mode')
.sort_index().reset_index())
mode total
0 0 8
1 1 5
2 2 8
3 3 10
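An equivalent way to build the same table is to group by mode and take the group sizes. This is a minimal sketch, assuming the question's data is rebuilt as `df` (only the two columns used here) and that a plain row count per mode is what is wanted:

```python
import pandas as pd

# Rebuild the relevant columns of the question's frame.
df = pd.DataFrame({
    'signal': [1, 3, 5, 7, 9, 13, 17, 27, 5, 1, 5, 5, 11, 3, 7, 11, 6,
               8, 12, 14, 18, 1, 3, 5, 111, 115, 57, 9, 21, 45, 51],
    'mode': [0] * 8 + [1] * 5 + [2] * 8 + [3] * 10,
})

# size() counts the rows in each group; reset_index(name=...) turns the
# result into a DataFrame and names the count column.
totals = df.groupby('mode').size().reset_index(name='total')
print(totals)
```

Unlike value_counts, the grouped result is already ordered by mode, so no sort_index step is needed.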
But you may instead want the number of unique signal values per mode:
>>> df.groupby('mode', as_index=False)['signal'].nunique()
mode signal
0 0 8
1 1 3
2 2 8
3 3 10
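The row count and the unique count differ only for mode 1, where the signal value 5 appears three times. If both numbers are useful, they could be computed side by side with named aggregation. This is a sketch; the column names `total` and `unique_signals` are illustrative choices, not part of the original answer:

```python
import pandas as pd

# Rebuild the relevant columns of the question's frame.
df = pd.DataFrame({
    'signal': [1, 3, 5, 7, 9, 13, 17, 27, 5, 1, 5, 5, 11, 3, 7, 11, 6,
               8, 12, 14, 18, 1, 3, 5, 111, 115, 57, 9, 21, 45, 51],
    'mode': [0] * 8 + [1] * 5 + [2] * 8 + [3] * 10,
})

# 'count' gives the number of rows per mode, 'nunique' the number of
# distinct signal values per mode.
summary = (df.groupby('mode')['signal']
             .agg(total='count', unique_signals='nunique')
             .reset_index())
print(summary)
```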