pandas groupby dropping columns
Question:
I’m doing a simple group by operation, trying to compare group means. As you can see below, I have selected specific columns from a larger dataframe, from which all missing values have been removed.
But when I group by, I am losing a couple of columns:
I have never encountered this with pandas, and I’m not finding anything else on stack overflow that is all that similar. Does anybody have any insight?
Answers:
I think it is Automatic exclusion of 'nuisance' columns
, what described here.
Sample:
df = pd.DataFrame({'C': {0: -0.91985400000000006, 1: -0.042379, 2: 1.2476419999999999, 3: -0.00992, 4: 0.290213, 5: 0.49576700000000001, 6: 0.36294899999999997, 7: 1.548106}, 'A': {0: 'foo', 1: 'bar', 2: 'foo', 3: 'bar', 4: 'foo', 5: 'bar', 6: 'foo', 7: 'foo'}, 'B': {0: 'one', 1: 'one', 2: 'two', 3: 'three', 4: 'two', 5: 'two', 6: 'one', 7: 'three'}, 'D': {0: -1.131345, 1: -0.089328999999999992, 2: 0.33786300000000002, 3: -0.94586700000000001, 4: -0.93213199999999996, 5: 1.9560299999999999, 6: 0.017587000000000002, 7: -0.016691999999999999}})
print (df)
A B C D
0 foo one -0.919854 -1.131345
1 bar one -0.042379 -0.089329
2 foo two 1.247642 0.337863
3 bar three -0.009920 -0.945867
4 foo two 0.290213 -0.932132
5 bar two 0.495767 1.956030
6 foo one 0.362949 0.017587
7 foo three 1.548106 -0.016692
print( df.groupby('A').mean())
C D
A
bar 0.147823 0.306945
foo 0.505811 -0.344944
I think you can check DataFrame.dtypes
.
Try df.groupby(['col_1', 'col_2'], as_index=False).mean()
.
Use as_index=False
to retain column names. Default is True. Above comments have answered this question but posting it as an answer.
Make sure your column is in numeric/int format and not e.g. as ‘O’ as Object format.
This was one reason it was disapearing for me.
You can check the format of the column by hte code below:
df.column.dtypes
I’m doing a simple group by operation, trying to compare group means. As you can see below, I have selected specific columns from a larger dataframe, from which all missing values have been removed.
But when I group by, I am losing a couple of columns:
I have never encountered this with pandas, and I’m not finding anything else on stack overflow that is all that similar. Does anybody have any insight?
I think it is Automatic exclusion of 'nuisance' columns
, what described here.
Sample:
df = pd.DataFrame({'C': {0: -0.91985400000000006, 1: -0.042379, 2: 1.2476419999999999, 3: -0.00992, 4: 0.290213, 5: 0.49576700000000001, 6: 0.36294899999999997, 7: 1.548106}, 'A': {0: 'foo', 1: 'bar', 2: 'foo', 3: 'bar', 4: 'foo', 5: 'bar', 6: 'foo', 7: 'foo'}, 'B': {0: 'one', 1: 'one', 2: 'two', 3: 'three', 4: 'two', 5: 'two', 6: 'one', 7: 'three'}, 'D': {0: -1.131345, 1: -0.089328999999999992, 2: 0.33786300000000002, 3: -0.94586700000000001, 4: -0.93213199999999996, 5: 1.9560299999999999, 6: 0.017587000000000002, 7: -0.016691999999999999}})
print (df)
A B C D
0 foo one -0.919854 -1.131345
1 bar one -0.042379 -0.089329
2 foo two 1.247642 0.337863
3 bar three -0.009920 -0.945867
4 foo two 0.290213 -0.932132
5 bar two 0.495767 1.956030
6 foo one 0.362949 0.017587
7 foo three 1.548106 -0.016692
print( df.groupby('A').mean())
C D
A
bar 0.147823 0.306945
foo 0.505811 -0.344944
I think you can check DataFrame.dtypes
.
Try df.groupby(['col_1', 'col_2'], as_index=False).mean()
.
Use as_index=False
to retain column names. Default is True. Above comments have answered this question but posting it as an answer.
Make sure your column is in numeric/int format and not e.g. as ‘O’ as Object format.
This was one reason it was disapearing for me.
You can check the format of the column by hte code below:
df.column.dtypes