Pandas: Sorting columns by their mean value

Question:

I have a DataFrame in pandas and would like to sort its columns (i.e. get a new DataFrame, or a view) by the mean value of each column (or, for example, by their standard deviation). The documentation talks about sorting by label or by value, but I could not find anything on custom sorting methods.

How can I do this?

Answers:

You can use the mean DataFrame method and the Series sort_values method:

In [11]: df = pd.DataFrame(np.random.randn(4,4), columns=list('ABCD'))

In [12]: df
Out[12]:
          A         B         C         D
0  0.933069  1.432486  0.288637 -1.867853
1 -0.455952 -0.725268  0.339908  1.318175
2 -0.894331  0.573868  1.116137  0.508845
3  0.661572  0.819360 -0.527327 -0.925478

In [13]: df.mean()
Out[13]:
A    0.061089
B    0.525112
C    0.304339
D   -0.241578
dtype: float64

In [14]: df.mean().sort_values()
Out[14]:
D   -0.241578
A    0.061089
C    0.304339
B    0.525112
dtype: float64

Then you can reorder the columns using reindex:

In [15]: df.reindex(df.mean().sort_values().index, axis=1)
Out[15]:
          D         A         C         B
0 -1.867853  0.933069  0.288637  1.432486
1  1.318175 -0.455952  0.339908 -0.725268
2  0.508845 -0.894331  1.116137  0.573868
3 -0.925478  0.661572 -0.527327  0.819360
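
The same two steps extend to other statistics, such as the standard deviation mentioned in the question. A minimal sketch, assuming a frame with only numeric columns and unique column labels; the helper name sort_columns_by and the sample data are hypothetical, not part of pandas:

```python
import pandas as pd

def sort_columns_by(df, stat="mean", ascending=True):
    """Reorder the columns of df by a per-column statistic (hypothetical helper)."""
    key = df.agg(stat)  # Series with one value per column, e.g. "mean" or "std"
    order = key.sort_values(ascending=ascending).index
    return df.reindex(order, axis=1)  # returns a new DataFrame, df is unchanged

# Small deterministic frame (made-up data) so the ordering is easy to check by hand.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0],      # mean 2,  std 1
                   "B": [10.0, 10.0, 10.0],   # mean 10, std 0
                   "C": [-5.0, 0.0, 5.0]})    # mean 0,  std 5

print(list(sort_columns_by(df).columns))         # ['C', 'A', 'B']
print(list(sort_columns_by(df, "std").columns))  # ['B', 'A', 'C']
```

Passing ascending=False reverses the order, for instance to put the column with the largest mean first.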

Note: In earlier versions of pandas, sort_values was called order; order was deprecated in 0.17 to be more consistent with the other sorting methods. Also, in earlier versions, one had to use reindex_axis rather than reindex.

Answered By: Andy Hayden

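You can use assign to create a temporary column, sort by it, and drop it again, all in a single expression. Note that with axis=1 the temporary column m holds the mean of each row, so this orders the rows of the frame.

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(4,4), columns=list('ABCD'))
df.assign(m=df.mean(axis=1)).sort_values('m').drop('m', axis=1)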
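
For ordering columns rather than rows, the same assign, sort, drop pattern could be applied to the transposed frame, since df.mean(axis=1) in the line above is a per-row mean. This is only a sketch: it assumes every column is numeric (transposing mixed dtypes would upcast the data), and the sample data is made up for illustration.

```python
import pandas as pd

# Small deterministic frame (made-up data) so the result is easy to verify.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0],      # mean 2
                   "B": [10.0, 10.0, 10.0],   # mean 10
                   "C": [-5.0, 0.0, 5.0]})    # mean 0

by_col = (
    df.T                      # original columns become rows
      .assign(m=df.mean())    # per-column mean, aligned on the column labels
      .sort_values("m")       # order those rows by the temporary column
      .drop(columns="m")      # remove the temporary column again
      .T                      # transpose back to the original orientation
)

print(list(by_col.columns))   # ['C', 'A', 'B']
```

The reindex approach from the first answer avoids the double transpose and is probably the simpler choice when only the column order needs to change.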
Answered By: Adriel M. Vieira