Pandas: Sorting columns by their mean value

Question

I have a dataframe in Pandas, I would like to sort its columns (i.e. get a new dataframe, or a view) according to the mean value of its columns (or e.g. by their std value). The documentation talks about sorting by label or value, but I could not find anything on custom sorting methods.

How can I do this?

Asked By: Amelio Vazquez-Reina

||

Source

Answer 1

You can use the mean DataFrame method and the Series sort_values method:

In [11]: df = pd.DataFrame(np.random.randn(4,4), columns=list('ABCD'))

In [12]: df
Out[12]:
          A         B         C         D
0  0.933069  1.432486  0.288637 -1.867853
1 -0.455952 -0.725268  0.339908  1.318175
2 -0.894331  0.573868  1.116137  0.508845
3  0.661572  0.819360 -0.527327 -0.925478

In [13]: df.mean()
Out[13]:
A    0.061089
B    0.525112
C    0.304339
D   -0.241578
dtype: float64

In [14]: df.mean().sort_values()
Out[14]:
D   -0.241578
A    0.061089
C    0.304339
B    0.525112
dtype: float64

Then you can reorder the columns using reindex:

In [15]: df.reindex(df.mean().sort_values().index, axis=1)
Out[15]:
          D         A         C         B
0 -1.867853  0.933069  0.288637  1.432486
1  1.318175 -0.455952  0.339908 -0.725268
2  0.508845 -0.894331  1.116137  0.573868
3 -0.925478  0.661572 -0.527327  0.819360

Note: In earlier versions of pandas, sort_values used to be order, but order was deprecated as part of 0.17 so to be more consistent with the other sorting methods. Also, in earlier versions, one had to use reindex_axis rather than reindex.

Answered By: Andy Hayden

Answer 2

You can use assign to create a variable, use it to sort values and drop it in the same line of code.

df = pd.DataFrame(np.random.randn(4,4), columns=list('ABCD'))
df.assign(m=df.mean(axis=1)).sort_values('m').drop('m', axis=1)

Answered By: Adriel M. Vieira

Pandas: Sorting columns by their mean value

Question:

Answers: