Rearrange the columns of a pandas DataFrame based on row number
Question:
There is a dataframe:
import pandas as pd
df = pd.DataFrame(data = {'a':[3,0,2,1],'b':[4,3,2,1],'c':[3,2,1,0],'d':[4,3,2,0]})
print(df)
>>> df
a b c d
0 3 4 3 4
1 0 3 2 3
2 2 2 1 2
3 1 1 0 0
How can I rearrange (sort) the columns of the entire df based on the values in each column? For example, a column whose contents are [3,2,1] should be placed before a column whose contents are [3,2,0]. When comparing two columns, the values are compared row by row, in sequence: if the first row is the same, the next row is compared, and so on.
The desired results are as follows:
>>> dd
b d c a
0 4 4 3 3
1 3 3 2 0
2 2 2 1 2
3 1 0 0 1
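The comparison rule described above appears to be the same lexicographic ordering that Python applies to tuples. A minimal sketch of that idea, with the four columns written out by hand as plain tuples (the `columns` dict and the `order` name are illustrative only):

```python
# Columns from the question, written as tuples with the first row first.
columns = {
    'a': (3, 0, 2, 1),
    'b': (4, 3, 2, 1),
    'c': (3, 2, 1, 0),
    'd': (4, 3, 2, 0),
}

# Tuples compare element by element; the first difference decides.
print((3, 2, 1) > (3, 2, 0))

# Sorting the labels by their tuples, largest first, gives the column order.
order = sorted(columns, key=columns.get, reverse=True)
print(order)
```

With the data from the question this should reproduce the desired column order b, d, c, a.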
Answers:
Code:
import pandas as pd
df = pd.DataFrame(data={'a': [3, 0, 2, 1], 'b': [4, 3, 2, 1], 'c': [3, 2, 1, 0], 'd': [4, 3, 2, 0]})
def custom_sort(col):
    # Tuples compare element by element, i.e. row by row from the top
    return tuple(col.values)
sorted_columns = sorted(df.columns, key=lambda col: custom_sort(df[col]), reverse=True)
df_sorted = df[sorted_columns]
print(df_sorted)
Output:
b d c a
0 4 4 3 3
1 3 3 2 0
2 2 2 1 2
3 1 0 0 1
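I also found a solution that weights the elements of each column by powers of 10. Note that it only orders the columns correctly when every value is a single digit (0 to 9).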
>>> df
a b c d
0 3 4 3 4
1 0 3 2 3
2 2 2 1 2
3 1 1 0 0
import numpy as np
df[df.apply(lambda x: sum(x * pow(10, np.arange(len(x))[::-1]))).sort_values(ascending=False).index]
b d c a
0 4 4 3 3
1 3 3 2 0
2 2 2 1 2
3 1 0 0 1
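The weighting approach effectively reads each column as the digits of a base-10 number, so it can misorder columns once a value reaches 10 or more. A rough sketch of that limitation follows; the helpers `weighted_order` and `tuple_order`, the `base` parameter, and the frame `big` are made up for illustration and are not part of the answer above:

```python
import numpy as np
import pandas as pd

def weighted_order(df, base=10):
    """Column labels ordered by positional weight, largest first (hypothetical helper)."""
    # The top row gets the largest weight, the bottom row gets weight 1.
    weights = base ** np.arange(len(df))[::-1]
    scores = weights @ df.to_numpy()          # one score per column
    order = np.argsort(-scores, kind="stable")
    return list(df.columns[order])

def tuple_order(df):
    """Column labels ordered by row-by-row comparison, largest first."""
    return sorted(df.columns, key=lambda c: tuple(df[c]), reverse=True)

small = pd.DataFrame({'a': [3, 0, 2, 1], 'b': [4, 3, 2, 1],
                      'c': [3, 2, 1, 0], 'd': [4, 3, 2, 0]})
big = pd.DataFrame({'x': [1, 0], 'y': [0, 12]})  # contains a value >= 10

print(weighted_order(small))         # single digits: matches tuple_order
print(weighted_order(big))           # base 10: 'y' scores 12, 'x' scores 10
print(tuple_order(big))              # row by row: 'x' wins on the first row
print(weighted_order(big, base=13))  # a base above the maximum value
```

Under these assumptions, choosing a base larger than the maximum value restores the row-by-row order for non-negative integers, although very large bases or long columns could overflow the integer weights.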
For an efficient solution, don’t reinvent the wheel; use sort_values directly:
out = df.sort_values(by=list(df.index), axis=1, ascending=False)
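Or, alternatively, use numpy.lexsort (it treats the last key as the primary one, hence the reversed rows; negating the values gives descending order):
import numpy as np
out = df.iloc[:, np.lexsort(-df[::-1].to_numpy())]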
Output:
b d c a
0 4 4 3 3
1 3 3 2 0
2 2 2 1 2
3 1 0 0 1
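As a sanity check, the following sketch runs the sort_values and numpy.lexsort one-liners next to the plain tuple-based sort on the question's data and compares the resulting column orders. The variable names are illustrative only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [3, 0, 2, 1], 'b': [4, 3, 2, 1],
                   'c': [3, 2, 1, 0], 'd': [4, 3, 2, 0]})

# Sort the columns using every row label as a key, top row first.
by_sort_values = df.sort_values(by=list(df.index), axis=1, ascending=False)

# lexsort uses its last key as the primary one, so the rows are reversed;
# negating the values turns the ascending sort into a descending one.
by_lexsort = df.iloc[:, np.lexsort(-df[::-1].to_numpy())]

# Reference: plain Python tuple comparison, largest first.
by_tuples = df[sorted(df.columns, key=lambda c: tuple(df[c]), reverse=True)]

print(list(by_sort_values.columns))
print(list(by_lexsort.columns))
print(list(by_tuples.columns))
```

All three should report the same order here; on frames with duplicate columns the tie-breaking may differ between methods.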