Sort pandas DataFrame rows by a list of (index) numbers
Question:
I have a pandas DataFrame with 229 rows. I have a list of index numbers ([47, 16, 59, ...]) and I want to re-sort the rows of my DataFrame into this order.
Details: I ran the DF through a filter (specifically, scipy.cluster.hierarchy.dendrogram, setting get_leaves=True). The return value includes a list of index numbers (leaves) in order of the dendrogram leaf nodes. I now want to sort my DF in that order, so that I can plot the clusters.
I’m sure there are many ways that I can merge a bunch of tables and drop columns but… is there a nice idiomatic way to do this?
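A minimal sketch of the setup described above, assuming SciPy's `linkage` and `dendrogram` and made-up data (the 10×3 random frame and the `"average"` method are illustrative choices, not taken from the question). With `no_plot=True` the leaf order is computed without drawing anything, and because the leaves are row positions, `iloc` is one way to apply them:

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical data: 10 observations, 3 features, default RangeIndex 0..9.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(10, 3)), columns=list("xyz"))

# Cluster the rows; "average" linkage is an arbitrary choice for this sketch.
Z = linkage(df.to_numpy(), method="average")

# no_plot=True returns the dendrogram dict without plotting.
leaves = dendrogram(Z, no_plot=True, get_leaves=True)["leaves"]

# The leaves are row positions (0..n-1), so iloc reorders the frame by them.
df_sorted = df.iloc[leaves]
print(df_sorted.index.tolist())
```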
Answers:
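If the list has the same length as df, you can add it as a new column and sort by that column. The column has to hold each row's target position, which is the inverse of the leaf order, so pass the list through np.argsort first (this assumes df has the default 0..n-1 index and the list is a permutation of it):
import numpy as np
df['List'] = np.argsort(ListOfIndices)
df = df.sort_values(by=['List'])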
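A small check, on a hypothetical 4-row frame, of why the sort key has to be each row's position in the leaf list (`np.argsort` of the list) rather than the list itself:

```python
import numpy as np
import pandas as pd

# Hypothetical 4-row frame with the default RangeIndex 0..3.
df = pd.DataFrame({"val": ["a", "b", "c", "d"]})
leaves = [2, 0, 3, 1]  # desired row order

# Using the list itself as the key gives the inverse permutation.
naive = df.assign(List=leaves).sort_values("List")

# The key for each row must be its position in `leaves`: np.argsort(leaves).
fixed = df.assign(List=np.argsort(leaves)).sort_values("List")

print(naive["val"].tolist())  # ['b', 'd', 'a', 'c']
print(fixed["val"].tolist())  # ['c', 'a', 'd', 'b']
```

The corrected key produces the same order as `df.iloc[leaves]`, which skips the helper column entirely.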
Creating a new column that maps each row to its position in your list, and then sorting on that column, should be the easiest way to do this.
I created some dummy data to provide an example:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
df
A B C D
0 8 27 2 9
1 87 17 82 61
2 20 65 42 87
3 6 60 99 22
4 1 54 57 32
import random
indices = [random.randrange(99) for i in range(99)]
#[54, 37, 83, 25, 44, 68, 81, 72, 61, 74, 10, 75, 24, 77, 89, 6, 59, 95, 44, 20, 72, 0, 53, 6, 61, 17, 52, 7, 95, 4, 64, 15, 46, 18, 58, 30, 3, 7, 94, 30, 93, 78, 24, 98, 65, 63, 79, 1, 43, 17, 76, 65, 85, 88, 66, 86, 10, 96, 27, 85, 66, 48, 2, 83, 25, 11, 88, 41, 88, 10, 15, 19, 75, 6, 72, 39, 28, 18, 78, 22, 71, 28, 97, 76, 47, 38, 9, 91, 69, 27, 63, 43, 19, 38, 80, 16, 35, 20, 65]
Code:
df['NewIndex'] = None  # Create a new column holding only None values
for indx, value in enumerate(indices):
    df.loc[value, 'NewIndex'] = indx  # Row labelled `value` gets its position in `indices` (.loc avoids chained assignment)
df = df.sort_values(by=['NewIndex'])  # Sort by the new column; rows left as None go last
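The loop can also be written without per-row assignment. This sketch, on hypothetical toy data, builds a dict from row label to position and maps it over the index with `Index.map`; as in the loop, a duplicated label keeps its last position, and rows absent from `indices` get NaN and are sorted to the end:

```python
import pandas as pd

# Hypothetical frame and a list with a duplicate (3) and a missing row label (1).
df = pd.DataFrame({"A": [10, 11, 12, 13, 14]})
indices = [3, 0, 4, 3, 2]

# Row label -> position in `indices`; a later duplicate overwrites an earlier one.
position = {value: pos for pos, value in enumerate(indices)}

# Labels missing from the dict map to NaN, which sort_values places last.
df["NewIndex"] = df.index.map(position)
df = df.sort_values(by="NewIndex")
print(df.index.tolist())  # [0, 4, 3, 2, 1]
```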
Output:
A B C D NewIndex
54 69 73 81 31 0
37 54 97 45 31 1
68 27 56 86 50 5
81 60 8 20 29 6
74 95 54 45 59 9
.. .. .. .. .. ...
84 9 67 88 38 None
87 47 9 97 2 None
90 38 6 98 50 None
92 57 51 84 37 None
99 12 88 38 90 None
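Note: the None values and the gaps in NewIndex are due to mismatches between indices and the DataFrame's index. random.randrange(99) can repeat values (a later duplicate overwrites an earlier position), never produces 99, and leaves some rows undrawn. I did not take the time to ensure indices covered 0-99 with no duplicates.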
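One way to avoid those mismatches is to draw a real permutation instead of independent random numbers. This sketch assumes NumPy's Generator API (`default_rng`, `permutation`, `integers`) and runs the loop from the answer, written with `.loc`; with a valid permutation every row receives a NewIndex and none is left as None:

```python
import numpy as np
import pandas as pd

# Seeded dummy data in the same shape as the answer's example.
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list("ABCD"))

# A true permutation of the row labels 0..99: each appears exactly once.
indices = rng.permutation(len(df))
assert sorted(indices) == list(range(len(df)))  # no duplicates, nothing missing

# The answer's loop, using .loc instead of chained assignment.
df["NewIndex"] = None
for indx, value in enumerate(indices):
    df.loc[value, "NewIndex"] = indx
df = df.sort_values(by=["NewIndex"])

print(df["NewIndex"].isna().sum())  # 0
```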
df.loc[ListOfIndices]
And if you want to reset the index:
df.loc[ListOfIndices].reset_index(drop=True)
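Whether `loc` works here depends on the index. A sketch with a hypothetical frame whose labels are not 0..n-1 (for example after filtering rows): dendrogram leaves are row positions, so `iloc` applies them directly, while `loc` looks them up as labels and raises `KeyError` when none of them match:

```python
import pandas as pd

# Hypothetical frame whose index labels are not 0..n-1.
df = pd.DataFrame({"val": ["a", "b", "c", "d"]}, index=[10, 20, 30, 40])
leaves = [2, 0, 3, 1]  # row positions, as dendrogram returns them

# iloc selects by position, so it works regardless of the labels.
by_position = df.iloc[leaves].reset_index(drop=True)
print(by_position["val"].tolist())  # ['c', 'a', 'd', 'b']

# loc selects by label; none of 2, 0, 3, 1 is a label here.
try:
    df.loc[leaves]
    loc_failed = False
except KeyError:
    loc_failed = True
print(loc_failed)  # True
```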