Sort pandas DataFrame rows by a list of (index) numbers

Question:

I have a pandas DataFrame with 229 rows. I have a list of index numbers ([47, 16, 59, ...]) and I want to re-sort the rows of my DataFrame into this order.

Details: I ran the DF through a filter (specifically, scipy.cluster.hierarchy.dendrogram, setting get_leaves=True). The return value includes a list of index numbers (leaves) in order of the dendrogram leaf nodes. I now want to sort my DF in that order, so that I can plot the clusters.

I’m sure there are many ways that I can merge a bunch of tables and drop columns but… is there a nice idiomatic way to do this?

Asked By: Vicki B

Answers:

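If the list has the same length as df, you can assign it as a new column and sort by that column. Note that this orders the rows by the values in the list (row i gets the sort key ListOfIndices[i]), so it produces the requested order only if the list holds each row's target position rather than the row labels in their desired order.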

df['List']=ListOfIndices
df.sort_values(by=['List'])
Answered By: uxke

Creating a new column, mapping your indexes to the correct rows and then performing a sort should be the easiest way to do this.

I created some dummy data to provide an example:

import random
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))

df
     A   B   C   D
0    8  27   2   9
1   87  17  82  61
2   20  65  42  87
3    6  60  99  22
4    1  54  57  32

indices = [random.randrange(99) for i in range(99)]
#[54, 37, 83, 25, 44, 68, 81, 72, 61, 74, 10, 75, 24, 77, 89, 6, 59, 95, 44, 20, 72, 0, 53, 6, 61, 17, 52, 7, 95, 4, 64, 15, 46, 18, 58, 30, 3, 7, 94, 30, 93, 78, 24, 98, 65, 63, 79, 1, 43, 17, 76, 65, 85, 88, 66, 86, 10, 96, 27, 85, 66, 48, 2, 83, 25, 11, 88, 41, 88, 10, 15, 19, 75, 6, 72, 39, 28, 18, 78, 22, 71, 28, 97, 76, 47, 38, 9, 91, 69, 27, 63, 43, 19, 38, 80, 16, 35, 20, 65]

Code:

df['NewIndex'] = None # Create a new column holding only None values

for indx, value in enumerate(indices):
    # The row labelled `value` gets its position `indx` in the indices list.
    # .loc avoids chained assignment (df['NewIndex'][value] = ...), which may not modify df.
    df.loc[value, 'NewIndex'] = indx

df = df.sort_values(by=['NewIndex']) # Sort by the new column

Output:

         A   B   C   D NewIndex
54  69  73  81  31        0
37  54  97  45  31        1
68  27  56  86  50        5
81  60   8  20  29        6
74  95  54  45  59        9
..  ..  ..  ..  ..      ...
84   9  67  88  38     None
87  47   9  97   2     None
90  38   6  98  50     None
92  57  51  84  37     None
99  12  88  38  90     None

Note: the None values and the gaps in NewIndex come from mismatches between indices and the DataFrame's index. The list was generated at random, so it contains duplicates (a later occurrence overwrites an earlier one) and does not cover every label from 0 to 99.
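
The loop above can also be written without explicit iteration. The following is a minimal sketch, assuming the list is a permutation of the DataFrame's index labels with no duplicates; the names `order`, `rank` and `sorted_df` and the toy data are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: five rows with the default integer index 0..4.
df = pd.DataFrame({"A": [10, 11, 12, 13, 14]})

# Hypothetical desired order; assumed to be a permutation of df.index.
order = [3, 0, 4, 1, 2]

# Map each index label to its position in `order` (label -> rank).
rank = pd.Series(range(len(order)), index=order)

# Assigning a Series aligns on the index, so each row receives its own rank.
sorted_df = df.assign(NewIndex=rank).sort_values("NewIndex")

print(sorted_df.index.tolist())
```

Because the assignment aligns on index labels, any row whose label is missing from `order` would get NaN in NewIndex and be sorted to the end.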

Answered By: PacketLoss

df.loc[ListOfIndices]

And if you want to reset the index:

df.loc[ListOfIndices].reset_index(drop=True)
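
One thing worth checking is whether the list holds index labels or row positions. With the default 0..n-1 index the two coincide, but after filtering or re-indexing they can differ. Here is a small sketch on made-up data, assuming the list (called `leaves` here, a hypothetical name) holds row positions, in which case .iloc is the matching selector:

```python
import pandas as pd

# Hypothetical toy data with non-default labels, so labels and positions differ.
df = pd.DataFrame({"A": [10, 11, 12, 13]}, index=[100, 101, 102, 103])

# Hypothetical leaf order, expressed as 0-based row positions.
leaves = [2, 0, 3, 1]

# .iloc reorders by position, whatever the index labels are.
by_position = df.iloc[leaves]

# .loc reorders by label, so the list must contain index labels.
by_label = df.loc[[102, 100, 103, 101]]

# Both selections pick the same rows in the same order here.
print(by_position.equals(by_label))

# reset_index(drop=True) replaces the old labels with 0..n-1.
print(by_position.reset_index(drop=True))
```

If the DataFrame still has its default integer index, df.loc[ListOfIndices] and df.iloc[ListOfIndices] give the same result.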
Answered By: maxi.marufo