How to efficiently reorder rows based on condition?

Question

My dataframe:

import pandas as pd
df = pd.DataFrame({'col_1': [10, 20, 10, 20, 10, 10, 20, 20],
                   'col_2': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']})

    col_1   col_2
0   10      a
1   20      b
2   10      c
3   20      d
4   10      e
5   10      f
6   20      g
7   20      h

I don’t want consecutive rows with col_1 = 10. Instead, the row below a repeated 10 should move up by one (in this case, index 6 should become index 5 and vice versa), so that the order is always 10, 20, 10, 20…

My current solution:

for idx, row in df.iterrows():
    if row['col_1'] == 10 and df.iloc[idx + 1]['col_1'] != 20:
        df = df.rename({idx + 1:idx + 2, idx + 2: idx + 1})

df = df.sort_index()
df

gives me:

    col_1   col_2
0   10      a
1   20      b
2   10      c
3   20      d
4   10      e
5   20      g
6   10      f
7   20      h

which is what I want, but it is very slow (2.34 s for a dataframe with just over 8000 rows).
Is there a way to avoid the loop here?
Thanks

Asked By: sierra_papa

Answer 1

You can use a custom key in sort_values with groupby.cumcount:

df.sort_values(by='col_1', kind='stable', key=lambda s: df.groupby(s).cumcount())

Output:

   col_1 col_2
0     10     a
1     20     b
2     10     c
3     20     d
4     10     e
6     20     g
5     10     f
7     20     h

Answered By: mozway
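
To see why the key-based sort above works, here is a minimal self-contained sketch. The names occurrence and result are introduced only for illustration and are not from the original answer. The final reset_index(drop=True) is an added step, on the assumption that a renumbered index is wanted, as in the question's expected output.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [10, 20, 10, 20, 10, 10, 20, 20],
                   'col_2': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']})

# Occurrence number of each row within its col_1 group:
# the 1st 10 and the 1st 20 get 0, the 2nd 10 and the 2nd 20 get 1, and so on.
occurrence = df.groupby('col_1').cumcount()
print(occurrence.tolist())

# A stable sort on that number keeps the original order among ties,
# so the k-th 10 ends up next to the k-th 20.
result = (df.sort_values(by='col_1', kind='stable',
                         key=lambda s: df.groupby(s).cumcount())
            .reset_index(drop=True))  # assumption: renumber the index 0..n-1
print(result)
```

Because ties are broken by original position, a k-th 20 that appears before the k-th 10 in the input would stay ahead of it. The key argument of sort_values requires pandas 1.1 or later.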
