I am stuck in writing one code. I wanted to sort one column first by integer then combination of integer and string then string

Question:

I wanted to sort one column first by integer then combination of integer and string then string.

Below is my data frame.

import pandas as pd

df = pd.DataFrame({'name': ['aaa', 'ccc', 'bbb', '1c2z45zz', 55, 3, 'et45']})

Below is my expected output.


df = pd.DataFrame({'name': [3, 55, '1c2z45zz', 'et45', 'aaa', 'bbb', 'ccc']})

I have tried many method but all gives first combination of string and integer then integer then string.

import natsort

df.iloc[natsort.index_humansorted(df.name)]

then I tried below one.

max_length = max(df.name.str.len())
df['sort_name']=df.name.str.pad(max_length,'right','~')
df.sort_values('sort_name', inplace=True, ignore_index=True)

and the last I tried.

df_sorted = df.astype(str).sort_values(by='A',key=lambda x: (x.isdigit(), x))
Asked By: user21510116

||

Answers:

You can use numpy.lexsort with your different conditions:

import natsort

# first sort by pure number
m1 = ~df['name'].astype(str).str.isdigit()
# then by strings having a number
m2 = ~df['name'].astype(str).str.contains(r'd')
# optional: sort remaining in human sorted order
m3 = np.argsort(natsort.index_humansorted(df['name']))

out = df.iloc[np.lexsort([m3, m2, m1])]

print(out)

Output:

       name
5         3
4        55
3  1c2z45zz
6      et45
0       aaa
2       bbb
1       ccc

Other example:

        name
6          1
7          2
5          3
8          3
9          4
4          5
10         5
11         6
12      1000
3   1c2z45zz
14      et45
0        aaa
2        bbb
1        ccc
13     p_o_j
Answered By: mozway

Here is an option using key in sort_values

    (df.astype(str)
.sort_values('name')
.sort_values('name',key = lambda x: list(zip(x.str.isdigit(),x.str.contains(r'd'))),ascending=False))

Output:

       name
5         3
4        55
3  1c2z45zz
6      et45
0       aaa
2       bbb
1       ccc
Answered By: rhug123