I am stuck in writing one code. I wanted to sort one column first by integer then combination of integer and string then string
Question:
I wanted to sort one column first by integer then combination of integer and string then string.
Below is my data frame.
import pandas as pd
df = pd.DataFrame({'name': ['aaa', 'ccc', 'bbb', '1c2z45zz', 55, 3, 'et45']})
Below is my expected output.
df = pd.DataFrame({'name': [3, 55, '1c2z45zz', 'et45', 'aaa', 'bbb', 'ccc']})
I have tried many method but all gives first combination of string and integer then integer then string.
import natsort
df.iloc[natsort.index_humansorted(df.name)]
then I tried below one.
max_length = max(df.name.str.len())
df['sort_name']=df.name.str.pad(max_length,'right','~')
df.sort_values('sort_name', inplace=True, ignore_index=True)
and the last I tried.
df_sorted = df.astype(str).sort_values(by='A',key=lambda x: (x.isdigit(), x))
Answers:
You can use numpy.lexsort
with your different conditions:
import natsort
# first sort by pure number
m1 = ~df['name'].astype(str).str.isdigit()
# then by strings having a number
m2 = ~df['name'].astype(str).str.contains(r'd')
# optional: sort remaining in human sorted order
m3 = np.argsort(natsort.index_humansorted(df['name']))
out = df.iloc[np.lexsort([m3, m2, m1])]
print(out)
Output:
name
5 3
4 55
3 1c2z45zz
6 et45
0 aaa
2 bbb
1 ccc
Other example:
name
6 1
7 2
5 3
8 3
9 4
4 5
10 5
11 6
12 1000
3 1c2z45zz
14 et45
0 aaa
2 bbb
1 ccc
13 p_o_j
Here is an option using key in sort_values
(df.astype(str)
.sort_values('name')
.sort_values('name',key = lambda x: list(zip(x.str.isdigit(),x.str.contains(r'd'))),ascending=False))
Output:
name
5 3
4 55
3 1c2z45zz
6 et45
0 aaa
2 bbb
1 ccc
I wanted to sort one column first by integer then combination of integer and string then string.
Below is my data frame.
import pandas as pd
df = pd.DataFrame({'name': ['aaa', 'ccc', 'bbb', '1c2z45zz', 55, 3, 'et45']})
Below is my expected output.
df = pd.DataFrame({'name': [3, 55, '1c2z45zz', 'et45', 'aaa', 'bbb', 'ccc']})
I have tried many method but all gives first combination of string and integer then integer then string.
import natsort
df.iloc[natsort.index_humansorted(df.name)]
then I tried below one.
max_length = max(df.name.str.len())
df['sort_name']=df.name.str.pad(max_length,'right','~')
df.sort_values('sort_name', inplace=True, ignore_index=True)
and the last I tried.
df_sorted = df.astype(str).sort_values(by='A',key=lambda x: (x.isdigit(), x))
You can use numpy.lexsort
with your different conditions:
import natsort
# first sort by pure number
m1 = ~df['name'].astype(str).str.isdigit()
# then by strings having a number
m2 = ~df['name'].astype(str).str.contains(r'd')
# optional: sort remaining in human sorted order
m3 = np.argsort(natsort.index_humansorted(df['name']))
out = df.iloc[np.lexsort([m3, m2, m1])]
print(out)
Output:
name
5 3
4 55
3 1c2z45zz
6 et45
0 aaa
2 bbb
1 ccc
Other example:
name
6 1
7 2
5 3
8 3
9 4
4 5
10 5
11 6
12 1000
3 1c2z45zz
14 et45
0 aaa
2 bbb
1 ccc
13 p_o_j
Here is an option using key in sort_values
(df.astype(str)
.sort_values('name')
.sort_values('name',key = lambda x: list(zip(x.str.isdigit(),x.str.contains(r'd'))),ascending=False))
Output:
name
5 3
4 55
3 1c2z45zz
6 et45
0 aaa
2 bbb
1 ccc