Is there a way to sort a data frame with Ids [A1, A2, A3, …, H12] into the order [A1, B1, C1, …, H12]?

Question:

I am trying to find a way to sort the Ids [A1, A2, A3, …, H12] into the order [A1, B1, C1, …, H12] in a data frame.

I have tried this so far:

def key(row):
    match = re.match(r'(\d*)([A-H]\d+)', row)
    if match:
        num, letters = match.groups()
        return letters, int(num) if num else 0
    return row 

df['Id'] = sorted(df['Id'], key=key)

But it is not sorting it correctly.

Sample Data Frame Column:
Id
A1
A2
A3
A4
A5
A6
A7
A8
A9
A10
A11
A12
B1
B2
B3
B4
B5
B6
B7
B8
B9
B10
B11
B12
C1
C2
C3
C4
C5
C6
C7
C8
C9
C10
C11
C12
D1
D2
D3
D4
D5
D6
D7
D8
D9
D10
D11
D12
E1
E2
E3
E4
E5
E6
E7
E8
E9
E10
E11
E12
F1
F2
F3
F4
F5
F6
F7
F8
F9
F10
F11
F12
G1
G2
G3
G4
G5
G6
G7
G8
G9
G10
G11
G12
H1
H2
H3
H4
H5
H6
H7
H8
H9
H10
H11
H12

Expected Output:
Id
A1
B1
C1
D1
E1
F1
G1
H1
A2
B2
C2
D2
E2
F2
G2
H2
A3
B3
C3
D3
E3
F3
G3
H3
A4
B4
C4
D4
E4
F4
G4
H4
A5
B5
C5
D5
E5
F5
G5
H5
A6
B6
C6
D6
E6
F6
G6
H6
A7
B7
C7
D7
E7
F7
G7
H7
A8
B8
C8
D8
E8
F8
G8
H8
A9
B9
C9
D9
E9
F9
G9
H9
A10
B10
C10
D10
E10
F10
G10
H10
A11
B11
C11
D11
E11
F11
G11
H11
A12
B12
C12
D12
E12
F12
G12
H12

Asked By: zmactaggart


Answers:

Your regex is not correct: it looks for the digits before the letter, but your Ids have the letter first. You also need to sort first by the numbers, then by the letters:

import re

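def key(row):
    # Capture the letter and the number separately, e.g. 'B10' -> ('B', '10')
    match = re.match(r'([A-H])(\d+)', row)
    if match:
        letters, num = match.groups()
        # Number first, then letter, so A1, B1, ..., H1 come before A2
        return int(num), letters
    return row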

dfId = [let+str(num) for let in 'ABCDEFGH' for num in range(1,13)]
print(dfId)
print()
dfId = sorted(dfId, key=key)
print(dfId)

Output:

['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8', 'A9', 'A10', 'A11', 'A12', 'B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B9', 'B10', 'B11', 'B12', 'C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 'C9', 'C10', 'C11', 'C12', 'D1', 'D2', 'D3', 'D4', 'D5', 'D6', 'D7', 'D8', 'D9', 'D10', 'D11', 'D12', 'E1', 'E2', 'E3', 'E4', 'E5', 'E6', 'E7', 'E8', 'E9', 'E10', 'E11', 'E12', 'F1', 'F2', 'F3', 'F4', 'F5', 'F6', 'F7', 'F8', 'F9', 'F10', 'F11', 'F12', 'G1', 'G2', 'G3', 'G4', 'G5', 'G6', 'G7', 'G8', 'G9', 'G10', 'G11', 'G12', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6', 'H7', 'H8', 'H9', 'H10', 'H11', 'H12']

['A1', 'B1', 'C1', 'D1', 'E1', 'F1', 'G1', 'H1', 'A2', 'B2', 'C2', 'D2', 'E2', 'F2', 'G2', 'H2', 'A3', 'B3', 'C3', 'D3', 'E3', 'F3', 'G3', 'H3', 'A4', 'B4', 'C4', 'D4', 'E4', 'F4', 'G4', 'H4', 'A5', 'B5', 'C5', 'D5', 'E5', 'F5', 'G5', 'H5', 'A6', 'B6', 'C6', 'D6', 'E6', 'F6', 'G6', 'H6', 'A7', 'B7', 'C7', 'D7', 'E7', 'F7', 'G7', 'H7', 'A8', 'B8', 'C8', 'D8', 'E8', 'F8', 'G8', 'H8', 'A9', 'B9', 'C9', 'D9', 'E9', 'F9', 'G9', 'H9', 'A10', 'B10', 'C10', 'D10', 'E10', 'F10', 'G10', 'H10', 'A11', 'B11', 'C11', 'D11', 'E11', 'F11', 'G11', 'H11', 'A12', 'B12', 'C12', 'D12', 'E12', 'F12', 'G12', 'H12']
Answered By: Tim Roberts
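
The answer above sorts a plain list. Note that assigning a sorted list back with df['Id'] = sorted(...) reorders only that one column, so any other columns would no longer line up with their Ids. As a minimal sketch of one way to reorder whole rows with the same kind of key function, assuming every Id is a single letter A-H followed by digits (the helper name id_key and the extra "value" column are made up for illustration):

```python
import re
import pandas as pd

def id_key(cell_id):
    # Hypothetical helper: turn 'B10' into (10, 'B') so the number sorts first.
    match = re.fullmatch(r'([A-H])(\d+)', cell_id)
    if match is None:
        raise ValueError(f"unexpected Id: {cell_id!r}")
    letter, num = match.groups()
    return int(num), letter

# Illustrative sample frame; 'value' is an invented second column.
ids = [let + str(num) for let in 'ABCDEFGH' for num in range(1, 13)]
df = pd.DataFrame({'Id': ids, 'value': range(len(ids))})

# Sort the index labels by the key of each row's Id, then reorder whole rows.
order = sorted(df.index, key=lambda i: id_key(df.at[i, 'Id']))
df_sorted = df.loc[order].reset_index(drop=True)

print(df_sorted.head(9))
```

Because the rows are moved as a unit, each value stays attached to its original Id.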

If you had a simple list instead of a dataframe, sorting would look like this:

sorted(df["Id"], key=lambda x: (int(x[1:]), x[0]))

pd.DataFrame.sort_values() also has a key parameter; the only difference is that the function has to be vectorized: it receives the whole column and must return a result of the same length, with one sort key per row. You can do something like this:

df.sort_values(by=["Id"], key=lambda col: list(zip(col.str[1:].apply(int), col.str[0])))
Answered By: Maria K
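
An alternative that avoids building tuples inside the key function is to split the Id into two temporary columns and sort on those. This is only a sketch, assuming every Id matches one letter A-H followed by digits; the helper column names _num and _letter are invented for illustration:

```python
import pandas as pd

# Illustrative sample frame with the 96 Ids from the question.
ids = [let + str(num) for let in 'ABCDEFGH' for num in range(1, 13)]
df = pd.DataFrame({'Id': ids})

# Split each Id into its letter (column 0) and its number (column 1).
parts = df['Id'].str.extract(r'^([A-H])(\d+)$')

df_sorted = (
    df.assign(_num=parts[1].astype(int), _letter=parts[0])  # temporary sort keys
      .sort_values(['_num', '_letter'])                      # number first, then letter
      .drop(columns=['_num', '_letter'])                     # remove the helpers again
      .reset_index(drop=True)
)

print(df_sorted['Id'].tolist()[:10])
```

Converting the numeric part with astype(int) is what makes A10 sort after A9 rather than after A1.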