Reordering Pandas Columns based on Column name

Question:

I have columns with similar names but numeric suffixes that represent different occurrences of each column. For example, I have columns (company_1, job_title_1, location_1, company_2, job_title_2, location_2). I would like to order these columns grouped together by the prefix (before the underscore) and then sequentially by the suffix (after the underscore).

How I would like the columns to be: company_1, company_2, job_title_1, job_title_2, location_1, location_2.

Here’s what I tried from this question:

df = df.reindex(sorted(df.columns), axis=1)

This resulted in the order: company_1, company_10, company_11 (skipping over 2-9)

Asked By: codr

||

Answers:

This type of sorting is called natural sorting. (There are more details in Naturally sorting Pandas DataFrame which demonstrates how to sort rows using natsort)

Setup with natsort

import pandas as pd
from natsort import natsorted

df = pd.DataFrame(columns=[f'company_{i}' for i in [5, 2, 3, 4, 1, 10]])

# Before column sort
print(df)

df = df.reindex(natsorted(df.columns), axis=1)

# After column sort
print(df)

Before sort:

Empty DataFrame
Columns: [company_5, company_2, company_3, company_4, company_1, company_10]
Index: []

After sort:

Empty DataFrame
Columns: [company_1, company_2, company_3, company_4, company_5, company_10]
Index: []

Compared to lexicographic sorting with sorted:

df = df.reindex(sorted(df.columns), axis=1)
Empty DataFrame
Columns: [company_1, company_10, company_2, company_3, company_4, company_5]
Index: []
Answered By: Henry Ecker

Just for the sake of completeness, you can also get the desired result by passing a function to sorted that splits the strings into name, index tuples.


def index_splitter(x):
    """Example input: 'job_title_1'
       Output: ('job_title', 1)
    """
    *name, index = x.split("_")
    return '_'.join(name), int(index)

df = df.reindex(sorted(df.columns, key=index_splitter), axis=1)
Answered By: fsimonjetz