Sort pandas dataframe headers with two conditions

Question

i have one question: I have a big dataframe with over 1000 columns.

For example as following the heards of the columns: 2019 Material Cost, 2019 Labor Cost, 2019 Overhead Cost, 2020 Material Cost, 2020 Labor Cost, 2020 Overhead Cost, …2035


df = pd.DataFrame({'2019 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
                   '2019 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
                   '2019 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
                   '2020 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
                   '2020 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
                   '2020 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
                   '2021 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
                   '2021 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
                   '2021 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
                  })

I want to sort all headers into the following:

2019 Material Cost, 2020 Material cost, 2021 Material Cost,…,2019 Labor Cost, 2020 Labor Cost, 2021 Labor Cost, … ,2019 Overhead Cost, 2020 Overhead Cost,2021 Overhead Cost

df = pd.DataFrame({'2019 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
                   '2020 Material cost ': [5, 7, 7, 9, 12, 9, 9, 4],
                   '2021 Material cost': [11, 8, 10, 6, 6, 5, 9, 12],
                   '2019 Overhead cost': [25, 12, 15, 14, 19, 23, 25, 29],
                   '2020 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
                   '2021 Overhead cost': [11, 8, 10, 6, 6, 5, 9, 12],
                   '2019 Labor cost': [25, 12, 15, 14, 19, 23, 25, 29],
                   '2020 Labor cost ': [5, 7, 7, 9, 12, 9, 9, 4],
                   '2021 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
                  })

So i want to have one cost category and sort the years of the category ascending in a following order, then the next category.

Any help here? Thanks in advance

Asked By: Marc Irmler

||

Source

Answer 1

Create two lists, one with the costs and one with the years. Using these lists you can create another list containing all column names (in order).

costs = list(df.columns.str[5:].unique())
years = list(range(2019, 2036))

columns = [str(year) + ' ' + cost for year in years for cost in costs]
df = df.reindex(columns=columns)

For example:

df = pd.DataFrame(np.random.random((10, 10)), columns = ['1 a', '2 a', '3 a', '4 a', '5 a', '1 b', '2 b', '3 b', '4 b', '5 b'])
costs = ['a', 'b']
years = [1, 2, 3, 4, 5]
columns = [str(year) + ' ' + cost for year in years for cost in costs]
df.reindex(columns=columns).columns

Returns

Index(['1 a', '1 b', '2 a', '2 b', '3 a', '3 b', '4 a', '4 b', '5 a', '5 b'], dtype='object')

Answered By: T C Molenaar

Answer 2

@Chris
given input:

df = pd.DataFrame({'2019 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
                   '2019 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
                   '2019 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
                   '2020 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
                   '2020 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
                   '2020 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
                   '2021 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
                   '2021 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
                   '2021 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
                  })

and i want to have this as output (sorted after category and ascending for the years):

df = pd.DataFrame({'2019 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
                   '2020 Material cost ': [5, 7, 7, 9, 12, 9, 9, 4],
                   '2021 Material cost': [11, 8, 10, 6, 6, 5, 9, 12],
                   '2019 Overhead cost': [25, 12, 15, 14, 19, 23, 25, 29],
                   '2020 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
                   '2021 Overhead cost': [11, 8, 10, 6, 6, 5, 9, 12],
                   '2019 Labor cost': [25, 12, 15, 14, 19, 23, 25, 29],
                   '2020 Labor cost ': [5, 7, 7, 9, 12, 9, 9, 4],
                   '2021 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
                  })

Answered By: Marc Irmler

Sort pandas dataframe headers with two conditions

Question:

Answers: