Sort pandas dataframe headers with two conditions
Question:
i have one question: I have a big dataframe with over 1000 columns.
For example as following the heards of the columns: 2019 Material Cost, 2019 Labor Cost, 2019 Overhead Cost, 2020 Material Cost, 2020 Labor Cost, 2020 Overhead Cost, …2035
df = pd.DataFrame({'2019 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2019 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2019 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2020 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2020 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2021 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2021 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
})
I want to sort all headers into the following:
2019 Material Cost, 2020 Material cost, 2021 Material Cost,…,2019 Labor Cost, 2020 Labor Cost, 2021 Labor Cost, … ,2019 Overhead Cost, 2020 Overhead Cost,2021 Overhead Cost
df = pd.DataFrame({'2019 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Material cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Material cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2019 Overhead cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Overhead cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2019 Labor cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Labor cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
})
So i want to have one cost category and sort the years of the category ascending in a following order, then the next category.
Any help here? Thanks in advance
Answers:
Create two lists, one with the costs and one with the years. Using these lists you can create another list containing all column names (in order).
costs = list(df.columns.str[5:].unique())
years = list(range(2019, 2036))
columns = [str(year) + ' ' + cost for year in years for cost in costs]
df = df.reindex(columns=columns)
For example:
df = pd.DataFrame(np.random.random((10, 10)), columns = ['1 a', '2 a', '3 a', '4 a', '5 a', '1 b', '2 b', '3 b', '4 b', '5 b'])
costs = ['a', 'b']
years = [1, 2, 3, 4, 5]
columns = [str(year) + ' ' + cost for year in years for cost in costs]
df.reindex(columns=columns).columns
Returns
Index(['1 a', '1 b', '2 a', '2 b', '3 a', '3 b', '4 a', '4 b', '5 a', '5 b'], dtype='object')
@Chris
given input:
df = pd.DataFrame({'2019 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2019 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2019 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2020 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2020 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2021 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2021 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
})
and i want to have this as output (sorted after category and ascending for the years):
df = pd.DataFrame({'2019 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Material cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Material cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2019 Overhead cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Overhead cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2019 Labor cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Labor cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
})
i have one question: I have a big dataframe with over 1000 columns.
For example as following the heards of the columns: 2019 Material Cost, 2019 Labor Cost, 2019 Overhead Cost, 2020 Material Cost, 2020 Labor Cost, 2020 Overhead Cost, …2035
df = pd.DataFrame({'2019 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2019 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2019 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2020 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2020 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2021 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2021 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
})
I want to sort all headers into the following:
2019 Material Cost, 2020 Material cost, 2021 Material Cost,…,2019 Labor Cost, 2020 Labor Cost, 2021 Labor Cost, … ,2019 Overhead Cost, 2020 Overhead Cost,2021 Overhead Cost
df = pd.DataFrame({'2019 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Material cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Material cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2019 Overhead cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Overhead cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2019 Labor cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Labor cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
})
So i want to have one cost category and sort the years of the category ascending in a following order, then the next category.
Any help here? Thanks in advance
Create two lists, one with the costs and one with the years. Using these lists you can create another list containing all column names (in order).
costs = list(df.columns.str[5:].unique())
years = list(range(2019, 2036))
columns = [str(year) + ' ' + cost for year in years for cost in costs]
df = df.reindex(columns=columns)
For example:
df = pd.DataFrame(np.random.random((10, 10)), columns = ['1 a', '2 a', '3 a', '4 a', '5 a', '1 b', '2 b', '3 b', '4 b', '5 b'])
costs = ['a', 'b']
years = [1, 2, 3, 4, 5]
columns = [str(year) + ' ' + cost for year in years for cost in costs]
df.reindex(columns=columns).columns
Returns
Index(['1 a', '1 b', '2 a', '2 b', '3 a', '3 b', '4 a', '4 b', '5 a', '5 b'], dtype='object')
@Chris
given input:
df = pd.DataFrame({'2019 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2019 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2019 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2020 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2020 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2021 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2021 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
})
and i want to have this as output (sorted after category and ascending for the years):
df = pd.DataFrame({'2019 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Material cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Material cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2019 Overhead cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Overhead cost': [11, 8, 10, 6, 6, 5, 9, 12],
'2019 Labor cost': [25, 12, 15, 14, 19, 23, 25, 29],
'2020 Labor cost ': [5, 7, 7, 9, 12, 9, 9, 4],
'2021 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
})