Pandas excel 3 level header transformation

Question

Having an excel sheet for sales by product by state by location as below

			New Jersey	New Jersey	California	California	California	California	New York	New York
			Willowbrook	Willowbrook	Great Mall	Great Mall	Westminster Mall	Westminster Mall	The Fulton Center	The Fulton Center
Transaction Date	Transaction Document	Partno	Qty	Value	Qty	Value	Qty	Value	Qty	Value	Total Qty	Total Value
17/5/2022	8534	P1			1	150.00					1	150.00
17/5/2022	8654	P6					2	400.00			2	400.00
18/5/2022	3537	P2	1	175.00							1	175.00
20/5/2022	2301	P3							2	150.00	2	150.00
20/5/2022	2301	P5							1	50.00	1	50.00

Need to change it to become as below

Transaction Date	Transaction Document	Partno	Qty	Value	State	Location
17/5/2022	8534	P1	1	150.00	California	Great Mall
17/5/2022	8654	P6	2	400.00	California	Westminster Mall
18/5/2022	3537	P2	1	175.00	New Jersey	Willowbrook
20/5/2022	2301	P3	2	150.00	New York	The Fulton Center
20/5/2022	2301	P5	1	50.00	New York	The Fulton Center

There are monthly reports like this so I was looking for a python function that uses pandas or any other library to be able to pass the files and transform it as above

Asked By: Gojoe

||

Source

Answer 1

You can remove last 2 columns by DataFrame.iloc, create columns names by DataFrame.set_axis, reshape by DataFrame.stack:

df = pd.read_excel('data.xlsx', header=[0, 1, 2], index_col=[0, 1, 2])

print (df.index)
MultiIndex([('17/5/2022 ', 8534, 'P1 '),
            ('17/5/2022 ', 8654, 'P6 '),
            ('18/5/2022 ', 3537, 'P2 '),
            ('20/5/2022 ', 2301, 'P3 '),
            ('20/5/2022 ', 2301, 'P5 ')],
           )

print (df.columns)
MultiIndex([(        'New Jersey ',        'Willowbrook ',         'Qty '),
            (        'New Jersey ',        'Willowbrook ',      ' Value '),
            (        'California ',         'Great Mall ',         'Qty '),
            (        'California ',         'Great Mall ',      ' Value '),
            (        'California ',   'Westminster Mall ',         'Qty '),
            (        'California ',   'Westminster Mall ',      ' Value '),
            (          'New York ',  'The Fulton Center ',         'Qty '),
            (          'New York ',  'The Fulton Center ',      ' Value '),
            ('Unnamed: 11_level_0', 'Unnamed: 11_level_1',   'Total Qty '),
            ('Unnamed: 12_level_0', 'Unnamed: 12_level_1', ' Total Value')],
           names=[' ', None, 'Transaction Date '])

df1 = (df.iloc[:, :-2]
         .rename_axis(columns=['State','Location', None],
                      index=['Transaction Date','Transaction Document','Partno'])
         .stack(level=[0,1])
         .reset_index())
print (df1)
  Transaction Date  Transaction Document Partno        State  
0       17/5/2022                   8534    P1   California    
1       17/5/2022                   8654    P6   California    
2       18/5/2022                   3537    P2   New Jersey    
3       20/5/2022                   2301    P3     New York    
4       20/5/2022                   2301    P5     New York    

             Location   Value   Qty   
0         Great Mall     150.0   1.0  
1   Westminster Mall     400.0   2.0  
2        Willowbrook     175.0   1.0  
3  The Fulton Center     150.0   2.0  
4  The Fulton Center      50.0   1.0

Answered By: jezrael

Pandas excel 3 level header transformation

Question:

Answers: