Pandas: transforming an Excel sheet with a 3-level header

Question:

I have an Excel sheet of sales by product, by state, and by location, as shown below:

                                                New Jersey         California         California         New York
                                                Willowbrook        Great Mall         Westminster Mall   The Fulton Center
Transaction Date  Transaction Document  Partno  Qty  Value         Qty  Value         Qty  Value         Qty  Value         Total Qty  Total Value
17/5/2022         8534                  P1                         1    150.00                                              1          150.00
17/5/2022         8654                  P6                                            2    400.00                           2          400.00
18/5/2022         3537                  P2      1    175.00                                                                 1          175.00
20/5/2022         2301                  P3                                                               2    150.00        2          150.00
20/5/2022         2301                  P5                                                               1     50.00        1           50.00

I need to transform it into the following:

Transaction Date  Transaction Document  Partno  Qty  Value   State       Location
17/5/2022         8534                  P1      1    150.00  California  Great Mall
17/5/2022         8654                  P6      2    400.00  California  Westminster Mall
18/5/2022         3537                  P2      1    175.00  New Jersey  Willowbrook
20/5/2022         2301                  P3      2    150.00  New York    The Fulton Center
20/5/2022         2301                  P5      1     50.00  New York    The Fulton Center

There are monthly reports like this, so I am looking for a Python function, using pandas or any other library, that I can pass the files to and that transforms them as shown above.

Asked By: Gojoe


Answers:

You can drop the last two (totals) columns with DataFrame.iloc, name the column and index levels with DataFrame.rename_axis, and reshape with DataFrame.stack:

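import pandas as pd

# Read the three header rows as column levels and the first three columns as the index
df = pd.read_excel('data.xlsx', header=[0, 1, 2], index_col=[0, 1, 2])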

print (df.index)
MultiIndex([('17/5/2022 ', 8534, 'P1 '),
            ('17/5/2022 ', 8654, 'P6 '),
            ('18/5/2022 ', 3537, 'P2 '),
            ('20/5/2022 ', 2301, 'P3 '),
            ('20/5/2022 ', 2301, 'P5 ')],
           )

print (df.columns)
MultiIndex([(        'New Jersey ',        'Willowbrook ',         'Qty '),
            (        'New Jersey ',        'Willowbrook ',      ' Value '),
            (        'California ',         'Great Mall ',         'Qty '),
            (        'California ',         'Great Mall ',      ' Value '),
            (        'California ',   'Westminster Mall ',         'Qty '),
            (        'California ',   'Westminster Mall ',      ' Value '),
            (          'New York ',  'The Fulton Center ',         'Qty '),
            (          'New York ',  'The Fulton Center ',      ' Value '),
            ('Unnamed: 11_level_0', 'Unnamed: 11_level_1',   'Total Qty '),
            ('Unnamed: 12_level_0', 'Unnamed: 12_level_1', ' Total Value')],
           names=[' ', None, 'Transaction Date '])

# Drop the two totals columns, name the levels, then move State and Location into rows
df1 = (df.iloc[:, :-2]
         .rename_axis(columns=['State','Location', None],
                      index=['Transaction Date','Transaction Document','Partno'])
         .stack(level=[0,1])
         .reset_index())
print (df1)
  Transaction Date  Transaction Document Partno        State           Location   Value   Qty
0       17/5/2022                   8534    P1   California         Great Mall    150.0   1.0
1       17/5/2022                   8654    P6   California   Westminster Mall    400.0   2.0
2       18/5/2022                   3537    P2   New Jersey        Willowbrook    175.0   1.0
3       20/5/2022                   2301    P3     New York  The Fulton Center    150.0   2.0
4       20/5/2022                   2301    P5     New York  The Fulton Center     50.0   1.0
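
Since the question asks for a function that can be applied to each monthly file, the same steps could be wrapped up roughly as follows. This is only a sketch under a few assumptions: the names unpivot_sales and read_sales_report are hypothetical, every report is assumed to have the same layout (three header rows, three leading key columns, two trailing totals columns), and the header labels are assumed to carry stray whitespace that is safe to strip. The extra dropna(how="all") is there because newer pandas versions may keep the all-NaN rows that older versions of stack dropped by default.

```python
import numpy as np
import pandas as pd

INDEX_NAMES = ["Transaction Date", "Transaction Document", "Partno"]


def unpivot_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Turn the (State, Location, measure) column levels into rows.

    Assumes `df` was read with header=[0, 1, 2] and index_col=[0, 1, 2],
    and that its last two columns are the 'Total Qty' / 'Total Value' totals.
    """
    out = df.iloc[:, :-2].copy()  # drop the two totals columns
    # Strip stray whitespace from the header labels ('Qty ', ' Value ', ...).
    out.columns = pd.MultiIndex.from_tuples(
        [tuple(str(label).strip() for label in col) for col in out.columns],
        names=["State", "Location", None],
    )
    out.index = out.index.set_names(INDEX_NAMES)
    long = (
        out.stack(level=[0, 1])   # move State and Location into the index
           .dropna(how="all")     # discard State/Location pairs with no sale
           .reset_index()
    )
    # Column order follows the layout requested in the question.
    return long[INDEX_NAMES + ["Qty", "Value", "State", "Location"]]


def read_sales_report(path) -> pd.DataFrame:
    """Read one workbook laid out like the sheet in the question (hypothetical helper)."""
    raw = pd.read_excel(path, header=[0, 1, 2], index_col=[0, 1, 2])
    return unpivot_sales(raw)


# Tiny in-memory stand-in for one report, using two rows from the question.
demo = pd.DataFrame(
    [[np.nan, np.nan, 1, 150.0, 1, 150.0],
     [1, 175.0, np.nan, np.nan, 1, 175.0]],
    index=pd.MultiIndex.from_tuples(
        [("17/5/2022", 8534, "P1"), ("18/5/2022", 3537, "P2")]),
    columns=pd.MultiIndex.from_tuples([
        ("New Jersey ", "Willowbrook ", "Qty "),
        ("New Jersey ", "Willowbrook ", " Value "),
        ("California ", "Great Mall ", "Qty "),
        ("California ", "Great Mall ", " Value "),
        ("Unnamed: 7_level_0", "Unnamed: 7_level_1", "Total Qty "),
        ("Unnamed: 8_level_0", "Unnamed: 8_level_1", " Total Value"),
    ]),
)
result = unpivot_sales(demo)
print(result)
```

Several months could then be combined with something like pd.concat([read_sales_report(p) for p in paths], ignore_index=True), where paths is whatever list of report files you have.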
Answered By: jezrael