Pandas excel 3 level header transformation
Question:
Having an excel sheet for sales by product by state by location as below
New Jersey
New Jersey
California
California
California
California
New York
New York
Willowbrook
Willowbrook
Great Mall
Great Mall
Westminster Mall
Westminster Mall
The Fulton Center
The Fulton Center
Transaction Date
Transaction Document
Partno
Qty
Value
Qty
Value
Qty
Value
Qty
Value
Total Qty
Total Value
17/5/2022
8534
P1
1
150.00
1
150.00
17/5/2022
8654
P6
2
400.00
2
400.00
18/5/2022
3537
P2
1
175.00
1
175.00
20/5/2022
2301
P3
2
150.00
2
150.00
20/5/2022
2301
P5
1
50.00
1
50.00
Need to change it to become as below
Transaction Date
Transaction Document
Partno
Qty
Value
State
Location
17/5/2022
8534
P1
1
150.00
California
Great Mall
17/5/2022
8654
P6
2
400.00
California
Westminster Mall
18/5/2022
3537
P2
1
175.00
New Jersey
Willowbrook
20/5/2022
2301
P3
2
150.00
New York
The Fulton Center
20/5/2022
2301
P5
1
50.00
New York
The Fulton Center
There are monthly reports like this so I was looking for a python function that uses pandas or any other library to be able to pass the files and transform it as above
Answers:
You can remove last 2 columns by DataFrame.iloc
, create columns names by DataFrame.set_axis
, reshape by DataFrame.stack
:
df = pd.read_excel('data.xlsx', header=[0, 1, 2], index_col=[0, 1, 2])
print (df.index)
MultiIndex([('17/5/2022 ', 8534, 'P1 '),
('17/5/2022 ', 8654, 'P6 '),
('18/5/2022 ', 3537, 'P2 '),
('20/5/2022 ', 2301, 'P3 '),
('20/5/2022 ', 2301, 'P5 ')],
)
print (df.columns)
MultiIndex([( 'New Jersey ', 'Willowbrook ', 'Qty '),
( 'New Jersey ', 'Willowbrook ', ' Value '),
( 'California ', 'Great Mall ', 'Qty '),
( 'California ', 'Great Mall ', ' Value '),
( 'California ', 'Westminster Mall ', 'Qty '),
( 'California ', 'Westminster Mall ', ' Value '),
( 'New York ', 'The Fulton Center ', 'Qty '),
( 'New York ', 'The Fulton Center ', ' Value '),
('Unnamed: 11_level_0', 'Unnamed: 11_level_1', 'Total Qty '),
('Unnamed: 12_level_0', 'Unnamed: 12_level_1', ' Total Value')],
names=[' ', None, 'Transaction Date '])
df1 = (df.iloc[:, :-2]
.rename_axis(columns=['State','Location', None],
index=['Transaction Date','Transaction Document','Partno'])
.stack(level=[0,1])
.reset_index())
print (df1)
Transaction Date Transaction Document Partno State
0 17/5/2022 8534 P1 California
1 17/5/2022 8654 P6 California
2 18/5/2022 3537 P2 New Jersey
3 20/5/2022 2301 P3 New York
4 20/5/2022 2301 P5 New York
Location Value Qty
0 Great Mall 150.0 1.0
1 Westminster Mall 400.0 2.0
2 Willowbrook 175.0 1.0
3 The Fulton Center 150.0 2.0
4 The Fulton Center 50.0 1.0
Having an excel sheet for sales by product by state by location as below
New Jersey | New Jersey | California | California | California | California | New York | New York | |||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Willowbrook | Willowbrook | Great Mall | Great Mall | Westminster Mall | Westminster Mall | The Fulton Center | The Fulton Center | |||||
Transaction Date | Transaction Document | Partno | Qty | Value | Qty | Value | Qty | Value | Qty | Value | Total Qty | Total Value |
17/5/2022 | 8534 | P1 | 1 | 150.00 | 1 | 150.00 | ||||||
17/5/2022 | 8654 | P6 | 2 | 400.00 | 2 | 400.00 | ||||||
18/5/2022 | 3537 | P2 | 1 | 175.00 | 1 | 175.00 | ||||||
20/5/2022 | 2301 | P3 | 2 | 150.00 | 2 | 150.00 | ||||||
20/5/2022 | 2301 | P5 | 1 | 50.00 | 1 | 50.00 |
Need to change it to become as below
Transaction Date | Transaction Document | Partno | Qty | Value | State | Location |
---|---|---|---|---|---|---|
17/5/2022 | 8534 | P1 | 1 | 150.00 | California | Great Mall |
17/5/2022 | 8654 | P6 | 2 | 400.00 | California | Westminster Mall |
18/5/2022 | 3537 | P2 | 1 | 175.00 | New Jersey | Willowbrook |
20/5/2022 | 2301 | P3 | 2 | 150.00 | New York | The Fulton Center |
20/5/2022 | 2301 | P5 | 1 | 50.00 | New York | The Fulton Center |
There are monthly reports like this so I was looking for a python function that uses pandas or any other library to be able to pass the files and transform it as above
You can remove last 2 columns by DataFrame.iloc
, create columns names by DataFrame.set_axis
, reshape by DataFrame.stack
:
df = pd.read_excel('data.xlsx', header=[0, 1, 2], index_col=[0, 1, 2])
print (df.index)
MultiIndex([('17/5/2022 ', 8534, 'P1 '),
('17/5/2022 ', 8654, 'P6 '),
('18/5/2022 ', 3537, 'P2 '),
('20/5/2022 ', 2301, 'P3 '),
('20/5/2022 ', 2301, 'P5 ')],
)
print (df.columns)
MultiIndex([( 'New Jersey ', 'Willowbrook ', 'Qty '),
( 'New Jersey ', 'Willowbrook ', ' Value '),
( 'California ', 'Great Mall ', 'Qty '),
( 'California ', 'Great Mall ', ' Value '),
( 'California ', 'Westminster Mall ', 'Qty '),
( 'California ', 'Westminster Mall ', ' Value '),
( 'New York ', 'The Fulton Center ', 'Qty '),
( 'New York ', 'The Fulton Center ', ' Value '),
('Unnamed: 11_level_0', 'Unnamed: 11_level_1', 'Total Qty '),
('Unnamed: 12_level_0', 'Unnamed: 12_level_1', ' Total Value')],
names=[' ', None, 'Transaction Date '])
df1 = (df.iloc[:, :-2]
.rename_axis(columns=['State','Location', None],
index=['Transaction Date','Transaction Document','Partno'])
.stack(level=[0,1])
.reset_index())
print (df1)
Transaction Date Transaction Document Partno State
0 17/5/2022 8534 P1 California
1 17/5/2022 8654 P6 California
2 18/5/2022 3537 P2 New Jersey
3 20/5/2022 2301 P3 New York
4 20/5/2022 2301 P5 New York
Location Value Qty
0 Great Mall 150.0 1.0
1 Westminster Mall 400.0 2.0
2 Willowbrook 175.0 1.0
3 The Fulton Center 150.0 2.0
4 The Fulton Center 50.0 1.0