Transforming unstructured data from excel into structured in python
Question:
This is basically a "current output vs desired output" question. I’m sure this type of question has been asked before but I did not know what keywords to look up.
Currently, I have data like:
| CostCenter |Monday|Tuesday|Wednesday|
|:-----------|:-----------|:-----------|:-----------|
|A21|$1000|$2000|$3000|
|B21|$1000|$2000|$3000|
|C21|$1000|$2000|$3000|
Desired output is:
| CostCenter |DAY|VALUE|
|:-----------|:-----------|:-----|
|A21|Monday|$1000|
|A21|Tuesday|$2000|
|A21|Wednesday|$3000|
|B21|Monday|$1000|
|B21|Tuesday|$2000|
|B21|Wednesday|$3000|
etc…
Any ideas on how to achieve this using Python Pandas would be appreciated! Thank you!
I was thinking of using crossjoin to densify the data and transposing the columns with days but haven’t succeed in coding it.
Answers:
stack and reset the index after setting the index for CostCenter. then reset_index and rename columns
(df.set_index('CostCenter')
.stack()
.reset_index()
.rename(columns={'level_1': 'Day', 0:'Value'})
)
CostCenter Day Value
0 A21 Monday $1000
1 A21 Tuesday $2000
2 A21 Wednesday $3000
3 B21 Monday $1000
4 B21 Tuesday $2000
5 B21 Wednesday $3000
6 C21 Monday $1000
7 C21 Tuesday $2000
8 C21 Wednesday $3000
This is basically a "current output vs desired output" question. I’m sure this type of question has been asked before but I did not know what keywords to look up.
Currently, I have data like:
| CostCenter |Monday|Tuesday|Wednesday|
|:-----------|:-----------|:-----------|:-----------|
|A21|$1000|$2000|$3000|
|B21|$1000|$2000|$3000|
|C21|$1000|$2000|$3000|
Desired output is:
| CostCenter |DAY|VALUE|
|:-----------|:-----------|:-----|
|A21|Monday|$1000|
|A21|Tuesday|$2000|
|A21|Wednesday|$3000|
|B21|Monday|$1000|
|B21|Tuesday|$2000|
|B21|Wednesday|$3000|
etc…
Any ideas on how to achieve this using Python Pandas would be appreciated! Thank you!
I was thinking of using crossjoin to densify the data and transposing the columns with days but haven’t succeed in coding it.
stack and reset the index after setting the index for CostCenter. then reset_index and rename columns
(df.set_index('CostCenter')
.stack()
.reset_index()
.rename(columns={'level_1': 'Day', 0:'Value'})
)
CostCenter Day Value
0 A21 Monday $1000
1 A21 Tuesday $2000
2 A21 Wednesday $3000
3 B21 Monday $1000
4 B21 Tuesday $2000
5 B21 Wednesday $3000
6 C21 Monday $1000
7 C21 Tuesday $2000
8 C21 Wednesday $3000