Difficult field name conversion to specific values while performing row-by-row de-aggregation (Pandas)
Question:
I have a dataset where I would like to convert specific field names into values, de-aggregating those values into their own rows (a wide-to-long pivot).
Data
import pandas as pd

# create DataFrame
data = {
    "Start": ['8/1/2013', '8/1/2013'],
    "Date": ['9/1/2013', '9/1/2013'],
    "End": ['10/1/2013', '10/1/2013'],
    "Area": ['NY', 'CA'],
    "Final": ['3/1/2023', '3/1/2023'],
    "Type": ['CC', 'AA'],
    "Middle Stat": [226, 130],
    "Low Stat": [20, 50],
    "High Stat": [10, 0],
    "Middle Stat1": [0, 0],
    "Low Stat1": [0, 0],
    "High Stat1": [0, 0],
    "Re": [0, 0],
    "Set": [0, 0],
    "Set2": [0, 0],
    "Set3": [0, 0],
}
df = pd.DataFrame(data)
Desired
data = {'Start': ['8/1/2013', '8/1/2013', '8/1/2013', '8/1/2013', '8/1/2013', '8/1/2013'],
'Date': ['9/1/2013', '9/1/2013', '9/1/2013', '9/1/2013', '9/1/2013', '9/1/2013'],
'End': ['10/1/2013', '10/1/2013', '10/1/2013', '10/1/2013', '10/1/2013', '10/1/2013'],
'Area': ['NY', 'CA', 'NY', 'CA', 'NY', 'CA'],
'Final': ['3/1/2023', '3/1/2023', '3/1/2023', '3/1/2023', '3/1/2023', '3/1/2023'],
'Type': ['CC', 'AA', 'CC', 'AA', 'CC', 'AA'],
'Stat': [20, 50, 226, 130, 10, 0],
'Range': ['Low', 'Low', 'Middle', 'Middle', 'High', 'High'],
'Stat1': [0, 0, 0, 0, 0, 0],
'Re': [0, 0, 0, 0, 0, 0],
'Set': [0, 0, 0, 0, 0, 0],
'Set2': [0, 0, 0, 0, 0, 0],
'Set3': [0, 0, 0, 0, 0, 0]
}
Doing
I am using this great script provided by an SO member, but I am having trouble adjusting it to produce the desired output. I need to include all columns, as shown in the desired output.
import janitor
(df
.pivot_longer(
index = slice('Start', 'Type'),
names_to = ("Range", ".value"),
names_sep = " ")
)
Any suggestions are appreciated.
Answers:
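One option is pivot_longer from pyjanitor. Here the special placeholder .value marks the part of each column name that should remain as a column header (Stat, Stat1), while the other part (Low, Middle, High) is collected into a new Range column. The columns Re through Set3 are added to index so that they are carried along unchanged instead of being pivoted: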
# pip install pyjanitor
import pandas as pd
import janitor

(df
 .pivot_longer(
     index = [slice('Start', 'Type'), slice('Re', 'Set3')],
     names_to = ("Range", ".value"),
     names_sep = " ")
)
      Start      Date        End  Area     Final  Type  Re  Set  Set2  Set3   Range  Stat  Stat1
0  8/1/2013  9/1/2013  10/1/2013    NY  3/1/2023    CC   0    0     0     0  Middle   226      0
1  8/1/2013  9/1/2013  10/1/2013    CA  3/1/2023    AA   0    0     0     0  Middle   130      0
2  8/1/2013  9/1/2013  10/1/2013    NY  3/1/2023    CC   0    0     0     0     Low    20      0
3  8/1/2013  9/1/2013  10/1/2013    CA  3/1/2023    AA   0    0     0     0     Low    50      0
4  8/1/2013  9/1/2013  10/1/2013    NY  3/1/2023    CC   0    0     0     0    High    10      0
5  8/1/2013  9/1/2013  10/1/2013    CA  3/1/2023    AA   0    0     0     0    High     0      0
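If installing pyjanitor is not an option, the same reshape can be sketched in plain pandas. This is not the method from the answer above, only an illustrative alternative: it builds one block of rows per range and stacks the blocks with pd.concat. The names id_cols, ranges, measures and result are introduced here for illustration, and the sketch assumes every column to be pivoted is named "<Range> <measure>" with a single space, while every column to keep has no space in its name.

```python
import pandas as pd

df = pd.DataFrame({
    "Start": ['8/1/2013', '8/1/2013'],
    "Date": ['9/1/2013', '9/1/2013'],
    "End": ['10/1/2013', '10/1/2013'],
    "Area": ['NY', 'CA'],
    "Final": ['3/1/2023', '3/1/2023'],
    "Type": ['CC', 'AA'],
    "Middle Stat": [226, 130],
    "Low Stat": [20, 50],
    "High Stat": [10, 0],
    "Middle Stat1": [0, 0],
    "Low Stat1": [0, 0],
    "High Stat1": [0, 0],
    "Re": [0, 0],
    "Set": [0, 0],
    "Set2": [0, 0],
    "Set3": [0, 0],
})

# Assumption: columns without a space are identifiers to carry along unchanged.
id_cols = [c for c in df.columns if " " not in c]
# Assumption: the ranges and measures are known up front; the list order
# of `ranges` decides the row order of the result.
ranges = ["Low", "Middle", "High"]
measures = ["Stat", "Stat1"]

pieces = []
for rng in ranges:
    part = df[id_cols].copy()           # identifier columns for this block
    part["Range"] = rng                 # the field-name prefix becomes a value
    for m in measures:
        part[m] = df[f"{rng} {m}"]      # e.g. "Low Stat" -> "Stat"
    pieces.append(part)

# Stack the per-range blocks into one long frame.
result = pd.concat(pieces, ignore_index=True)
print(result)
```

Because ranges is listed as Low, Middle, High, the rows come out in the order shown in the desired output, whereas pivot_longer follows the order of the columns in the original frame (Middle, Low, High).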