Difficult field name conversion to specific values while performing row by row de-aggregation (Pandas) updated

Question

I have a dataset where I would like to convert specific field names into values while de-aggregating those values into their own rows, that is, perform a wide-to-long pivot.

Data

import pandas as pd

# create DataFrame
data = {
    "Start": ['8/1/2013', '8/1/2013'],
    "Date": ['9/1/2013', '9/1/2013'],
    "End": ['10/1/2013', '10/1/2013'],
    "Area": ['NY', 'CA'],
    "Final": ['3/1/2023', '3/1/2023'],
    "Type": ['CC', 'AA'],
    "Middle Stat": [226, 130],
    "Low Stat": [20, 50],
    "High Stat": [10, 0],
    "Middle Stat1": [0, 0],
    "Low Stat1": [0, 0],
    "High Stat1": [0, 0],
    "Re": [0, 0],
    "Set": [0, 0],
    "Set2": [0, 0],
    "Set3": [0, 0],
}
df = pd.DataFrame(data)

Desired

data = {'Start': ['8/1/2013', '8/1/2013', '8/1/2013', '8/1/2013', '8/1/2013', '8/1/2013'],
        'Date': ['9/1/2013', '9/1/2013', '9/1/2013', '9/1/2013', '9/1/2013', '9/1/2013'],
        'End': ['10/1/2013', '10/1/2013', '10/1/2013', '10/1/2013', '10/1/2013', '10/1/2013'],
        'Area': ['NY', 'CA', 'NY', 'CA', 'NY', 'CA'],
        'Final': ['3/1/2023', '3/1/2023', '3/1/2023', '3/1/2023', '3/1/2023', '3/1/2023'],
        'Type': ['CC', 'AA', 'CC', 'AA', 'CC', 'AA'],
        'Stat': [20, 50, 226, 130, 10, 0],
        'Range': ['Low', 'Low', 'Middle', 'Middle', 'High', 'High'],
        'Stat1': [0, 0, 0, 0, 0, 0],
        'Re': [0, 0, 0, 0, 0, 0],
        'Set': [0, 0, 0, 0, 0, 0],
        'Set2': [0, 0, 0, 0, 0, 0],
        'Set3': [0, 0, 0, 0, 0, 0]
        }

Doing

I am using this great script provided by an SO member, but I am having trouble adjusting it to produce the desired output. I need to include all of the columns shown in the desired output.

import janitor

(df
.pivot_longer(
    index = slice('Start', 'Type'), 
    names_to = ("Range", ".value"), 
    names_sep = " ")
)

Any suggestion is appreciated.

Asked By: Lynn

Answer 1

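One option is pivot_longer from pyjanitor. Here the special placeholder .value marks the part of each column name that should remain a column header (Stat, Stat1), while the other part (Middle, Low, High) is collected into a new column, Range. The columns Re through Set3 are added to index so that they are carried along unchanged: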

# pip install pyjanitor
import pandas as pd
import janitor

(df
.pivot_longer(
    index = [slice('Start', 'Type'), slice('Re', 'Set3')], 
    names_to = ("Range", ".value"), 
    names_sep = " ")
)

      Start      Date        End Area     Final Type  Re  Set  Set2  Set3   Range  Stat  Stat1
0  8/1/2013  9/1/2013  10/1/2013   NY  3/1/2023   CC   0    0     0     0  Middle   226      0
1  8/1/2013  9/1/2013  10/1/2013   CA  3/1/2023   AA   0    0     0     0  Middle   130      0
2  8/1/2013  9/1/2013  10/1/2013   NY  3/1/2023   CC   0    0     0     0     Low    20      0
3  8/1/2013  9/1/2013  10/1/2013   CA  3/1/2023   AA   0    0     0     0     Low    50      0
4  8/1/2013  9/1/2013  10/1/2013   NY  3/1/2023   CC   0    0     0     0    High    10      0
5  8/1/2013  9/1/2013  10/1/2013   CA  3/1/2023   AA   0    0     0     0    High     0      0
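
If installing pyjanitor is not an option, roughly the same reshape can be sketched with plain pandas using melt, a string split, and pivot. This is only a sketch, not the method used above: the names id_cols, "name", "field", and "value" are illustrative, and it assumes every combination of the id columns and Range is unique, since pivot raises an error on duplicates. The rows come back sorted by the id columns and Range rather than in the order shown above.

```python
import pandas as pd

data = {
    "Start": ['8/1/2013', '8/1/2013'],
    "Date": ['9/1/2013', '9/1/2013'],
    "End": ['10/1/2013', '10/1/2013'],
    "Area": ['NY', 'CA'],
    "Final": ['3/1/2023', '3/1/2023'],
    "Type": ['CC', 'AA'],
    "Middle Stat": [226, 130],
    "Low Stat": [20, 50],
    "High Stat": [10, 0],
    "Middle Stat1": [0, 0],
    "Low Stat1": [0, 0],
    "High Stat1": [0, 0],
    "Re": [0, 0],
    "Set": [0, 0],
    "Set2": [0, 0],
    "Set3": [0, 0],
}
df = pd.DataFrame(data)

# Columns that identify a row and should be carried along unchanged
# (illustrative name, not part of the original answer).
id_cols = ["Start", "Date", "End", "Area", "Final", "Type",
           "Re", "Set", "Set2", "Set3"]

# 1. Melt every "<Range> <field>" column into (name, value) pairs.
long = df.melt(id_vars=id_cols, var_name="name", value_name="value")

# 2. Split "Middle Stat" into Range="Middle" and field="Stat".
long[["Range", "field"]] = long["name"].str.split(" ", expand=True)

# 3. Spread the field part back out so Stat and Stat1 become columns.
out = (
    long.pivot(index=id_cols + ["Range"], columns="field", values="value")
        .reset_index()
)
out.columns.name = None  # drop the leftover "field" axis label

print(out)
```

To get the column order of the desired output, reindex at the end, for example out[['Start', 'Date', 'End', 'Area', 'Final', 'Type', 'Stat', 'Range', 'Stat1', 'Re', 'Set', 'Set2', 'Set3']].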

Answered By: sammywemmy
