How to transpose certain columns in a pandas dataframe

Question:

Currently have a dataframe like this:

RepName Hours Reason
John 4 Research
John 15 Training
Matt 6 Project Labor
Matt 10 Training

I want to transpose each reason as a column with the Hours values as values like so:

RepName Research Training Project Labor
John 4 15 0
Matt 0 10 6

I tried to transpose and melting the dataframe and couldn’t figure it out. Still sort of new to this, any help would be appreciated.

Asked By: jmike877

||

Answers:

You can do this with pd.pivot_table like so:

>>> df.pivot_table(index="RepName", columns="Reason", values="Hours", fill_value=0)
Reason   Project Labor  Research  Training
RepName
John                 0         4        15
Matt                 6         0        10

Full example:

import pandas as pd

input_data = pd.DataFrame(
    {
        "RepName": ["John", "John", "Matt", "Matt"],
        "Hours": [4, 15, 6, 10],
        "Reason": ["Research", "Training", "Project Labor", "Training"],
    }
)

output_data = pd.DataFrame(
    {
        "Project Labor": [0, 6],
        "Research": [4, 0],
        "Training": [15, 10],
    },
    index=["John", "Matt"],
)

pivoted = input_data.pivot_table(
    index="RepName",
    columns="Reason",
    values="Hours",
    fill_value=0,
)

assert output_data.equals(pivoted)
Answered By: Sage Betko

This is a .pivot operation. The point of a pivot is to turn some subset of rows into their own columns.

out = df.pivot(index='RepName', columns='Reason', values='Hours').fillna(0)
Reason   Project Labor  Research  Training
RepName                                   
John               0.0       4.0      15.0
Matt               6.0       0.0      10.0

print(out)
Reason   Project Labor  Research  Training
RepName                                   
John               0.0       4.0      15.0
Matt               6.0       0.0      10.0

If you’re unfamiliar with pandas.Index that has names, you can get a more familiar result by removing the name from the column index and resetting the row index:

out = (
    df.pivot(index='RepName', columns='Reason', values='Hours')
    .rename_axis(columns=None)
    .reset_index()
    .fillna(0)
)

print(out)
  RepName  Project Labor  Research  Training
0    John            0.0       4.0      15.0
1    Matt            6.0       0.0      10.0
Answered By: Cameron Riddell
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.