How to pivot multiple columns

Question:

Input Table

index income Education age1to_20 pcd
1 income_1 Education_0 1 A5009
2 income_2 Education_2 1 A3450
3 income_1 Education_0 1 A5009
4 income_3 Education_1 0 A3450

How do I convert this table into

index income_1 income_2 INCOME_3 Education_0 Education_1 Education_2 age1to_20
1 A5009 0 0 A5009 0 0 A5009
2 0 A3450 0 0 0 A3450 A3450
3 A5009 0 0 A5009 0 0 A5009
4 0 0 A3450 0 A3450 0 0

UPDATED THE OUTPUT TABLE

Asked By: nuke

||

Answers:

you need to melt and pivot:

cols_to_pivot = ['income', 'Education']

cols = df.columns.difference(cols_to_pivot)
(df.melt(cols).drop(columns='variable')
   .pivot(index=['index', 'age1to_20'], columns='value', values='pcd')
   .fillna(0).reset_index()
)

output:

value  index  age1to_20 Education_0 Education_1 Education_2 income_1 income_2 income_3
0          1          1       A5009           0           0    A5009        0        0
1          2          1           0           0       A3450        0    A3450        0
2          3          1       A5009           0           0    A5009        0        0
3          4          0           0       A3450           0        0        0    A3450
reworking column age1to_20
cols_to_pivot = ['income', 'Education']

cols = df.columns.difference(cols_to_pivot)
(df.assign(age1to_20=df['pcd'].where(df['age1to_20'].eq(1), 0))
   .melt(cols).drop(columns='variable')
   .pivot(index=['index', 'age1to_20'], columns='value', values='pcd')
   .fillna(0).reset_index()
)

output:

value  index age1to_20 Education_0 Education_1 Education_2 income_1 income_2 income_3
0          1     A5009       A5009           0           0    A5009        0        0
1          2     A3450           0           0       A3450        0    A3450        0
2          3     A5009       A5009           0           0    A5009        0        0
3          4         0           0       A3450           0        0        0    A3450
Answered By: mozway

Another possible solution:

(pd.concat([
    df.pivot(index=['index', 'age1to_20'], columns=['income'], values='pcd'),
    df.pivot(index=['index', 'age1to_20'], columns=['Education'], values='pcd')], axis=1)
 .fillna(0).reset_index())

Output:

   index  age1to_20 income_1 income_2 income_3 Education_0 Education_1 Education_2
0      1          1    A5009        0        0       A5009           0           0
1      2          1        0    A3450        0           0           0       A3450
2      3          1    A5009        0        0       A5009           0           0
3      4          0        0        0    A3450           0       A3450           0

EDIT

In case there are a lot of columns to pivot, the following code does that, by iterating over the list of columns to pivot with map:

cols = ['income', 'Education']

(pd.concat(
    map(lambda x: df.pivot(
        index=['index', 'age1to_20'], columns=x, values='pcd'), cols), axis=1)
 .fillna(0).reset_index())
Answered By: PaulS

One option is to pivot once, run a for loop across the dataframe to get the individual dataframes, and concatenate across the columns, to get the final output:

out = df.pivot(index=['index', 'age1to_20'], 
               columns=['income', 'Education'], 
               values = 'pcd')

out = [out.droplevel(axis=1,level=n) 
       for n in range(out.columns.nlevels)]

pd.concat(out, axis = 1).fillna(0, downcast='infer').reset_index()

   index  age1to_20 Education_0 Education_2 Education_1 income_1 income_2 income_3
0      1          1       A5009           0           0    A5009        0        0
1      2          1           0       A3450           0        0    A3450        0
2      3          1       A5009           0           0    A5009        0        0
3      4          0           0           0       A3450        0        0    A3450
Answered By: sammywemmy

pandas.get_dummies allows us to get almost the desired output, but instead of getting the values as the pcd column, one has 0 or 1 for a given pcd

import pandas as pd

df2 = pd.get_dummies(df, columns=['income', 'Education'])

[Out]:

   index  age1to_20  ... Education_Education_1  Education_Education_2
0      1          1  ...                     0                      0
1      2          1  ...                     0                      1
2      3          1  ...                     0                      0
3      4          0  ...                     1                      0

enter image description here

Then, in order to change the 1s with the respective pcd value (and drop the column pcd at the end), one can use pandas.Series.mask and pandas.DataFrame.drop as follows

cols = df2.columns.drop(['index', 'pcd']) # To select the columns one wants to apply the change
df2[cols] = df2[cols].mask(df2[cols] == 1, df2['pcd'], axis=0)
df2 = df2.drop(['pcd'], axis=1) 

[Out]:

   index age1to_20  ... Education_Education_1 Education_Education_2
0      1     A5009  ...                     0                     0
1      2     A3450  ...                     0                 A3450
2      3     A5009  ...                     0                     0
3      4         0  ...                 A3450                     0

enter image description here


Notes:

  • An alternative to change the 1s to the respective pcd and drop the column pcd would be to use pandas.DataFrame.assign, numpy.where and pandas.DataFrame.drop as follows

    import numpy as np
    
    df2 = df2.assign(income_1 = np.where(df2['income_income_1'] == 1, df2['pcd'], 0),
                  income_2 = np.where(df2['income_income_2'] == 1, df2['pcd'], 0),
                  income_3 = np.where(df2['income_income_3'] == 1, df2['pcd'], 0),
                  Education_0 = np.where(df2['Education_Education_0'] == 1, df2['pcd'], 0),
                  Education_1 = np.where(df2['Education_Education_1'] == 1, df2['pcd'], 0),
                  Education_2 = np.where(df2['Education_Education_2'] == 1, df2['pcd'], 0),
                  Age1to20 = np.where(df2['age1to_20'] == 1, df2['pcd'], 0)).drop(['age1to_20', 'income_income_1', 'income_income_2', 'income_income_3', 'Education_Education_0', 'Education_Education_1', 'Education_Education_2', 'pcd'], axis=1)
    
    [Out]:
    
       index income_1 income_2  ... Education_1 Education_2 Age1to20
    0      1    A5009        0  ...           0           0    A5009
    1      2        0    A3450  ...           0       A3450    A3450
    2      3    A5009        0  ...           0           0    A5009
    3      4        0        0  ...       A3450           0        0
    

    enter image description here

  • There are additional ways to select the columns one wants to consider in a given dataframe. For that this might be useful: Selecting multiple columns in a Pandas dataframe

Answered By: Gonçalo Peres