How to turn rows in each group in dataframe into columns?

Question:

I have a dataframe:

col1  col2  col3   val  col4
a1    b1     c1    10    dd
a1    b1     c1    15    kk
a2    b2     c2    20    ff
a2    b2     c2    35    mm
a3    b3     c3    9     sd

I want to spread the rows of each group (defined by col1, col2, col3) into columns, so that every "val" and "col4" value in a group gets its own numbered column. The desired result is:

col1  col2  col3   val_1  col4_1  val_2  col4_2
a1    b1     c1     10     dd      15      kk
a2    b2     c2     20     ff      35      mm
a3    b3     c3     9      sd      NA      NA

How can I do that? Is there a function that turns rows into columns within each group?

Asked By: gh1222

||

Answers:

If each group contains at most two rows (see comments), you can use the first and last functions in combination with a groupby statement. You just need to define your own last function that returns the last element of a group (in your example, the second value) only if the group has more than one row. If the group length equals 1, it returns NA instead, because the built-in last would simply repeat the first value.

Code:

import pandas as pd

df = pd.DataFrame(
    {"col1": ["a1", "a1", "a2", "a2", "a3"],
     "col2": ["b1", "b1", "b2", "b2", "b3"],
     "col3": ["c1", "c1", "c2", "c2", "c3"],
     "val": [10, 15, 20, 35, 9],
     "col4": ["dd", "kk", "ff", "mm", "sd"]}
)

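# Return the last element of a group, or NA if the group has only one row
last_check = lambda x: pd.NA if len(x) == 1 else x.iloc[-1]

# Parentheses are needed so the method chain can span several lines
(df.groupby(["col1", "col2", "col3"], as_index=False)
   .agg(val_1=("val", "first"),
        col4_1=("col4", "first"),
        val_2=("val", last_check),
        col4_2=("col4", last_check)))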

Output:

col1  col2  col3  val_1  col4_1  val_2  col4_2
a1    b1    c1    10     dd      15      kk
a2    b2    c2    20     ff      35      mm
a3    b3    c3    9      sd      <NA>   <NA>
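
To see why a custom last function is needed, here is a minimal sketch on a cut-down, made-up frame (only one grouping column, names chosen for illustration). It compares the built-in "last" aggregation with last_check for a group that has a single row:

```python
import pandas as pd

# Hypothetical, reduced data: group a1 has two rows, group a3 has one
df = pd.DataFrame({
    "col1": ["a1", "a1", "a3"],
    "val": [10, 15, 9],
})

# Built-in "last" returns the only row of a single-row group,
# so for a3 it would repeat the value already used as val_1
builtin = df.groupby("col1")["val"].agg("last")

# The custom function returns NA for single-row groups instead
last_check = lambda x: pd.NA if len(x) == 1 else x.iloc[-1]
custom = df.groupby("col1")["val"].agg(last_check)

print(builtin)
print(custom)
```

With the built-in aggregation, group a3 gets 9 as its "last" value, which would show up in both val_1 and val_2. With last_check the second slot stays missing.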
Answered By: ko3

Here is another approach:

cols = ['col1','col2','col3']

res = df.assign(num=df.groupby(cols).cumcount()+1).pivot(index=cols,columns='num')
res.columns = [f'{x}_{y}' for x,y in res.columns]

print(res)
'''
                val_1  val_2 col4_1 col4_2
col1 col2 col3                            
a1   b1   c1     10.0   15.0     dd     kk
a2   b2   c2     20.0   35.0     ff     mm
a3   b3   c3      9.0    NaN     sd    NaN
'''

As a bonus, it also works when a group has more than two rows, i.e. when the same combination of the first three columns appears more than twice.
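
As a rough sketch of that general case, the same cumcount and pivot idea can be run on a group with three rows. The sample data below is made up, and the column reordering and reset_index steps are additions that are not part of the answer above. They are only there to get the val_1, col4_1, val_2, col4_2, ... layout from the question. Passing a list as index to pivot assumes a reasonably recent pandas (1.1 or later).

```python
import pandas as pd

# Hypothetical data: group a1 has three rows, group a2 has one
df = pd.DataFrame({
    "col1": ["a1", "a1", "a1", "a2"],
    "col2": ["b1", "b1", "b1", "b2"],
    "col3": ["c1", "c1", "c1", "c2"],
    "val": [10, 15, 17, 20],
    "col4": ["dd", "kk", "zz", "ff"],
})

cols = ["col1", "col2", "col3"]
value_cols = ["val", "col4"]

# Number the rows inside each group: 1, 2, 3, ...
numbered = df.assign(num=df.groupby(cols).cumcount() + 1)

# One output column per (original column, row number) pair
wide = numbered.pivot(index=cols, columns="num")

# Reorder so that columns with the same row number sit next to each other
order = sorted(wide.columns, key=lambda c: (c[1], value_cols.index(c[0])))
wide = wide[order]

# Flatten the two-level column names and turn the group keys back into columns
wide.columns = [f"{name}_{num}" for name, num in wide.columns]
res = wide.reset_index()

print(res)
```

Groups with fewer rows than the largest group get NaN in the unused slots, which is also why the val columns come out as floats.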

Answered By: SergFSM