How to turn rows in each group in dataframe into columns?
Question:
I have a dataframe:
col1 col2 col3 val col4
a1 b1 c1 10 dd
a1 b1 c1 15 kk
a2 b2 c2 20 ff
a2 b2 c2 35 mm
a3 b3 c3 9 sd
I want to put each value in column "val" from each group col1, col2, col3 into each column. So desired result is:
col1 col2 col3 val_1 col4_1 val_2 col4_2
a1 b1 c1 10 dd 15 kk
a2 b2 c2 20 ff 35 mm
a3 b3 c3 9 sd NA NA
How to do that? Is there any function to turn those rows into columns within group?
Answers:
If there exist at most two groups (see comments), then you can make use of the first
and last
functions in combination with a groupby
statement. You just should define your own last
function, that returns you the last element of a group (for your example it would be the second value) if it exists. If it does not exist, meaning the group length equals 1
, it would return you nan
.
Code:
df = pd.DataFrame(
{"col1": ["a1", "a1", "a2", "a2", "a3"],
"col2": ["b1", "b1", "b2", "b2", "b3"],
"col3": ["c1", "c1", "c2", "c2", "c3"],
"val": [10, 15, 20, 35, 9],
"col4": ["dd", "kk", "ff", "mm", "sd"]}
)
last_check = lambda x: pd.NA if len(x) == 1 else x.iloc[-1]
df.groupby(["col1", "col2", "col3"], as_index=False)
.agg(val_1=("val", "first"),
col4_1=("col4", "first"),
val_2=("val", last_check),
col4_2=("col4", last_check))
Output:
col1 col2 col3 val_1 col4_1 val_2 col4_2
a1 b1 c1 10 dd 15 kk
a2 b2 c2 20 ff 35 mm
a3 b3 c3 9 sd <NA> <NA>
here is another aproach:
cols = ['col1','col2','col3']
res = df.assign(num=df.groupby(cols).cumcount()+1).pivot(index=cols,columns='num')
res.columns = [f'{x}_{y}' for x,y in res.columns]
print(res)
'''
val_1 val_2 col4_1 col4_2
col1 col2 col3
a1 b1 c1 10.0 15.0 dd kk
a2 b2 c2 20.0 35.0 ff mm
a3 b3 c3 9.0 NaN sd NaN
it also works with multiple duplicates in first three columns as a bonus
I have a dataframe:
col1 col2 col3 val col4
a1 b1 c1 10 dd
a1 b1 c1 15 kk
a2 b2 c2 20 ff
a2 b2 c2 35 mm
a3 b3 c3 9 sd
I want to put each value in column "val" from each group col1, col2, col3 into each column. So desired result is:
col1 col2 col3 val_1 col4_1 val_2 col4_2
a1 b1 c1 10 dd 15 kk
a2 b2 c2 20 ff 35 mm
a3 b3 c3 9 sd NA NA
How to do that? Is there any function to turn those rows into columns within group?
If there exist at most two groups (see comments), then you can make use of the first
and last
functions in combination with a groupby
statement. You just should define your own last
function, that returns you the last element of a group (for your example it would be the second value) if it exists. If it does not exist, meaning the group length equals 1
, it would return you nan
.
Code:
df = pd.DataFrame(
{"col1": ["a1", "a1", "a2", "a2", "a3"],
"col2": ["b1", "b1", "b2", "b2", "b3"],
"col3": ["c1", "c1", "c2", "c2", "c3"],
"val": [10, 15, 20, 35, 9],
"col4": ["dd", "kk", "ff", "mm", "sd"]}
)
last_check = lambda x: pd.NA if len(x) == 1 else x.iloc[-1]
df.groupby(["col1", "col2", "col3"], as_index=False)
.agg(val_1=("val", "first"),
col4_1=("col4", "first"),
val_2=("val", last_check),
col4_2=("col4", last_check))
Output:
col1 col2 col3 val_1 col4_1 val_2 col4_2
a1 b1 c1 10 dd 15 kk
a2 b2 c2 20 ff 35 mm
a3 b3 c3 9 sd <NA> <NA>
here is another aproach:
cols = ['col1','col2','col3']
res = df.assign(num=df.groupby(cols).cumcount()+1).pivot(index=cols,columns='num')
res.columns = [f'{x}_{y}' for x,y in res.columns]
print(res)
'''
val_1 val_2 col4_1 col4_2
col1 col2 col3
a1 b1 c1 10.0 15.0 dd kk
a2 b2 c2 20.0 35.0 ff mm
a3 b3 c3 9.0 NaN sd NaN
it also works with multiple duplicates in first three columns as a bonus