How to pivot and extend dataframe columns
Question:
I have the following dataframe:
name precision recall
a 0.28 0.23
b 1.00 0.00
a 0.31 0.23
b 0.25 0.00
The desired output is:
a_precision a_recall b_precision b_recall
0.28 0.23 1.00 0.00
0.31 0.23 0.25 0.00
Any idea how to perform this pivot-like operation?
In my dataset I have 5 different names (a, b, c, d, e), and every sixth row the sequence starts again with name a, b… and so on. Besides precision and recall, I have another column called f1_score, so the solution should ideally adapt to a different dataframe schema.
I am looking forward to seeing how you would tackle this problem.
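If you want to try the answers below, you can build the sample table from the question directly. This sketch covers only the four rows shown. The real data's names c, d, e and its f1_score column are left out because their values are not given.

```python
import pandas as pd

# The four sample rows from the question (names c, d, e and f1_score omitted)
df = pd.DataFrame(
    {
        "name": ["a", "b", "a", "b"],
        "precision": [0.28, 1.00, 0.31, 0.25],
        "recall": [0.23, 0.00, 0.23, 0.00],
    }
)
print(df)
```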
Answers:
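You can use pivot after a little reworking of your dataframe. Number the occurrences of each name with groupby().cumcount(), then use that counter as the pivot index: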
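import pandas as pd

df2 = (df.assign(group=df.groupby('name').cumcount())  # position of each row within its name group: 0, 1, ...
         .pivot(index='group', columns='name')           # columns become (metric, name) pairs
       )
df2.columns = ['%s_%s' % (b, a) for (a, b) in df2.columns]  # ('precision', 'a') -> 'a_precision'
df2 = df2.sort_index(axis=1)  # sort_index returns a new frame, so assign it back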
output:
a_precision a_recall b_precision b_recall
group
0 0.28 0.23 1.00 0.0
1 0.31 0.23 0.25 0.0
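The pivot approach above sorts the combined column names alphabetically. With an f1_score column, that puts a_f1_score before a_precision. Below is a minimal sketch of a more general helper. It assumes the key column is called `name` and that every other column is a metric. `widen` is a hypothetical helper name, not a pandas function. It keeps names in order of first appearance and keeps metrics in their original column order.

```python
import pandas as pd


def widen(df: pd.DataFrame, key: str = "name") -> pd.DataFrame:
    """Spread every metric column into '<name>_<metric>' columns, one row per occurrence."""
    # Every column except the key is treated as a metric (precision, recall, f1_score, ...)
    metrics = [c for c in df.columns if c != key]
    # Names in order of first appearance (a, b, c, ...), not alphabetical order
    names = list(pd.unique(df[key]))
    wide = (
        df.assign(group=df.groupby(key).cumcount())  # nth occurrence of each name -> row n
        .pivot(index="group", columns=key, values=metrics)
    )
    # Columns are (metric, name) pairs; flatten them to '<name>_<metric>'
    wide.columns = [f"{n}_{m}" for (m, n) in wide.columns]
    # Reorder name-major, keeping the original metric order within each name
    return wide[[f"{n}_{m}" for n in names for m in metrics]]


df = pd.DataFrame(
    {
        "name": ["a", "b", "a", "b"],
        "precision": [0.28, 1.00, 0.31, 0.25],
        "recall": [0.23, 0.00, 0.23, 0.00],
    }
)
print(widen(df))  # columns: a_precision, a_recall, b_precision, b_recall
```

Because `metrics` is derived from the frame's columns, adding f1_score needs no code change. The result simply gains a_f1_score, b_f1_score, and so on. Rows are matched by occurrence count (the nth a pairs with the nth b). That only lines up with the five-row cycle if no name is missing from a cycle. A name with fewer rows gets NaN in its last groups.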
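Alternatively, split the frame by name, prefix each piece's columns with that name, and concatenate the pieces side by side:
import pandas as pd

def function1(dd: pd.DataFrame) -> pd.DataFrame:
    # Drop the leading 'name' column, prefix the rest with the group's name, renumber rows 0..n-1
    return dd.iloc[:, 1:].add_prefix(dd['name'].iloc[0] + "_").reset_index(drop=True)

pd.concat([function1(g) for _, g in df.groupby("name")], axis=1)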
out:
a_precision a_recall b_precision b_recall
0 0.28 0.23 1.00 0.0
1 0.31 0.23 0.25 0.0
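`function1` relies on `name` being the first column (`iloc[:, 1:]`) and reads the group's name back out of the data. Below is a minimal variant of the same concat idea. It assumes the key column is called `name`, and `widen_concat` is a hypothetical helper name. This version takes the name from the groupby key and drops the key column by label, so the column's position does not matter. It also passes `sort=False`, which keeps names in order of first appearance.

```python
import pandas as pd


def widen_concat(df: pd.DataFrame, key: str = "name") -> pd.DataFrame:
    """One prefixed sub-frame per name, concatenated side by side."""
    parts = [
        # Drop the key column by label (so its position does not matter),
        # prefix the remaining metric columns, and renumber rows 0..n-1
        group.drop(columns=key).add_prefix(f"{name}_").reset_index(drop=True)
        # sort=False keeps the names in order of first appearance
        for name, group in df.groupby(key, sort=False)
    ]
    # axis=1 aligns the parts on their shared 0..n-1 row labels
    return pd.concat(parts, axis=1)


df = pd.DataFrame(
    {
        "name": ["a", "b", "a", "b"],
        "precision": [0.28, 1.00, 0.31, 0.25],
        "recall": [0.23, 0.00, 0.23, 0.00],
    }
)
print(widen_concat(df))
```

`pd.concat` aligns the pieces on row labels. As a result, a name with fewer rows than the others gets NaN in its trailing rows, which matches how the pivot answer behaves.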