How to pivot and stack a DataFrame in pandas without groupby when the pivot has duplicate values?
Question:
I have a DataFrame df that looks like this:
ref  text  id
a    zz    12eia
a    yy    radf02
b    aa    a8adf
b    bb    2022a
I am trying to reshape this DataFrame so that the values in column ref become column names and the values in column text become the values under those columns. I don't need the id column:
a   b
zz  aa
yy  bb
I tried the following line, but I could not get the result without passing the id column:
df_rotated = df.pivot_table(index='ref', values='text', columns='id', aggfunc='first')
The data collapses and is not the result I want. What am I doing wrong?
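For reference, here is a minimal reproduction of that attempt on the sample data (the variable name attempt is only for illustration). With index='ref' and columns='id', every id becomes its own column, so each ref row holds two values and two NaN cells instead of a stack of text values:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'ref': ['a', 'a', 'b', 'b'],
                   'text': ['zz', 'yy', 'aa', 'bb'],
                   'id': ['12eia', 'radf02', 'a8adf', '2022a']})

# The attempted call: one row per ref, one column per distinct id
attempt = df.pivot_table(index='ref', values='text', columns='id', aggfunc='first')
print(attempt)
print(attempt.shape)  # (2, 4): rows a and b, one column for each of the four ids
```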
Answers:
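You need to create an appropriate index, which you can do with groupby and .cumcount. pivot places each value at a (row label, column label) pair, so within each ref group every row needs its own row label; cumcount numbers the rows of each group 0, 1, 2, ..., which provides exactly that.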
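To see why the row label matters, the sketch below uses a hypothetical constant column key (not part of the original answer) as the index. Because (key=0, ref='a') occurs twice, pivot raises a ValueError, whereas the cumcount counter makes every (counter, ref) pair unique:

```python
import pandas as pd

df = pd.DataFrame({'ref': ['a', 'a', 'b', 'b'],
                   'text': ['zz', 'yy', 'aa', 'bb']})

# Hypothetical constant key: every row gets the same row label 0
df['key'] = 0
pivot_failed = False
try:
    df.pivot(index='key', columns='ref', values='text')
except ValueError as err:
    # Two values compete for the same (row, column) cell, so pivot refuses
    pivot_failed = True
    print('pivot failed:', err)

# With a per-group counter, no (ind, ref) pair repeats
df['ind'] = df.groupby('ref').cumcount()
print(df.duplicated(subset=['ind', 'ref']).any())  # False
```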
Here I create the required index:
df['ind'] = df.groupby(['ref']).cumcount()
The DataFrame now looks like this:
  ref text      id  ind
0   a   zz   12eia    0
1   a   yy  radf02    1
2   b   aa   a8adf    0
3   b   bb   2022a    1
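The counter does not depend on the groups being contiguous or of equal size. As a small sketch with hypothetical data (not from the question), cumcount restarts at 0 for each ref value and counts rows in order of appearance:

```python
import pandas as pd

# Hypothetical data: uneven, interleaved groups
df = pd.DataFrame({'ref': ['a', 'b', 'a', 'a'],
                   'text': ['zz', 'aa', 'yy', 'xx']})

# cumcount numbers rows within each ref group in order of appearance
df['ind'] = df.groupby('ref').cumcount()
print(df)  # ind column: 0, 0, 1, 2
```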
You can then call df.pivot, as in the following code:
Code:
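import pandas as pd

df = pd.DataFrame({'ref': ['a', 'a', 'b', 'b'],
                   'text': ['zz', 'yy', 'aa', 'bb'],
                   'id': ['12eia', 'radf02', 'a8adf', '2022a']})
# Number the rows within each ref group: 0, 1, ...
df['ind'] = df.groupby('ref').cumcount()
# Counter as rows, ref values as columns, text as cell values; then drop the counter index
df_rotated = df.pivot(index='ind', columns='ref', values='text').reset_index(drop=True)
print(df_rotated)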
Output:
ref   a   b
0    zz  aa
1    yy  bb
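Since the title mentions stacking, the same reshape can also be sketched with set_index and unstack, which is roughly what pivot does under the hood. This variant builds the counter on the fly instead of adding an ind column to df (the name counter is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'ref': ['a', 'a', 'b', 'b'],
                   'text': ['zz', 'yy', 'aa', 'bb'],
                   'id': ['12eia', 'radf02', 'a8adf', '2022a']})

# Per-group row counter, kept as a separate Series rather than a new column
counter = df.groupby('ref').cumcount()

# Index by (counter, ref), keep only text, then move ref into the columns
df_rotated = (df.set_index([counter, 'ref'])['text']
                .unstack('ref')
                .reset_index(drop=True))
print(df_rotated)
```

If the ref groups have different sizes, both this version and the pivot version pad the shorter columns with NaN.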