Unpivot multiple columns with same name in pandas dataframe
Question:
I have the following dataframe:
pp b pp b
5 0.001464 6 0.001853
5 0.001459 6 0.001843
Is there a way to unpivot columns with the same name into multiple rows?
This is the required output:
pp b
5 0.001464
5 0.001459
6 0.001853
6 0.001843
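For anyone who wants to reproduce this, here is a minimal sketch of how the sample frame could be built. It assumes the values shown above; passing the labels as a list lets pandas keep the duplicated column names.

```python
import pandas as pd

# Sample frame from the question. A list of labels (rather than a dict)
# is needed because dict keys cannot repeat.
df = pd.DataFrame(
    [[5, 0.001464, 6, 0.001853],
     [5, 0.001459, 6, 0.001843]],
    columns=['pp', 'b', 'pp', 'b'],
)
print(df)

# Selecting a duplicated label returns a DataFrame, not a Series.
print(type(df['pp']), df['pp'].shape)
```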
Answers:
This is possible using numpy:
res = pd.DataFrame({'pp': df['pp'].values.T.ravel(),
'b': df['b'].values.T.ravel()})
print(res)
b pp
0 0.001464 5
1 0.001459 5
2 0.001853 6
3 0.001843 6
Or without referencing specific columns explicitly:
res = pd.DataFrame({i: df[i].values.T.ravel() for i in df.columns.unique()})
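To see why the transpose-then-ravel step gives the right order, here is a small sketch that prints the intermediate array. The sample frame is rebuilt from the question's values, and the names block and stacked are only illustrative. It iterates over df.columns.unique() because a set does not guarantee column order.

```python
import pandas as pd

df = pd.DataFrame([[5, 0.001464, 6, 0.001853],
                   [5, 0.001459, 6, 0.001843]],
                  columns=['pp', 'b', 'pp', 'b'])

# Both 'pp' columns side by side: [[5, 6], [5, 6]]
block = df['pp'].to_numpy()

# Transposing puts each original column on its own row, so ravel()
# reads the first 'pp' column completely before the second one.
stacked = block.T.ravel()
print(stacked)

# Same idea for every distinct label, keeping the original label order.
res = pd.DataFrame({name: df[name].to_numpy().T.ravel()
                    for name in df.columns.unique()})
print(res)
```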
Let’s use melt, cumcount and unstack:
dm = df.melt()
(dm.set_index(['variable', dm.groupby('variable').cumcount()])
   .sort_index()['value'].unstack(0))
Output:
variable b pp
0 0.001464 5.0
1 0.001459 5.0
2 0.001853 6.0
3 0.001843 6.0
Try groupby with axis=1 (note that axis=1 in groupby is deprecated since pandas 2.1):
df.groupby(df.columns.values, axis=1).agg(lambda x: x.values.tolist()).sum().apply(pd.Series).T.sort_values('pp')
Out[320]:
b pp
0 0.001464 5.0
2 0.001459 5.0
1 0.001853 6.0
3 0.001843 6.0
A fun way with wide_to_long
s=pd.Series(df.columns)
df.columns=df.columns+s.groupby(s).cumcount().astype(str)
pd.wide_to_long(df.reset_index(),stubnames=['pp','b'],i='index',j='drop',suffix=r'\d+')
Out[342]:
pp b
index drop
0 0 5 0.001464
1 0 5 0.001459
0 1 6 0.001853
1 1 6 0.001843
I’m a little surprised that nobody has mentioned pd.concat so far. Take a look below:
df1 = pd.DataFrame({'Col1':[1,2,3,4], 'Col2':[5,6,7,8]})
df1
Col1 Col2
0 1 5
1 2 6
2 3 7
3 4 8
Now if you make:
df2 = pd.concat([df1,df1])
you get:
Col1 Col2
0 1 5
1 2 6
2 3 7
3 4 8
0 1 5
1 2 6
2 3 7
3 4 8
This is what you wanted, isn’t it?
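The example above stacks a frame on top of itself. To apply the same idea to the frame from the question, one possible sketch is to split the columns by how many times each label has already appeared and concatenate the pieces. The names occurrence and pieces are illustrative, and the sample frame is rebuilt from the question's values.

```python
import pandas as pd

df = pd.DataFrame([[5, 0.001464, 6, 0.001853],
                   [5, 0.001459, 6, 0.001843]],
                  columns=['pp', 'b', 'pp', 'b'])

# Occurrence number of each label: 0 for the first 'pp'/'b', 1 for the second.
s = pd.Series(df.columns)
occurrence = s.groupby(s).cumcount().to_numpy()

# One sub-frame per occurrence; each has unique column names.
pieces = [df.loc[:, occurrence == k] for k in range(occurrence.max() + 1)]

# Stack the sub-frames vertically and renumber the rows.
out = pd.concat(pieces, ignore_index=True)
print(out)
```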
If you know the number of repetitions in advance, it’s very easy using numpy:
import numpy as np
import pandas as pd
repetitions=5
rows=2
original_columns=list('ab')
df=pd.DataFrame(np.random.randint(0,10,[rows,len(original_columns)*repetitions]), columns=original_columns*repetitions)
display(df)
a b a b a b a b a b
0 6 4 7 5 2 5 3 1 4 3
1 1 5 4 9 6 2 9 5 3 6
# now the interesting part:
df=pd.concat(np.hsplit(df, repetitions))
display(df)
a b
0 6 4
1 1 5
0 7 5
1 4 9
0 2 5
1 6 2
0 3 1
1 9 5
0 4 3
1 3 6
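If you would rather not pass a DataFrame to a numpy splitting function, a roughly equivalent sketch slices fixed-width blocks with iloc. It assumes the columns repeat in the same order in every block. The deterministic data and the names width and chunks are only for illustration.

```python
import numpy as np
import pandas as pd

repetitions = 3
names = list('ab')

# Deterministic data so the result is easy to check by eye.
data = np.arange(2 * len(names) * repetitions).reshape(2, -1)
df = pd.DataFrame(data, columns=names * repetitions)

# Slice the frame into blocks of len(names) columns, left to right.
width = len(names)
chunks = [df.iloc[:, start:start + width]
          for start in range(0, df.shape[1], width)]

# Stack the blocks vertically and renumber the rows.
out = pd.concat(chunks, ignore_index=True)
print(out)
```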
One option is pivot_longer from pyjanitor. In this case we take advantage of the fact that pp is followed by b, so we can safely pair them and reshape into two columns.
# pip install pyjanitor
import pandas as pd
import janitor
arr = ['pp', 'b']
df.pivot_longer(index = None, names_to = arr, names_pattern = arr)
pp b
0 5 0.001464
1 5 0.001459
2 6 0.001853
3 6 0.001843