Concatenate columns skipping pasted rows and columns

Question

I hope I can describe clearly what I need. I have a data frame with the value columns X and Y and another column, ID, that works as an index. The data frame looks as follows:

df = pd.DataFrame({'ID':[1,1,1,1,1,2,2,2,3,3,3,3],'X':[1,2,3,4,5,2,3,4,1,3,4,5],'Y':[1,2,3,4,5,2,3,4,5,4,3,2]})

df
Out[21]: 
    ID  X  Y
0    1  1  1
1    1  2  2
2    1  3  3
3    1  4  4
4    1  5  5
5    2  2  2
6    2  3  3
7    2  4  4
8    3  1  5
9    3  3  4
10   3  4  3
11   3  5  2

My intention is to copy X as an index or as a column (it doesn't matter) and to append the Y values of each 'ID' as a separate column, in the following way:

Asked By: Jonathan Pacheco

Answer 1

You can try

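import pandas as pd

# give each ID's Y column a unique temporary name (Y1, Y2, Y3), then stack the groups
out = pd.concat([group.rename(columns={'Y': f'Y{name}'}) for name, group in df.groupby('ID')])
# strip the trailing digits so every new column is named Y
out.columns = out.columns.str.replace(r'\d+$', '', regex=True)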

print(out)

    ID  X    Y    Y    Y
0    1  1  1.0  NaN  NaN
1    1  2  2.0  NaN  NaN
2    1  3  3.0  NaN  NaN
3    1  4  4.0  NaN  NaN
4    1  5  5.0  NaN  NaN
5    2  2  NaN  2.0  NaN
6    2  3  NaN  3.0  NaN
7    2  4  NaN  4.0  NaN
8    3  1  NaN  NaN  5.0
9    3  3  NaN  NaN  4.0
10   3  4  NaN  NaN  3.0
11   3  5  NaN  NaN  2.0

Answered By: Ynjxsjmh
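A note on Answer 1: because the final frame has three columns all named Y, selecting `out['Y']` returns all three at once as a DataFrame rather than a single Series. The sketch below (assuming pandas and the question's data; `wide` is just an illustrative name) keeps the numbered names in an intermediate frame and shows how to reach one group's column by position once the suffixes are stripped.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1,1,1,1,1,2,2,2,3,3,3,3],
                   'X': [1,2,3,4,5,2,3,4,1,3,4,5],
                   'Y': [1,2,3,4,5,2,3,4,5,4,3,2]})

# Answer 1's approach, keeping the numbered names for now
wide = pd.concat([g.rename(columns={'Y': f'Y{name}'}) for name, g in df.groupby('ID')])
print(wide.columns.tolist())  # ['ID', 'X', 'Y1', 'Y2', 'Y3']

out = wide.copy()
out.columns = out.columns.str.replace(r'\d+$', '', regex=True)

# With duplicate labels, out['Y'] returns all three columns as a DataFrame
print(out['Y'].shape)  # (12, 3)
# A single group's column has to be picked by position instead
print(out.iloc[:, 4].dropna().tolist())  # [5.0, 4.0, 3.0, 2.0]
```

If later code needs to address each group's column by name, keeping `wide` with its Y1/Y2/Y3 labels is usually easier to work with.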

Answer 2

Here’s another way to do it:

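import pandas as pd

df_org = pd.DataFrame({'ID':[1,1,1,1,1,2,2,2,3,3,3,3],
                       'X':[1,2,3,4,5,2,3,4,1,3,4,5],
                       'Y':[1,2,3,4,5,2,3,4,5,4,3,2]})
df = df_org[['ID', 'X']].copy()
for i in df_org['ID'].unique():  # unique() keeps the order in which IDs appear
    col = 'Y'+str(i)
    # this ID's Y values under a temporary unique name; index labels are kept
    df1 = df_org.loc[df_org['ID']==i, ['Y']].rename(columns={'Y': col})
    # axis=1 aligns on the index, so rows belonging to other IDs get NaN
    df = pd.concat([df, df1], axis=1)
# strip the trailing digits so every new column is named Y
df.columns = df.columns.str.replace(r'\d+$', '', regex=True)
print(df)

Output:

    ID  X    Y    Y    Y
0    1  1  1.0  NaN  NaN
1    1  2  2.0  NaN  NaN
2    1  3  3.0  NaN  NaN
3    1  4  4.0  NaN  NaN
4    1  5  5.0  NaN  NaN
5    2  2  NaN  2.0  NaN
6    2  3  NaN  3.0  NaN
7    2  4  NaN  4.0  NaN
8    3  1  NaN  NaN  5.0
9    3  3  NaN  NaN  4.0
10   3  4  NaN  NaN  3.0
11   3  5  NaN  NaN  2.0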

Answered By: perpetualstudent
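Answer 2 relies on `pd.concat(..., axis=1)` aligning frames on their row index: each group's column only carries the index labels of that group's rows, so the outer join fills every other row with NaN. A minimal illustration on a hypothetical three-row frame (`base`, `part` and `combined` are illustrative names, not from the thread):

```python
import pandas as pd

# Hypothetical three-row frame: two rows with ID 1, one with ID 2
base = pd.DataFrame({'ID': [1, 1, 2], 'X': [10, 20, 30]})

# The ID == 1 rows only, with X renamed to a new label; index labels are 0 and 1
part = base.loc[base['ID'] == 1, ['X']].rename(columns={'X': 'Y1'})

# axis=1 outer-joins on the row index, so row 2 has no Y1 value and gets NaN
combined = pd.concat([base, part], axis=1)
print(combined)
```

The NaN also forces the new column to float, which is why the answers print `1.0` rather than `1`.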

Answer 3

Another solution could be as follows:

Get unique values for column ID (stored in array s).
Repeat column ID n times (n == len(s)), transpose the result with np.transpose so that each row of df becomes a row of n IDs, and compare it with s to get a boolean mask.
Use np.where to replace True with values from df.Y and False with NaN.
Finally, drop the original df.Y and rename the new columns as required.
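Steps 2 and 3 can be checked in isolation. The sketch below, assuming the question's data, builds the mask both the way the answer does and with plain NumPy broadcasting: `[:, None]` turns the ID column into shape (12, 1), which compared against `s` of shape (3,) gives a (12, 3) mask. The broadcasting form is a substitution for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1,1,1,1,1,2,2,2,3,3,3,3],
                   'X': [1,2,3,4,5,2,3,4,1,3,4,5],
                   'Y': [1,2,3,4,5,2,3,4,5,4,3,2]})
s = df['ID'].unique()

# Step 2 as written in the answer: stack ID len(s) times, transpose to (12, 3)
mask_transpose = np.transpose([df.ID] * len(s)) == s
# Same mask via broadcasting: (12, 1) compared against (3,) -> (12, 3)
mask_broadcast = df['ID'].to_numpy()[:, None] == s
print((mask_transpose == mask_broadcast).all())  # True

# Step 3: keep Y where the row belongs to that column's ID, NaN elsewhere
values = np.where(mask_broadcast, df['Y'].to_numpy()[:, None], np.nan)
print(values.shape)  # (12, 3)
```

The broadcasting version avoids building a Python list of n Series first, which is where most of the extra memory goes as the number of IDs grows.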

import pandas as pd
import numpy as np

df = pd.DataFrame({'ID':[1,1,1,1,1,2,2,2,3,3,3,3],
                   'X':[1,2,3,4,5,2,3,4,1,3,4,5],
                   'Y':[1,2,3,4,5,2,3,4,5,4,3,2]})

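s = df.ID.unique()  # unique IDs in order of appearance: [1, 2, 3]

# mask[i, j] is True when row i has ID s[j]; keep Y there and NaN elsewhere
df[s] = np.where((np.transpose([df.ID]*len(s))==s),
                  np.transpose([df.Y]*len(s)),
                  np.nan)

# the new columns are labelled 1, 2, 3; drop the original Y and rename them all to Y
df.drop('Y', axis=1, inplace=True)
df.rename(columns={k:'Y' for k in s}, inplace=True)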
print(df)

    ID  X    Y    Y    Y
0    1  1  1.0  NaN  NaN
1    1  2  2.0  NaN  NaN
2    1  3  3.0  NaN  NaN
3    1  4  4.0  NaN  NaN
4    1  5  5.0  NaN  NaN
5    2  2  NaN  2.0  NaN
6    2  3  NaN  3.0  NaN
7    2  4  NaN  4.0  NaN
8    3  1  NaN  NaN  5.0
9    3  3  NaN  NaN  4.0
10   3  4  NaN  NaN  3.0
11   3  5  NaN  NaN  2.0

If performance is an issue, this method should be faster than this answer, especially when the number of unique values for ID increases.

Answered By: ouroboros1
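For comparison, a further vectorised variant that does not appear in the thread uses `DataFrame.pivot` with no `index` argument, which keeps the existing row index and creates one column per ID (sorted by ID). This is a sketch assuming the question's data and a recent pandas; it has not been benchmarked against the answers above.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1,1,1,1,1,2,2,2,3,3,3,3],
                   'X': [1,2,3,4,5,2,3,4,1,3,4,5],
                   'Y': [1,2,3,4,5,2,3,4,5,4,3,2]})

# One column per ID, indexed like df; rows belonging to other IDs become NaN
wide = df.pivot(columns='ID', values='Y')
# Give every new column the same label, as in the answers above
wide.columns = ['Y'] * wide.shape[1]

# Both frames share df's index, so axis=1 simply places them side by side
out = pd.concat([df[['ID', 'X']], wide], axis=1)
print(out)
```

Because `pivot` requires each (index, ID) pair to be unique, this works here only because df's default RangeIndex labels every row distinctly.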
