Concatenate columns skipping pasted rows and columns

Question:

I hope I can describe what I need clearly. I have a data frame with the same column names and another column that works as an index. The data frame looks as follows:

df = pd.DataFrame({'ID':[1,1,1,1,1,2,2,2,3,3,3,3],'X':[1,2,3,4,5,2,3,4,1,3,4,5],'Y':[1,2,3,4,5,2,3,4,5,4,3,2]})

df
Out[21]: 
    ID  X  Y
0    1  1  1
1    1  2  2
2    1  3  3
3    1  4  4
4    1  5  5
5    2  2  2
6    2  3  3
7    2  4  4
8    3  1  5
9    3  3  4
10   3  4  3
11   3  5  2

My intention is to copy X as an index or one column (it doesn’t matter) and append Y columns from each ‘ID’ in the following way:

[image of the desired output, not available]

Asked By: Jonathan Pacheco

||

Answers:

You can try

out = pd.concat([group.rename(columns={'Y': f'Y{name}'}) for name, group in df.groupby('ID')])
out.columns = out.columns.str.replace(r'\d+$', '', regex=True)
print(out)

    ID  X    Y    Y    Y
0    1  1  1.0  NaN  NaN
1    1  2  2.0  NaN  NaN
2    1  3  3.0  NaN  NaN
3    1  4  4.0  NaN  NaN
4    1  5  5.0  NaN  NaN
5    2  2  NaN  2.0  NaN
6    2  3  NaN  3.0  NaN
7    2  4  NaN  4.0  NaN
8    3  1  NaN  NaN  5.0
9    3  3  NaN  NaN  4.0
10   3  4  NaN  NaN  3.0
11   3  5  NaN  NaN  2.0
Answered By: Ynjxsjmh

Here’s another way to do it:

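import pandas as pd

df_org = pd.DataFrame({'ID':[1,1,1,1,1,2,2,2,3,3,3,3],'X':[1,2,3,4,5,2,3,4,1,3,4,5],'Y':[1,2,3,4,5,2,3,4,5,4,3,2]})
# start from ID and X only; the Y values are added back one ID at a time
df = df_org[['ID', 'X']].copy()
for i in df_org['ID'].unique():
    col = 'Y'+str(i)
    # Y values for this ID only; the other rows become NaN when aligned on the index
    df1 = df_org.loc[df_org['ID']==i, ['Y']].rename(columns={'Y': col})
    df = pd.concat([ df, df1 ], axis=1)
# strip the trailing ID digits so every new column is named Y
df.columns = df.columns.str.replace(r'\d+$', '', regex=True)
print(df)

Output:

    ID  X    Y    Y    Y
0    1  1  1.0  NaN  NaN
1    1  2  2.0  NaN  NaN
2    1  3  3.0  NaN  NaN
3    1  4  4.0  NaN  NaN
4    1  5  5.0  NaN  NaN
5    2  2  NaN  2.0  NaN
6    2  3  NaN  3.0  NaN
7    2  4  NaN  4.0  NaN
8    3  1  NaN  NaN  5.0
9    3  3  NaN  NaN  4.0
10   3  4  NaN  NaN  3.0
11   3  5  NaN  NaN  2.0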
Answered By: perpetualstudent

Another solution could be as follows.

  • Get unique values for column ID (stored in array s).
  • Use np.transpose to repeat column ID n times (n == len(s)) and evaluate the array’s matches with s.
  • Use np.where to replace True with values from df.Y and False with NaN.
  • Finally, drop the original df.Y and rename the new columns as required.

import pandas as pd
import numpy as np

df = pd.DataFrame({'ID':[1,1,1,1,1,2,2,2,3,3,3,3],
                   'X':[1,2,3,4,5,2,3,4,1,3,4,5],
                   'Y':[1,2,3,4,5,2,3,4,5,4,3,2]})

s = df.ID.unique()

df[s] = np.where((np.transpose([df.ID]*len(s))==s),
                  np.transpose([df.Y]*len(s)),
                  np.nan)

df.drop('Y', axis=1, inplace=True)
df.rename(columns={k:'Y' for k in s}, inplace=True)
print(df)

    ID  X    Y    Y    Y
0    1  1  1.0  NaN  NaN
1    1  2  2.0  NaN  NaN
2    1  3  3.0  NaN  NaN
3    1  4  4.0  NaN  NaN
4    1  5  5.0  NaN  NaN
5    2  2  NaN  2.0  NaN
6    2  3  NaN  3.0  NaN
7    2  4  NaN  4.0  NaN
8    3  1  NaN  NaN  5.0
9    3  3  NaN  NaN  4.0
10   3  4  NaN  NaN  3.0
11   3  5  NaN  NaN  2.0

If performance is an issue, this method should be faster than this answer, especially when the number of unique values for ID increases.
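
As a rough sketch, the same masking idea can also be written with NumPy broadcasting instead of np.transpose. The helper name spread_by_id and its parameters id_col and value_col are hypothetical, chosen only for illustration, and no timings were measured for this variant.

```python
import numpy as np
import pandas as pd


def spread_by_id(df, id_col="ID", value_col="Y"):
    """Return a copy of df with value_col split into one column per unique id.

    Hypothetical helper: the name and parameters are assumptions for this sketch.
    """
    ids = df[id_col].unique()
    # Compare an (n, 1) column of ids with the (k,) unique ids -> (n, k) boolean mask
    mask = df[id_col].to_numpy()[:, None] == ids
    # Keep the value where the row's id matches the column's id, NaN elsewhere
    values = np.where(mask, df[value_col].to_numpy()[:, None], np.nan)
    # All new columns share the name of value_col, as in the outputs above
    wide = pd.DataFrame(values, index=df.index, columns=[value_col] * len(ids))
    return pd.concat([df.drop(columns=value_col), wide], axis=1)


df = pd.DataFrame({'ID': [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
                   'X': [1, 2, 3, 4, 5, 2, 3, 4, 1, 3, 4, 5],
                   'Y': [1, 2, 3, 4, 5, 2, 3, 4, 5, 4, 3, 2]})

out = spread_by_id(df)
print(out)
```

Unlike the snippet above, this version leaves the original df untouched and returns a new frame, which may be easier to reuse when the column names differ.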

Answered By: ouroboros1