Concatenating values into column from multiple rows

Question:

I have a dataframe containing only duplicate "MainID" rows.
One MainID may have multiple secondary IDs (SecID).
I want to concatenate the values of SecID if there is a common MainID, joined by ‘:’ in SecID col.
What is the best way of achieving this?
Yes, I know this is not best practice, however it’s the structure the software wants.
Need to keep the df structure and values in rest of the df. They will always match the other duplicated row. Only SecID will be different.

Current:

data={'MainID':['NHFPL0580','NHFPL0580','NHFPL0582','NHFPL0582'],'SecID':['G12345','G67890','G11223','G34455'], 'Other':['A','A','B','B']}
df=pd.DataFrame(data)
print(df)
      MainID   SecID  Other
0  NHFPL0580  G12345  A
1  NHFPL0580  G67890  A
2  NHFPL0582  G11223  B
3  NHFPL0582  G34455  B

Intended Structure

MainID     SecID          Other 
NHFPL0580  G12345:G67890  A
NHFPL0582  G11223:G34455  B
Asked By: Adam

||

Answers:

Try:

df.groupby('MainID').apply(lambda x: ':'.join(x.SecID))

the above code returns a pd.Series, and you can convert it to a dataframe as @Guy suggested:

You need .reset_index(name=’SecID’) if you want it back as DataFrame

The solution to the edited question:

df = df.groupby(['MainID', 'Other']).apply(lambda x: ':'.join(x.SecID)).reset_index(name='SecID')

You can then change the column order

cols = df.columns.tolist()
df = df[[cols[i] for i in [0, 2, 1]]]
Answered By: Nuri Taş
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.