Merge rows with pandas and append the different columns
Question:
I have the following dataframe :
CONTACT_ID
COMPANY_ID
FIRST_NAME
SURNAME
EMAIL
COMPANY_NAME
1
8
Martin
GERARD
[email protected]
BIG COMPANY
1
87
Martin
GERARD
[email protected]
LITTLE COMPANY
30
12
Jean
DUPOND
[email protected]
MEGA COMPANY
I’d like to use pandas to convert it like so :
CONTACT_ID
FIRST_NAME
SURNAME
EMAIL
COMPANY_ID_1
COMPANY_NAME_1
COMPANY_ID_2
COMPANY_NAME_2
1
Martin
GERARD
[email protected]
8
BIGCOMPANY
87
LITTLECOMPANY
30
Jean
DUPOND
[email protected]
12
MEGA COMPANY
One contact can even have more than two companies.
Answers:
Create list of columns used for create groups, then converte them to index by DataFrame.set_index
and counter by GroupBy.cumcount
, reshape by DataFrame.unstack
with DataFrame.sort_index
, flatten MultiIndex
and last convert levels from list to columns:
cols = ['CONTACT_ID','FIRST_NAME','SURNAME','EMAIL']
df = (df.set_index([*cols, df.groupby(cols).cumcount().add(1)])
.unstack(fill_value='')
.sort_index(axis=1, level=1,sort_remaining=False))
df.columns = df.columns.map(lambda x: f'{x[0]}_{x[1]}')
df = df.reset_index()
print (df)
CONTACT_ID FIRST_NAME SURNAME EMAIL COMPANY_ID_1
0 1 Martin GERARD [email protected] 8
1 30 Jean DUPOND [email protected] 12
COMPANY_NAME_1 COMPANY_ID_2 COMPANY_NAME_2
0 BIG COMPANY 87 LITTLE COMPANY
1 MEGA COMPANY
I have the following dataframe :
CONTACT_ID COMPANY_ID FIRST_NAME SURNAME COMPANY_NAME 1 8 Martin GERARD [email protected] BIG COMPANY 1 87 Martin GERARD [email protected] LITTLE COMPANY 30 12 Jean DUPOND [email protected] MEGA COMPANY
I’d like to use pandas to convert it like so :
CONTACT_ID FIRST_NAME SURNAME COMPANY_ID_1 COMPANY_NAME_1 COMPANY_ID_2 COMPANY_NAME_2 1 Martin GERARD [email protected] 8 BIGCOMPANY 87 LITTLECOMPANY 30 Jean DUPOND [email protected] 12 MEGA COMPANY
One contact can even have more than two companies.
Create list of columns used for create groups, then converte them to index by DataFrame.set_index
and counter by GroupBy.cumcount
, reshape by DataFrame.unstack
with DataFrame.sort_index
, flatten MultiIndex
and last convert levels from list to columns:
cols = ['CONTACT_ID','FIRST_NAME','SURNAME','EMAIL']
df = (df.set_index([*cols, df.groupby(cols).cumcount().add(1)])
.unstack(fill_value='')
.sort_index(axis=1, level=1,sort_remaining=False))
df.columns = df.columns.map(lambda x: f'{x[0]}_{x[1]}')
df = df.reset_index()
print (df)
CONTACT_ID FIRST_NAME SURNAME EMAIL COMPANY_ID_1
0 1 Martin GERARD [email protected] 8
1 30 Jean DUPOND [email protected] 12
COMPANY_NAME_1 COMPANY_ID_2 COMPANY_NAME_2
0 BIG COMPANY 87 LITTLE COMPANY
1 MEGA COMPANY