Concatenate features with identical ID in a DataFrame
Question:
I have tables with many features, and several rows of features can share the same ID.
How can I check each ID and concatenate the features that share an ID into one row?
For example, take a simple table stored in a DataFrame with several feature columns and one ID column. The output should concatenate all features that have the same ID into a single row as new features, and IDs that have fewer features should be padded with zeros, as in this table:
Answers:
join is what you need. Specify how='outer' if you don't want to lose any rows:
df1.set_index('ID').join(df2.set_index('ID'), how='outer')
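As a minimal sketch of the join above, assuming two hypothetical frames df1 and df2 that share an ID column (the column names feat_a and feat_b and all values are invented for illustration, not taken from the question). IDs missing from one side come back as NaN, so fillna(0) is used here to get the zeros the question asks for:

```python
import pandas as pd

# Hypothetical inputs: names and values are illustrative, not from the question
df1 = pd.DataFrame({'ID': [1, 2, 3], 'feat_a': [9, 5, 3]})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'feat_b': [7, 6, 8]})

# An outer join keeps every ID that appears in either frame
joined = df1.set_index('ID').join(df2.set_index('ID'), how='outer')

# IDs missing from one side come back as NaN; replace them with 0
joined = joined.fillna(0).astype(int)
print(joined)
```

Note that this only helps when the features live in two separate frames; it does not by itself collapse repeated IDs within a single frame.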
IIUC, you can use:
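import pandas as pd

# Collect each feature's values per ID into lists
dfx = df.groupby('ID').agg(list)
# Size of the largest ID group (dfx.max() would compare lists lexicographically, not by length)
max_list_length = df.groupby('ID').size().max()
final = pd.DataFrame(dfx.apply(lambda x: [x[i][j] if len(x[i]) > j else 0 for j in range(max_list_length) for i in dfx.columns], axis=1).tolist(), index=dfx.index)
final.columns = ['dat' + str(i) for i in range(1, len(final.columns) + 1)]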
Output:
    dat1  dat2  dat3  dat4  dat5  dat6
ID
1      9     3     6     5     7     7
2      5     5     5     6     5     5
3      3     0     5     0     0     0
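An alternative that avoids building lists by hand is to number the rows inside each ID group with cumcount and then unstack that number into the columns. This is only a sketch: the input frame below is an assumption inferred from the output above (the question's original table is not shown), and the feature names f1, f2, f3 are hypothetical.

```python
import pandas as pd

# Hypothetical input inferred from the output above; column names f1..f3 are assumptions
df = pd.DataFrame({
    'ID': [1, 1, 2, 2, 3],
    'f1': [9, 5, 5, 6, 3],
    'f2': [3, 7, 5, 5, 0],
    'f3': [6, 7, 5, 5, 5],
})

# Number each row within its ID group: 0 for the first row, 1 for the second, ...
pos = df.groupby('ID').cumcount()

# Move the per-group position into the columns, padding missing slots with 0
wide = df.set_index(['ID', pos]).unstack(fill_value=0)

# Order columns by row position first, then by feature name, and flatten the names
wide = wide.sort_index(axis=1, level=1)
wide.columns = ['dat' + str(i) for i in range(1, len(wide.columns) + 1)]
print(wide)
```

With this assumed input the result should match the table above. One caveat: sort_index orders the features alphabetically within each position, which matches the original column order here only because f1, f2, f3 already sort that way.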