Merge rows based on same column value (float type)
Question:
I have a dataset that looks like the following:
id name phone diagnosis
0 1 archie 12345 healthy
1 2 betty 23456 dead
2 3 clara 34567 NaN
3 3 clara 34567 kidney
4 4 diana 45678 cancer
I want to merge duplicated rows and have a table that looks like this:
id name phone diagnosis
0 1 archie 12345 healthy
1 2 betty 23456 dead
2 3 clara 34567 NaN, kidney
3 4 diana 45678 cancer
In short, I want the entries in the diagnosis column combined so I can get an overview. I tried running the following, but it raises an error stating that a string was expected but a float was found.
data = data.groupby(['id','name','phone'])['diagnosis'].apply(', '.join).reset_index()
Anyone have any ideas how I can merge the rows?
Answers:
This happens because of the NaN values. NaN is a float, so str.join cannot concatenate it with the other strings. One workaround is to replace the NaNs with the string 'NaN' first:
data['diagnosis'] = data['diagnosis'].fillna('NaN')
data = data.groupby(['id', 'name', 'phone'])['diagnosis'].apply(', '.join).reset_index()
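For reference, here is a minimal self-contained sketch of that approach, rebuilt from the sample table in the question. The variable names with_nan and without_nan are only illustrative, and the second variant (dropping the NaNs instead of keeping them as text) is an alternative not asked for in the question, shown in case the literal 'NaN' is unwanted in the output.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table in the question.
data = pd.DataFrame({
    "id": [1, 2, 3, 3, 4],
    "name": ["archie", "betty", "clara", "clara", "diana"],
    "phone": [12345, 23456, 34567, 34567, 45678],
    "diagnosis": ["healthy", "dead", np.nan, "kidney", "cancer"],
})

keys = ["id", "name", "phone"]

# Variant 1: turn missing values into the string 'NaN', then join per group.
with_nan = (
    data.assign(diagnosis=data["diagnosis"].fillna("NaN"))
        .groupby(keys)["diagnosis"]
        .apply(", ".join)
        .reset_index()
)

# Variant 2 (assumption: NaNs may be skipped): drop missing values before joining.
# A group containing only NaN would end up as an empty string.
without_nan = (
    data.groupby(keys)["diagnosis"]
        .apply(lambda s: ", ".join(s.dropna()))
        .reset_index()
)

print(with_nan)
print(without_nan)
```

The first variant gives the "NaN, kidney" entry for clara that the question asks for; the second would leave just "kidney".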