Merge rows based on same column value (float type)

Question:

I have a dataset that looks like the following:

   id    name  phone  diagnosis
0   1  archie  12345    healthy
1   2   betty  23456       dead
2   3   clara  34567        NaN
3   3   clara  34567     kidney
4   4   diana  45678     cancer

I want to merge duplicated rows and have a table that looks like this:

   id    name  phone    diagnosis
0   1  archie  12345      healthy
1   2   betty  23456         dead
2   3   clara  34567  NaN, kidney
3   4   diana  45678       cancer

In short, I want the entries in the diagnosis column combined so I can have an overview. I tried running the following, but it raises an error stating that a string was expected but a float was found.

data = data.groupby(['id','name','phone'])['diagnosis'].apply(', '.join).reset_index()

Anyone have any ideas how I can merge the rows?

Asked By: luthien aerendell


Answers:

The error is caused by the NaN values. NaN is a float, and ', '.join only accepts strings, so it fails as soon as it meets a NaN among the diagnoses. One way around this is to fill the NaNs with the string 'NaN' before joining:

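# Replace missing diagnoses with the literal string 'NaN' so that join only sees strings
data['diagnosis'] = data['diagnosis'].fillna('NaN')
# Join the diagnoses within each (id, name, phone) group and keep the result
data = data.groupby(['id', 'name', 'phone'])['diagnosis'].apply(', '.join).reset_index()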
Answered By: Nuri Taş
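
For anyone who wants to try this end to end, here is a minimal, self-contained sketch that rebuilds the sample table from the question and applies the fill-then-join idea. The names keys, with_nan and without_nan are made up for illustration, and the second variant (dropping missing values instead of labelling them) is an addition that is not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question
data = pd.DataFrame({
    "id": [1, 2, 3, 3, 4],
    "name": ["archie", "betty", "clara", "clara", "diana"],
    "phone": [12345, 23456, 34567, 34567, 45678],
    "diagnosis": ["healthy", "dead", np.nan, "kidney", "cancer"],
})

keys = ["id", "name", "phone"]

# Variant 1: keep missing entries visible as the text 'NaN'
with_nan = (
    data.assign(diagnosis=data["diagnosis"].fillna("NaN"))
        .groupby(keys)["diagnosis"]
        .apply(", ".join)
        .reset_index()
)

# Variant 2 (assumption: missing entries are not wanted in the overview):
# drop NaN inside each group before joining
without_nan = (
    data.groupby(keys)["diagnosis"]
        .apply(lambda s: ", ".join(s.dropna()))
        .reset_index()
)

print(with_nan)
print(without_nan)
```

Note that in the second variant a group whose diagnoses are all missing ends up as an empty string rather than NaN, so pick whichever behaviour suits your overview.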