Pandas or R – Merge Rows By Same Value in Column Over NaN values – Look at Example

Question:

I have a very specific dataset it looks something like this:

record_id event_id instrument repeat_inst
PI0005 v03_abc_1 NaN 1
PI0005 v03_abc_1 i_sensor NaN
PI0005 v03_abc_1 NaN NaN
PI0005 v02_abc_33 i_sensor NaN
PI0005 v02_abc_33 NaN NaN
PI0006 v02_abc_1 i_sensor 1
PI0006 v02_abc_1 NaN NaN

How do I make it look like this:

record_id event_id instrument repeat_inst
PI0005 v03_abc_1 i_sensor 1
PI0005 v02_abc_33 i_sensor NaN
PI0006 v02_abc_2 i_sensor 1

Where rows with the same record_id and event_id get merged together, where NaN values are replaced with the other value, and if both values are NaN, then NaN can be kept (like in the forth and fifth row in the original dataframe).

Assume that only one of the related cells have a value and all others have NaN.

This should apply to all columns of the data, there are thousands of columns and rows.

I tried using group by, but don’t know how to continue.

Asked By: fnaber

||

Answers:

With R

library(dplyr)
df1 %>%
   group_by(record_id, event_id) %>%
   summarise(across(everything(),  ~.x[!is.na(.x)][1]))
Answered By: akrun
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.