Pandas or R – Merge Rows By Same Value in Column Over NaN values – Look at Example

Question

I have a very specific dataset it looks something like this:

record_id	event_id	instrument	repeat_inst
PI0005	v03_abc_1	`NaN`	1
PI0005	v03_abc_1	i_sensor	`NaN`
PI0005	v03_abc_1	`NaN`	`NaN`
PI0005	v02_abc_33	i_sensor	`NaN`
PI0005	v02_abc_33	`NaN`	`NaN`
PI0006	v02_abc_1	i_sensor	1
PI0006	v02_abc_1	`NaN`	`NaN`

How do I make it look like this:

record_id	event_id	instrument	repeat_inst
PI0005	v03_abc_1	i_sensor	1
PI0005	v02_abc_33	i_sensor	`NaN`
PI0006	v02_abc_2	i_sensor	1

Where rows with the same record_id and event_id get merged together, where NaN values are replaced with the other value, and if both values are NaN, then NaN can be kept (like in the forth and fifth row in the original dataframe).

Assume that only one of the related cells have a value and all others have NaN.

This should apply to all columns of the data, there are thousands of columns and rows.

I tried using group by, but don’t know how to continue.

Asked By: fnaber

||

Source

Answer 1

With R

library(dplyr)
df1 %>%
   group_by(record_id, event_id) %>%
   summarise(across(everything(),  ~.x[!is.na(.x)][1]))

Answered By: akrun

Pandas or R – Merge Rows By Same Value in Column Over NaN values – Look at Example

Question:

Answers: