How to populate NaN values based on conditions from two other columns using Pandas?

Question:

I have a dataframe that looks something like this:

ID hiqual Wave
1 1.0 g
1 NaN i
1 NaN k
2 1.0 g
2 NaN i
2 NaN k
3 1.0 g
3 NaN i
4 5.0 g
4 NaN i

This is a long-format dataframe, and I have my hiqual variable only for my first measurement wave (g). I would like to populate the NaN values for the subsequent measurement waves (i and k) with the value given in wave g for each ID.

I tried using fillna(), but I am not sure how to express the two conditions on ID and Wave or how to populate the values based on them. I would be grateful for any help or suggestions.

Asked By: newbie_python


Answers:

If your dataframe is ordered by the ID and Wave columns (the sort_values call below ensures this), you can simply forward-fill the values:

>>> df.sort_values(['ID', 'Wave']).ffill()
   ID  hiqual Wave
0   1     1.0    g
1   1     1.0    i
2   1     1.0    k
3   2     1.0    g
4   2     1.0    i
5   2     1.0    k
6   3     1.0    g
7   3     1.0    i
8   4     5.0    g
9   4     5.0    i
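
One caveat worth noting: a plain ffill() over the whole frame carries the last seen value across an ID boundary if some ID has no wave-g value. A minimal sketch of a per-ID forward fill follows; the sample frame is a shortened version of the question's data, and ID 5 (which has no g row) is a hypothetical addition for illustration only.

```python
import numpy as np
import pandas as pd

# Shortened sample data; ID 5 is a hypothetical ID with no wave-g row.
df = pd.DataFrame({
    "ID": [1, 1, 1, 4, 4, 5],
    "hiqual": [1.0, np.nan, np.nan, 5.0, np.nan, np.nan],
    "Wave": ["g", "i", "k", "g", "i", "i"],
})

# Sort so that wave g comes first within each ID.
df = df.sort_values(["ID", "Wave"])

# Plain forward fill: the value from ID 4 leaks into ID 5.
plain = df["hiqual"].ffill()

# Forward fill within each ID: ID 5 stays NaN because it has no g value.
grouped = df.groupby("ID")["hiqual"].ffill()

print(plain.tolist())
print(grouped.tolist())
```

With the question's exact data, where every ID has a g row, both variants give the same result.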

You can also use the g values explicitly, looking them up by ID:

g_vals = df[df['Wave']=='g'].set_index('ID')['hiqual']
df['hiqual'] = df['hiqual'].fillna(df['ID'].map(g_vals))
print(df)
print(g_vals)

# Output
   ID  hiqual Wave
0   1     1.0    g
1   1     1.0    i
2   1     1.0    k
3   2     1.0    g
4   2     1.0    i
5   2     1.0    k
6   3     1.0    g
7   3     1.0    i
8   4     5.0    g
9   4     5.0    i

# g_vals
ID
1    1.0
2    1.0
3    1.0
4    5.0
Name: hiqual, dtype: float64
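
The lookup above relies on each ID having exactly one g row, since map() needs a uniquely indexed Series. As a rough sketch for messier data, assuming it is acceptable to keep the first g row per ID, duplicates can be dropped before building the lookup. The sample frame here is hypothetical: ID 1 has two g rows and ID 2 has none.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ID 1 has two g rows, ID 2 has no g row at all.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "hiqual": [1.0, 3.0, np.nan, np.nan, np.nan],
    "Wave": ["g", "g", "i", "i", "k"],
})

# Build the ID -> g value lookup, keeping only the first g row per ID
# (an assumption; pick whatever rule suits your data).
g_vals = (
    df[df["Wave"] == "g"]
    .drop_duplicates("ID")
    .set_index("ID")["hiqual"]
)

# Fill only the missing values; IDs without a g row remain NaN.
df["hiqual"] = df["hiqual"].fillna(df["ID"].map(g_vals))
print(df)
```

Existing non-missing values are left untouched, and IDs that never had a g measurement stay NaN rather than borrowing a value from another ID.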
Answered By: Corralien

The exact expected output is unclear, but I think you might want:

m = df['hiqual'].isna()

df.loc[m, 'hiqual'] = df['Wave'].mask(m).ffill()
Answered By: mozway