Updating column value based on nan value of other column

Question:

I have a simple function that builds a DataFrame with two columns. What I'm trying to do is find which group has a nan in its numbers and change that group to a new value.
Here's a code snippet:

import numpy as np
import pandas as pd

def twod_array():
    data = {"group": [-1, 0, 1, 2, 3],
            'numbers': [[2], [14, 15], [16, 17], [19, 20, 21], [np.nan]],
            }
    df = pd.DataFrame(data=data)
    new_group_number = 100
    # Hard-coded: row 4 is known to hold the nan
    df.loc[4, "group"] = new_group_number
    return df

Before:
This is what the data looks like; you can assume the numbers are sorted.

   group       numbers
0     -1           [2]
1      0      [14, 15]
2      1      [16, 17]
3      2  [19, 20, 21]
4      3         [nan]

In my example I know where the nan is, and since it is at position 4, I was able to use loc to change the group to 100, like this:

   group       numbers
0     -1           [2]
1      0      [14, 15]
2      1      [16, 17]
3      2  [19, 20, 21]
4    100         [nan]

What if I don't know where the nan is? How can I know which group to update? All that comes to mind is a nested for loop, which I would rather avoid. Any suggestions?

Asked By: stock_exchange

Answers:

You could replace

df.loc[4, "group"] = new_group_number

with

idx = df.numbers.apply(lambda l: any(pd.isna(e) for e in l))
df.loc[idx, 'group'] = new_group_number
Answered By: Michael Hodel
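
For reference, here is a minimal self-contained sketch of the boolean-mask approach applied to the question's data. The helper name flag_nan_groups and its new_group_number default are hypothetical, chosen only for illustration; it works on a copy so the input frame is left untouched.

```python
import numpy as np
import pandas as pd


def flag_nan_groups(df, new_group_number=100):
    """Return a copy of df with 'group' replaced wherever 'numbers' holds a NaN.

    Hypothetical helper for illustration; assumes every 'numbers' entry is a list.
    """
    out = df.copy()
    # Boolean mask: True for rows whose list contains at least one NaN.
    has_nan = out["numbers"].apply(lambda values: any(pd.isna(v) for v in values))
    # Assign only on the masked rows.
    out.loc[has_nan, "group"] = new_group_number
    return out


df = pd.DataFrame({
    "group": [-1, 0, 1, 2, 3],
    "numbers": [[2], [14, 15], [16, 17], [19, 20, 21], [np.nan]],
})
result = flag_nan_groups(df)
print(result)
```

Because the mask is computed per row, no row position needs to be known in advance, and any number of rows can match.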

If np.nan could occur more than once then you can use:

df.loc[df['numbers'].apply(pd.isna).apply(any), 'group'] = 100

or

df.loc[df['numbers'].apply(lambda x: pd.isna(x).any()), 'group'] = 100

Or, if the rows to update always contain exactly [np.nan] and nothing else, you could use .isin:

df.loc[df['numbers'].isin([[np.nan]]), 'group'] = 100
Answered By: SomeDude
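
A different technique, not taken from the answers above, is to avoid the Python-level lambda by exploding the lists. The sketch below assumes the DataFrame index is unique and that no list is empty (explode turns an empty list into NaN, so such a row would also be flagged). The sample data is modified from the question so that nan appears in more than one row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "group": [-1, 0, 1, 2, 3],
    # Modified sample data: NaN appears in two different rows.
    "numbers": [[2], [14, np.nan], [16, 17], [19, 20, 21], [np.nan]],
})

# One row per list element; the original row label is repeated in the index.
exploded = df["numbers"].explode()
# Collapse back to one boolean per original row: True if any element is NaN.
has_nan = exploded.isna().groupby(level=0).any()
df.loc[has_nan, "group"] = 100
print(df)
```

Whether this is faster than apply depends on the data; for small frames like the one in the question, the apply-based versions are likely simpler to read.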