Numpy np.where condition with multiple columns

Question

I have a DataFrame:

import pandas as pd
import numpy as np

data = pd.DataFrame({"col1": [0, 1, 1, 1, 1, 0],
                     "col2": [False, True, False, False, True, False]
                     })

data

I’m trying to create a column col3 that is 1 where col1 == 1 and col2 == True, and 0 otherwise.

Using np.where:

data.assign(col3=np.where(data["col1"]==1 & data["col2"], 1, 0))

   col1   col2  col3
0     0  False     1
1     1   True     1
2     1  False     0
3     1  False     0
4     1   True     1
5     0  False     1

For the first row (index 0), col1 == 0 and col2 == False, but I’m getting col3 as 1.

What am I missing?

The desired output:


   col1   col2  col3
0     0  False     0
1     1   True     1
2     1  False     0
3     1  False     0
4     1   True     1
5     0  False     0

Asked By: Ailurophile

Answer 1

You are missing parentheses. & has higher precedence than ==, so 1 & data["col2"] is evaluated first and data["col1"] is then compared with that result:

data.assign(col3=np.where((data["col1"]==1) & data["col2"], 1, 0))
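To see the grouping concretely, here is a small sketch that rebuilds the DataFrame from the question and compares the unparenthesized condition with its explicit equivalent. The variable names implicit, explicit and intended are only illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "col1": [0, 1, 1, 1, 1, 0],
    "col2": [False, True, False, False, True, False],
})

# Without parentheses, Python groups the expression as shown in `explicit`,
# because & binds more tightly than ==.
implicit = data["col1"] == 1 & data["col2"]
explicit = data["col1"] == (1 & data["col2"])
print(implicit.equals(explicit))  # True: both are the same expression

# The intended grouping compares first, then combines the two boolean masks.
intended = (data["col1"] == 1) & data["col2"]
print(intended.tolist())  # [False, True, False, False, True, False]
```

Rows where col1 is 0 and col2 is False satisfy the unintended comparison, which is why they end up with col3 equal to 1 in the question.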

A way to avoid this is to use eq:

data.assign(col3=np.where(data["col1"].eq(1) & data["col2"], 1, 0))

You can also replace numpy.where with astype:

data.assign(col3=((data["col1"]==1) & data["col2"]).astype(int))

Output:

   col1   col2  col3
0     0  False     0
1     1   True     1
2     1  False     0
3     1  False     0
4     1   True     1
5     0  False     0
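
As a quick sanity check, the three variants can be compared against the desired output from the question. This is only a sketch: the names expected, with_parens, with_eq and with_astype are made up for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "col1": [0, 1, 1, 1, 1, 0],
    "col2": [False, True, False, False, True, False],
})

# Desired col3 values, copied from the question.
expected = [0, 1, 0, 0, 1, 0]

# Variant 1: explicit parentheses around the comparison.
with_parens = np.where((data["col1"] == 1) & data["col2"], 1, 0)

# Variant 2: Series.eq, a method call, so no precedence issue arises.
with_eq = np.where(data["col1"].eq(1) & data["col2"], 1, 0)

# Variant 3: cast the boolean mask to int instead of calling np.where.
with_astype = ((data["col1"] == 1) & data["col2"]).astype(int)

for result in (with_parens, with_eq, with_astype):
    assert list(result) == expected
```

The first two variants return NumPy arrays while the astype variant returns a pandas Series. Either form can be passed to assign.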

Answered By: mozway
