Pandas/Numpy Multiple if statement with and/or operators

Question:

I have quite a complex If statement that I would like to add as a column in my pandas dataframe. In the past I’ve always used numpy.select for this type of problem, however I wouldn’t know how to achieve that with a multi-line if statement.

I was able to get this in Excel:

=IF(sum1=3,IF(AND(col1=col2,col2=col3),0,1),IF(sum1=2,IF(OR(col1=col2,col2=col3,col1=col3),0,1),IF(sum1=1,0,1)))

and write it in Python just as a regular multi-line ‘if statement’, just want to find out if there is a far cleaner way of presenting this.

if df['sum1'] == 3:
  if df['col1'] == df['col2'] and df['col2'] == df['col3']:
    df['verify_col'] = 0
  else:
    df['verify_col'] = 1
elif df['sum1'] == 2:
  if df['col1'] == df['col2'] or df['col2'] == df['col3'] or df['col1'] == df['col3']:
    df['verify_col'] = 0
  else:
    df['verify_col'] = 1
elif df['sum1'] == 1:
  df['verify_col'] = 0
else:
  df['verify_col'] = 1

Here’s some sample data:

df = pd.DataFrame({
    'col1': ['BMW', 'Mercedes Benz', 'Lamborghini', 'Ferrari', null],
    'col2': ['BMW', 'Mercedes Benz', null, null, 'Tesla'],
    'col3': ['BMW', 'Mercedes', 'Lamborghini', null, 'Tesla_'],
    'sum1': [3, 3, 2, 1, 2]
})

I want a column which has the following results:

'verify_col': [0, 1, 0, 0, 1]

It basically checks whether the columns match for those that have values in them and assigns a 1 or a 0 for each row. 1 meaning they are different, 0 meaning zero difference.

Asked By: Jered

||

Answers:

Use numpy.where with chain mask with | for bitwise OR – if no match any conditions is created 1:

m1 = (df['sum1'] == 3)
m2 = (df['col1'] == df['col2']) & (df['col2'] == df['col3'])
m3 = (df['sum1'] == 2)
m4 = (df['col1'] == df['col2']) | (df['col2'] == df['col3']) | (df['col1'] == df['col3'])
m5 = df['sum1'] == 1

df['verify_col'] = np.where((m1 & m2) | (m3 & m4) | m5, 0, 1)

If need None if no match any conditions:

df['verify_col'] = np.select([(m1 & m2) | (m3 & m4) | m5,
                              (m1 & ~m2) | (m3 & ~m4) | ~m5], 
                             [0,1], default=None)


print (df)
            col1           col2         col3  sum1  verify_col
0            BMW            BMW          BMW     3           0
1  Mercedes Benz  Mercedes Benz     Mercedes     3           1
2    Lamborghini            NaN  Lamborghini     2           0
3        Ferrari            NaN          NaN     1           0
4            NaN          Tesla       Tesla_     2           1
Answered By: jezrael
df['verify_col'] = (~(((df["col1"] == df["col2"]) | df["col1"].isna() | df["col2"].isna()) & ((df["col2"] == df["col3"]) | df["col2"].isna() | df["col3"].isna()))).astype(int)
Answered By: Raibek
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.