Python: where clause with two conditions

Question

I have a DataFrame as follows:

import numpy as np
import pandas as pd

data = [[99330,12,122],
   [1123,1230,1287],
   [123,101,812739],
   [1143,12301230,252]]
df1 = pd.DataFrame(data, index=['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04'], 
              columns=['col_A', 'col_B', 'col_C'])
df1.index = pd.to_datetime(df1.index)
for col in df1.columns:
    df1[col+'_mean'] = df1[col].rolling(1).mean().shift()
    df1[col+'_std'] = df1[col].rolling(1).std().shift()
    df1[col+'_upper'] = df1[col+'_mean'] + df1[col+'_std']
    df1[col+'_lower'] = df1[col+'_mean'] - df1[col+'_std']
    df1[col+'_outlier'] = np.where(df1[col]>df1[col+'_upper'] or df1[col]<df1[col+'_lower'], 1, 0)

However, the last line raises an error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I want to get a column col+'_outlier' that is 1 if df1[col]>df1[col+'_upper'] or df1[col]<df1[col+'_lower'], and 0 otherwise.

What’s the proper way to write this where clause with two conditions?

Asked By: MathMan 99


Answer 1

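Python's or needs a single truth value from each operand, so it cannot combine two boolean Series; that is what raises the ValueError. Use the element-wise pipe operator | instead of or.
Have a look at the operator precedence table in the official documentation (highest precedence at the top): | binds more tightly than the comparison operators > and <, so you also need to wrap each condition in parentheses.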

df1[col+'_outlier'] = np.where((df1[col] > df1[col+'_upper']) | (df1[col] < df1[col+'_lower']), 1, 0)
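
To see the difference in isolation, here is a minimal sketch on toy data. The DataFrame df, its column names and its values are made up for illustration and are not the asker's data. It first shows that or raises the ValueError, then builds the 0/1 column with the parenthesized | version.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values and column names, for illustration only)
df = pd.DataFrame({
    "value": [5, 50, 7, -3],
    "lower": [0, 0, 0, 0],
    "upper": [10, 10, 10, 10],
})

# `or` asks each Series for a single True/False, which pandas refuses to give
try:
    (df["value"] > df["upper"]) or (df["value"] < df["lower"])
except ValueError as exc:
    print(type(exc).__name__)  # ValueError

# Element-wise OR: each comparison yields a boolean Series, and the
# parentheses make sure the comparisons are evaluated before |
mask = (df["value"] > df["upper"]) | (df["value"] < df["lower"])
df["outlier"] = np.where(mask, 1, 0)

print(df["outlier"].tolist())  # [0, 1, 0, 1]
```

If you only need the 0/1 flag, mask.astype(int) should give the same values without np.where.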

Answered By: Rabinzel
