pandas multiple conditions based on multiple columns

Question:

I am trying to color points of a pandas dataframe depending on TWO conditions. Example:

IF value of col1 > a AND value of col2 - value of col3 < b THEN value of col4 = string
ELSE value of col4 = other string.

I have tried so many different ways now and everything I found online was only depending on one condition.

My example code always raises the Error:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Here’s the code. Tried several variations without success.

df = pd.DataFrame()

df['A'] = range(10)
df['B'] = range(11,21,1)
df['C'] = range(20,10,-1)

borderE = 3.
ex = 0.

#print df

df['color'] = np.where(all([df.A < borderE, df.B - df.C < ex]), 'r', 'b')

Btw: I understand, what it says but not how to handle it.

Asked By: Robert

||

Answers:

wrap the IF in a function and apply it:

def color(row):
    borderE = 3.
    ex = 0.
    if (row.A > borderE) and( row.B - row.C < ex) :
        return "somestring"
    else:
        return "otherstring"

df.loc[:, 'color'] = df.apply(color, axis = 1)

Yields:

  A   B   C        color
0  0  11  20  otherstring
1  1  12  19  otherstring
2  2  13  18  otherstring
3  3  14  17  otherstring
4  4  15  16   somestring
5  5  16  15  otherstring
6  6  17  14  otherstring
7  7  18  13  otherstring
8  8  19  12  otherstring
9  9  20  11  otherstring
Answered By: Sam

Selection criteria uses Boolean indexing:

df['color'] = np.where(((df.A < borderE) & ((df.B - df.C) < ex)), 'r', 'b')

>>> df
   A   B   C color
0  0  11  20     r
1  1  12  19     r
2  2  13  18     r
3  3  14  17     b
4  4  15  16     b
5  5  16  15     b
6  6  17  14     b
7  7  18  13     b
8  8  19  12     b
9  9  20  11     b
Answered By: Alexander

for me @Alexander ‘s solution didn’t work in my specific case, I had to use lists to pass the two conditions, and then transpose the output, My solution also work for this case:

df['color'] = np.where([df.A < borderE] and [df.B - df.C < ex], 'r', 'b').transpose()

Yields:

    A   B   C   color
0   0   11  20  r
1   1   12  19  r
2   2   13  18  r
3   3   14  17  b
4   4   15  16  b
5   5   16  15  b
6   6   17  14  b
7   7   18  13  b
8   8   19  12  b
9   9   20  11  b
Answered By: Zakaria Zaza

The error in the question occurred because OP used all() function instead of the bitwise-& operator to chain multiple comparisons together.1

Another way to chain multiple comparisons is to evaluate an expression via eval() method. The following checks if A value is less than borderE AND (B - C) value is less than ex for each row.

df.eval('A < @borderE and B - C < @ex')

Using its outcome as a numpy.where() condition, you can assign values by

df['color'] = np.where(df.eval('A < @borderE and B - C < @ex'), 'r', 'b')

If you installed numexpr (pip install numexpr) as recommended in the pandas documentation, this method should perform as well (and better if you have a lot of conditions to reduce) as chaining via &. The advantage is that (i) it’s much more readable (imo) and (ii) you don’t need to worry about brackets (), and/& etc. anymore because the order of precedence inside the string expression is the same as that in Python.

1: all() by design doesn’t work on pandas Series (or numpy arrays) because the pandas developers felt that it’s ambiguous when to return True: if any element is True or if all elements are True; in fact, it’s not clear if a single bool should be returned or a boolean Series should be returned when you call all(). You can explicitly tell it that you want a single bool returned by calling items(), all() etc. on the Series object (which the error is saying) but if you do so (e.g. all([(df.A < borderE).all(), (df.B - df.C < ex).all()])), the conditions will be aggregated into a single boolean value, not the desired boolean vector.