Slicing/selecting with multiple conditions with OR statement in a pandas dataframe

Question:

When I select by chaining different conditions with "AND", the selection works fine. When I select by chaining conditions with "OR" the selection throws an error.

import pandas as pd
import numpy as np
df = pd.DataFrame([[1,4,3],[2,3,5],[4,5,6],[3,2,5]], 
     columns=['a', 'b', 'c'])

   a  b  c
0  1  4  3
1  2  3  5
2  4  5  6
3  3  2  5

Now, df.loc[(df.a != 1) & (df.b < 5)] works fine:

   a  b  c
1  2  3  5
3  3  2  5

but `df.loc[(df.a != 1) or (df.b < 5)]` raises an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 731, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I would expect it to return the whole dataframe as all rows meet this condition.

Asked By: jtorca


Answers:

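The key point is that & is not the same as and: they are different operators, so the "or" counterpart of & is |, not or.

In plain Python, & and | are the bitwise operators, whereas and and or are the logical (boolean) operators.

pandas overloads & and | so that they combine Series element by element. and and or cannot be overloaded: they ask for the truth value of the whole Series, which is what raises the ValueError shown in the question.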
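
As a minimal sketch of the difference described above, the snippet below rebuilds the question's DataFrame and tries both operators on the two masks. The names left, right, or_error and either are only illustrative and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame([[1, 4, 3], [2, 3, 5], [4, 5, 6], [3, 2, 5]],
                  columns=['a', 'b', 'c'])

left = df.a != 1    # boolean Series: False, True, True, True
right = df.b < 5    # boolean Series: True, True, False, True

# `or` first asks for the truth value of `left` as a whole,
# which pandas refuses to give for a multi-element Series.
try:
    left or right
    or_error = None
except ValueError as exc:
    or_error = str(exc)

# `|` is overloaded by Series and combines the two masks row by row.
either = left | right

print(or_error)
print(either.tolist())   # [True, True, True, True]
```

Since every row satisfies at least one of the two conditions, df.loc[either] returns the whole DataFrame, which is the result the question expected.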

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame([[1,4,3],[2,3,5],[4,5,6],[3,2,5]], columns=['a', 'b',
   ...:  'c'])

In [4]: df
Out[4]:
   a  b  c
0  1  4  3
1  2  3  5
2  4  5  6
3  3  2  5

In [5]: df.loc[(df.a != 1) & (df.b < 5)]
Out[5]:
   a  b  c
1  2  3  5
3  3  2  5

In [6]: df.loc[(df.a != 1) | (df.b < 5)]
Out[6]:
   a  b  c
0  1  4  3
1  2  3  5
2  4  5  6
3  3  2  5

Answered By: Steve Barnes

pandas uses the bitwise OR operator | rather than or to compute an element-wise OR across boolean Series objects. This is the canonical approach when boolean indexing is used.

However, another way to select rows on multiple conditions is query, which evaluates a boolean expression given as a string; inside that string, or may be used.

df1 = df.query("a != 1 or b < 5")
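
A short sketch of the query approach on the question's data, showing both the or and the and form. The variable names rows_or, rows_and, rows_var and threshold are made up for illustration; the @ prefix is how query refers to a Python variable from the calling scope.

```python
import pandas as pd

df = pd.DataFrame([[1, 4, 3], [2, 3, 5], [4, 5, 6], [3, 2, 5]],
                  columns=['a', 'b', 'c'])

# Column names are written bare inside the query string.
rows_or = df.query("a != 1 or b < 5")
rows_and = df.query("a != 1 and b < 5")

# A value held in a Python variable is referenced with @.
threshold = 5
rows_var = df.query("a != 1 or b < @threshold")

print(rows_or.index.tolist())    # [0, 1, 2, 3]
print(rows_and.index.tolist())   # [1, 3]
```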

Note that in Python | and & bind more tightly than comparison operators such as != and <, so the parentheses were necessary when building the boolean mask. Inside query, however, the comparison operators bind more tightly than and and or, just as in ordinary Python, so no parentheses are needed.
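
The precedence point can be checked with a small sketch: the same mask is written without parentheses, with parentheses, and as a query string. The names unparenthesized_ok, with_parens and in_query are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 4, 3], [2, 3, 5], [4, 5, 6], [3, 2, 5]],
                  columns=['a', 'b', 'c'])

# Without parentheses Python groups `1 | df.b` first, so this is read as the
# chained comparison `df.a != (1 | df.b) < 5` rather than as two masks.
try:
    df.loc[df.a != 1 | df.b < 5]
    unparenthesized_ok = True
except (ValueError, TypeError):
    unparenthesized_ok = False

# Parentheses force each comparison to be evaluated before `|`.
with_parens = df.loc[(df.a != 1) | (df.b < 5)]

# Inside query, comparisons already bind more tightly than `or`.
in_query = df.query("a != 1 or b < 5")

print(unparenthesized_ok)
print(in_query.equals(with_parens))
```

The unparenthesized form fails instead of silently returning a wrong selection, so a missing pair of parentheses is usually noticed right away.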

Answered By: cottontail