Slicing/selecting with multiple conditions with OR statement in a pandas dataframe
Question:
When I select by chaining different conditions with "AND", the selection works fine. When I select by chaining conditions with "OR" the selection throws an error.
import pandas as pd
import numpy as np
df = pd.DataFrame([[1,4,3],[2,3,5],[4,5,6],[3,2,5]],
                  columns=['a', 'b', 'c'])
   a  b  c
0  1  4  3
1  2  3  5
2  4  5  6
3  3  2  5
Now, `df.loc[(df.a != 1) & (df.b < 5)]` works fine:
```none
   a  b  c
1  2  3  5
3  3  2  5
```
but `df.loc[(df.a != 1) or (df.b < 5)]` raises an error:
```none
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 731, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```
I would expect it to return the whole dataframe as all rows meet this condition.
Answers:
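The important thing to note is that `&` is not the same as `and`; they are different operators, so the "or" counterpart of `&` is `|`, not `or`.
Normally both `&` and `|` are bitwise operators rather than Python's logical operators. Python's `and` and `or` need a single truth value from each operand, which a multi-element Series cannot provide; that is the source of the ValueError.
In pandas, `&` and `|` are overloaded to operate element-wise on Series.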
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame([[1,4,3],[2,3,5],[4,5,6],[3,2,5]], columns=['a', 'b', 'c'])
In [4]: df
Out[4]:
   a  b  c
0  1  4  3
1  2  3  5
2  4  5  6
3  3  2  5
In [5]: df.loc[(df.a != 1) & (df.b < 5)]
Out[5]:
   a  b  c
1  2  3  5
3  3  2  5
In [6]: df.loc[(df.a != 1) | (df.b < 5)]
Out[6]:
   a  b  c
0  1  4  3
1  2  3  5
2  4  5  6
3  3  2  5
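A minimal sketch of why `or` fails where `|` succeeds, reusing the question's `df`. The names `left`, `right`, `or_failed`, `mask`, and `selected` are illustrative only and do not come from the thread.

```python
import pandas as pd

df = pd.DataFrame([[1, 4, 3], [2, 3, 5], [4, 5, 6], [3, 2, 5]],
                  columns=["a", "b", "c"])

left = df.a != 1    # boolean Series, one value per row
right = df.b < 5    # boolean Series, one value per row

# `or` needs a single True/False from `left`, so Python calls bool(left).
# pandas refuses to reduce a multi-element Series to one truth value.
try:
    left or right
    or_failed = False
except ValueError:
    or_failed = True

# `|` is dispatched to the Series operator overload and combines
# the two masks row by row, giving a boolean Series usable in .loc.
mask = left | right
selected = df.loc[mask]

print(or_failed)
print(selected)
```

Every row satisfies at least one of the two conditions here, so `selected` contains the whole frame, which is the result the question expected.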
Pandas uses the bitwise OR operator, `|`, instead of `or` to perform an element-wise OR across multiple boolean Series objects. This is the canonical approach when boolean indexing is used.
However, another way to select rows with multiple conditions is `query`, which evaluates a boolean expression given as a string; there, `or` may be used.
df1 = df.query("a != 1 or b < 5")
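As a quick check, the sketch below compares the `query` form against the boolean-mask form on the question's `df`. The variable names (`by_mask`, `by_query`, `same_or`, `by_query_and`) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 4, 3], [2, 3, 5], [4, 5, 6], [3, 2, 5]],
                  columns=["a", "b", "c"])

# Boolean indexing: parentheses and `|` are required.
by_mask = df.loc[(df.a != 1) | (df.b < 5)]

# query: the expression is a string, and `or` is allowed inside it.
by_query = df.query("a != 1 or b < 5")

# Both approaches should select the same rows.
same_or = by_mask.equals(by_query)

# The AND case from the question, written with `and` inside query.
by_query_and = df.query("a != 1 and b < 5")

print(same_or)
print(by_query_and)
```

Which form to prefer is largely a readability choice: `query` keeps long conditions compact, while boolean masks stay ordinary Python expressions.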
Note that in Python `|` and `&` bind more tightly than comparison operators such as `!=` and `<`, so parentheses were necessary to create the boolean mask. Inside `query`, however, `and` and `or` follow Python's precedence, where comparison operators bind more tightly than `and` and `or`, so parentheses are not necessary.
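The precedence difference can be inspected without pandas at all. The sketch below uses the standard-library `ast` module to show how Python groups the two spellings; `a` and `b` are placeholder names standing in for the columns, and nothing is evaluated.

```python
import ast

# Without parentheses, `|` binds before the comparisons, so this parses
# as the chained comparison  a != (1 | b) < 5  rather than two masks.
bitor_tree = ast.parse("a != 1 | b < 5", mode="eval").body
is_chained = isinstance(bitor_tree, ast.Compare) and len(bitor_tree.ops) == 2
middle = bitor_tree.comparators[0]
middle_is_bitor = isinstance(middle, ast.BinOp) and isinstance(middle.op, ast.BitOr)

# With `or`, each comparison is grouped first and `or` joins the results:
# (a != 1) or (b < 5)
or_tree = ast.parse("a != 1 or b < 5", mode="eval").body
or_is_boolop = isinstance(or_tree, ast.BoolOp) and isinstance(or_tree.op, ast.Or)
or_operands_are_comparisons = all(isinstance(v, ast.Compare) for v in or_tree.values)

print(is_chained, middle_is_bitor)
print(or_is_boolop, or_operands_are_comparisons)
```

This is why the mask version needs `(df.a != 1) | (df.b < 5)` with explicit parentheses, while the `or` spelling inside `query` groups the comparisons on its own.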