# Logical operators for Boolean indexing in Pandas

## Question:

I’m working with a Boolean index in Pandas.

The question is why the statement:

```
a[(a['some_column']==some_number) & (a['some_other_column']==some_other_number)]
```

works fine whereas

```
a[(a['some_column']==some_number) and (a['some_other_column']==some_other_number)]
```

exits with error?

Example:

```
a = pd.DataFrame({'x':[1,1],'y':[10,20]})
In: a[(a['x']==1)&(a['y']==10)]
Out: x y
0 1 10
In: a[(a['x']==1) and (a['y']==10)]
Out: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```

## Answers:

When you say

```
(a['x']==1) and (a['y']==10)
```

You are implicitly asking Python to convert `(a['x']==1)`

and `(a['y']==10)`

to Boolean values.

NumPy arrays (of length greater than 1) and Pandas objects such as Series do not have a Boolean value — in other words, they raise

ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().

when used as a Boolean value. That’s because it’s unclear when it should be True or False. Some users might assume they are True if they have non-zero length, like a Python list. Others might desire for it to be True only if **all** its elements are True. Others might want it to be True if **any** of its elements are True.

Because there are so many conflicting expectations, the designers of NumPy and Pandas refuse to guess, and instead raise a ValueError.

Instead, you must be explicit, by calling the `empty()`

, `all()`

or `any()`

method to indicate which behavior you desire.

In this case, however, it looks like you do not want Boolean evaluation, you want **element-wise** logical-and. That is what the `&`

binary operator performs:

```
(a['x']==1) & (a['y']==10)
```

returns a boolean array.

By the way, as alexpmil notes,

the parentheses are mandatory since `&`

has a higher operator precedence than `==`

.

Without the parentheses, `a['x']==1 & a['y']==10`

would be evaluated as `a['x'] == (1 & a['y']) == 10`

which would in turn be equivalent to the chained comparison `(a['x'] == (1 & a['y'])) and ((1 & a['y']) == 10)`

. That is an expression of the form `Series and Series`

.

The use of `and`

with two Series would again trigger the same `ValueError`

as above. That’s why the parentheses are mandatory.

# TLDR; _{Logical Operators in Pandas are &, | and ~, and parentheses (...) is important!}

Python’s `and`

, `or`

and `not`

logical operators are designed to work with scalars. So Pandas had to do one better and override the bitwise operators to achieve *vectorized* (element-wise) version of this functionality.

So the following in python (`exp1`

and `exp2`

are expressions which evaluate to a boolean result)…

```
exp1 and exp2 # Logical AND
exp1 or exp2 # Logical OR
not exp1 # Logical NOT
```

…will translate to…

```
exp1 & exp2 # Element-wise logical AND
exp1 | exp2 # Element-wise logical OR
~exp1 # Element-wise logical NOT
```

for pandas.

If in the process of performing logical operation you get a `ValueError`

, then you need to use parentheses for grouping:

```
(exp1) op (exp2)
```

For example,

```
(df['col1'] == x) & (df['col2'] == y)
```

And so on.

**Boolean Indexing**: A common operation is to compute boolean masks through logical conditions to filter the data. Pandas provides **three** operators: `&`

for logical AND, `|`

for logical OR, and `~`

for logical NOT.

Consider the following setup:

```
np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list('ABC'))
df
A B C
0 5 0 3
1 3 7 9
2 3 5 2
3 4 7 6
4 8 8 1
```

**Logical AND**

For `df`

above, say you’d like to return all rows where A < 5 and B > 5. This is done by computing masks for each condition separately, and ANDing them.

**Overloaded Bitwise & Operator**

Before continuing, please take note of this particular excerpt of the docs, which state

Another common operation is the use of boolean vectors to filter the

data. The operators are:`|`

for`or`

,`&`

for`and`

, and`~`

for`not`

.These, since by default Python will

must be grouped by using parentheses

evaluate an expression such as`df.A > 2 & df.B < 3`

as`df.A > (2 &`

, while the desired evaluation order is

df.B) < 3`(df.A > 2) & (df.B <`

.

3)

So, with this in mind, element wise logical AND can be implemented with the bitwise operator `&`

:

```
df['A'] < 5
0 False
1 True
2 True
3 True
4 False
Name: A, dtype: bool
df['B'] > 5
0 False
1 True
2 False
3 True
4 True
Name: B, dtype: bool
```

```
(df['A'] < 5) & (df['B'] > 5)
0 False
1 True
2 False
3 True
4 False
dtype: bool
```

And the subsequent filtering step is simply,

```
df[(df['A'] < 5) & (df['B'] > 5)]
A B C
1 3 7 9
3 4 7 6
```

The parentheses are used to override the default precedence order of bitwise operators, which have higher precedence over the conditional operators `<`

and `>`

. See the section of Operator Precedence in the python docs.

If you do not use parentheses, the expression is evaluated incorrectly. For example, if you accidentally attempt something such as

```
df['A'] < 5 & df['B'] > 5
```

It is parsed as

```
df['A'] < (5 & df['B']) > 5
```

Which becomes,

```
df['A'] < something_you_dont_want > 5
```

Which becomes (see the python docs on chained operator comparison),

```
(df['A'] < something_you_dont_want) and (something_you_dont_want > 5)
```

Which becomes,

```
# Both operands are Series...
something_else_you_dont_want1
```**and** something_else_you_dont_want2

Which throws

```
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```

So, don’t make that mistake!^{1}

**Avoiding Parentheses Grouping**

The fix is actually quite simple. Most operators have a corresponding bound method for DataFrames. If the individual masks are built up using functions instead of conditional operators, you will no longer need to group by parens to specify evaluation order:

```
df['A'].lt(5)
0 True
1 True
2 True
3 True
4 False
Name: A, dtype: bool
df['B'].gt(5)
0 False
1 True
2 False
3 True
4 True
Name: B, dtype: bool
```

```
df['A'].lt(5) & df['B'].gt(5)
0 False
1 True
2 False
3 True
4 False
dtype: bool
```

See the section on Flexible Comparisons.. To summarise, we have

```
╒════╤════════════╤════════════╕
│ │ Operator │ Function │
╞════╪════════════╪════════════╡
│ 0 │ > │ gt │
├────┼────────────┼────────────┤
│ 1 │ >= │ ge │
├────┼────────────┼────────────┤
│ 2 │ < │ lt │
├────┼────────────┼────────────┤
│ 3 │ <= │ le │
├────┼────────────┼────────────┤
│ 4 │ == │ eq │
├────┼────────────┼────────────┤
│ 5 │ != │ ne │
╘════╧════════════╧════════════╛
```

Another option for avoiding parentheses is to use `DataFrame.query`

(or `eval`

):

```
df.query('A < 5 and B > 5')
A B C
1 3 7 9
3 4 7 6
```

I have *extensively* documented `query`

and `eval`

in Dynamic Expression Evaluation in pandas using pd.eval().

`operator.and_`

Allows you to perform this operation in a functional manner. Internally calls `Series.__and__`

which corresponds to the bitwise operator.

```
import operator
operator.and_(df['A'] < 5, df['B'] > 5)
# Same as,
# (df['A'] < 5).__and__(df['B'] > 5)
0 False
1 True
2 False
3 True
4 False
dtype: bool
df[operator.and_(df['A'] < 5, df['B'] > 5)]
A B C
1 3 7 9
3 4 7 6
```

You won’t usually need this, but it is useful to know.

**Generalizing: np.logical_and (and logical_and.reduce)**

Another alternative is using

`np.logical_and`

, which also does not need parentheses grouping:```
np.logical_and(df['A'] < 5, df['B'] > 5)
0 False
1 True
2 False
3 True
4 False
Name: A, dtype: bool
df[np.logical_and(df['A'] < 5, df['B'] > 5)]
A B C
1 3 7 9
3 4 7 6
```

`np.logical_and`

is a ufunc (Universal Functions), and most ufuncs have a `reduce`

method. This means it is easier to generalise with `logical_and`

if you have multiple masks to AND. For example, to AND masks `m1`

and `m2`

and `m3`

with `&`

, you would have to do

```
m1 & m2 & m3
```

However, an easier option is

```
np.logical_and.reduce([m1, m2, m3])
```

This is powerful, because it lets you build on top of this with more complex logic (for example, dynamically generating masks in a list comprehension and adding all of them):

```
import operator
cols = ['A', 'B']
ops = [np.less, np.greater]
values = [5, 5]
m = np.logical_and.reduce([op(df[c], v) for op, c, v in zip(ops, cols, values)])
m
# array([False, True, False, True, False])
df[m]
A B C
1 3 7 9
3 4 7 6
```

_{1 – I know I’m harping on this point, but please bear with me. This is a very, very common beginner’s mistake, and must be explained very thoroughly. }

**Logical OR**

For the `df`

above, say you’d like to return all rows where A == 3 or B == 7.

**Overloaded Bitwise |**

```
df['A'] == 3
0 False
1 True
2 True
3 False
4 False
Name: A, dtype: bool
df['B'] == 7
0 False
1 True
2 False
3 True
4 False
Name: B, dtype: bool
```

```
(df['A'] == 3) | (df['B'] == 7)
0 False
1 True
2 True
3 True
4 False
dtype: bool
df[(df['A'] == 3) | (df['B'] == 7)]
A B C
1 3 7 9
2 3 5 2
3 4 7 6
```

If you haven’t yet, please also read the section on **Logical AND** above, all caveats apply here.

Alternatively, this operation can be specified with

```
df[df['A'].eq(3) | df['B'].eq(7)]
A B C
1 3 7 9
2 3 5 2
3 4 7 6
```

`operator.or_`

Calls `Series.__or__`

under the hood.

```
operator.or_(df['A'] == 3, df['B'] == 7)
# Same as,
# (df['A'] == 3).__or__(df['B'] == 7)
0 False
1 True
2 True
3 True
4 False
dtype: bool
df[operator.or_(df['A'] == 3, df['B'] == 7)]
A B C
1 3 7 9
2 3 5 2
3 4 7 6
```

`np.logical_or`

For two conditions, use `logical_or`

:

```
np.logical_or(df['A'] == 3, df['B'] == 7)
0 False
1 True
2 True
3 True
4 False
Name: A, dtype: bool
df[np.logical_or(df['A'] == 3, df['B'] == 7)]
A B C
1 3 7 9
2 3 5 2
3 4 7 6
```

For multiple masks, use `logical_or.reduce`

:

```
np.logical_or.reduce([df['A'] == 3, df['B'] == 7])
# array([False, True, True, True, False])
df[np.logical_or.reduce([df['A'] == 3, df['B'] == 7])]
A B C
1 3 7 9
2 3 5 2
3 4 7 6
```

**Logical NOT**

Given a mask, such as

```
mask = pd.Series([True, True, False])
```

If you need to invert every boolean value (so that the end result is `[False, False, True]`

), then you can use any of the methods below.

**Bitwise ~**

```
~mask
0 False
1 False
2 True
dtype: bool
```

Again, expressions need to be parenthesised.

```
~(df['A'] == 3)
0 True
1 False
2 False
3 True
4 True
Name: A, dtype: bool
```

This internally calls

```
mask.__invert__()
0 False
1 False
2 True
dtype: bool
```

But don’t use it directly.

`operator.inv`

Internally calls `__invert__`

on the Series.

```
operator.inv(mask)
0 False
1 False
2 True
dtype: bool
```

`np.logical_not`

This is the numpy variant.

```
np.logical_not(mask)
0 False
1 False
2 True
dtype: bool
```

Note, `np.logical_and`

can be substituted for `np.bitwise_and`

, `logical_or`

with `bitwise_or`

, and `logical_not`

with `invert`

.

Logical operators for boolean indexing in Pandas

It’s important to realize that you cannot use any of the Python *logical operators* (`and`

, `or`

or `not`

) on `pandas.Series`

or `pandas.DataFrame`

s (similarly you cannot use them on `numpy.array`

s with more than one element). The reason why you cannot use those is because they implicitly call `bool`

on their operands which throws an Exception because these data structures decided that the boolean of an array is ambiguous:

```
>>> import numpy as np
>>> import pandas as pd
>>> arr = np.array([1,2,3])
>>> s = pd.Series([1,2,3])
>>> df = pd.DataFrame([1,2,3])
>>> bool(arr)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>> bool(s)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> bool(df)
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```

I did cover this more extensively in my answer to the "Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()" Q+A.

## NumPy’s logical functions

However NumPy provides element-wise operating equivalents to these operators as functions that can be used on `numpy.array`

, `pandas.Series`

, `pandas.DataFrame`

, or any other (conforming) `numpy.array`

subclass:

`and`

has`np.logical_and`

`or`

has`np.logical_or`

`not`

has`np.logical_not`

`numpy.logical_xor`

which has no Python equivalent, but it is a logical "exclusive or" operation

So, essentially, one should use (assuming `df1`

and `df2`

are Pandas DataFrames):

```
np.logical_and(df1, df2)
np.logical_or(df1, df2)
np.logical_not(df1)
np.logical_xor(df1, df2)
```

## Bitwise functions and bitwise operators for Booleans

However in case you have boolean NumPy array, Pandas Series, or Pandas DataFrames you could also use the element-wise bitwise functions (for booleans they are – or at least should be – indistinguishable from the logical functions):

- bitwise and:
`np.bitwise_and`

or the`&`

operator - bitwise or:
`np.bitwise_or`

or the`|`

operator - bitwise not:
`np.invert`

(or the alias`np.bitwise_not`

) or the`~`

operator - bitwise xor:
`np.bitwise_xor`

or the`^`

operator

Typically the operators are used. However when combined with comparison operators one has to remember to wrap the comparison in parenthesis because the bitwise operators have a higher precedence than the comparison operators:

```
(df1 < 10) | (df2 > 10) # instead of the wrong df1 < 10 | df2 > 10
```

This may be irritating because the Python logical operators have a lower precedence than the comparison operators, so you normally write `a < 10 and b > 10`

(where `a`

and `b`

are for example simple integers) and don’t need the parenthesis.

## Differences between logical and bitwise operations (on non-booleans)

It is really important to stress that bit and logical operations are only equivalent for Boolean NumPy arrays (and boolean Series & DataFrames). If these don’t contain Booleans then the operations will give different results. I’ll include examples using NumPy arrays, but the results will be similar for the pandas data structures:

```
>>> import numpy as np
>>> a1 = np.array([0, 0, 1, 1])
>>> a2 = np.array([0, 1, 0, 1])
>>> np.logical_and(a1, a2)
array([False, False, False, True])
>>> np.bitwise_and(a1, a2)
array([0, 0, 0, 1], dtype=int32)
```

And since NumPy (and similarly Pandas) does different things for Boolean (Boolean or “mask” index arrays) and integer (Index arrays) indices the results of indexing will be also be different:

```
>>> a3 = np.array([1, 2, 3, 4])
>>> a3[np.logical_and(a1, a2)]
array([4])
>>> a3[np.bitwise_and(a1, a2)]
array([1, 1, 1, 2])
```

## Summary table

```
Logical operator | NumPy logical function | NumPy bitwise function | Bitwise operator
-------------------------------------------------------------------------------------
and | np.logical_and | np.bitwise_and | &
-------------------------------------------------------------------------------------
or | np.logical_or | np.bitwise_or | |
-------------------------------------------------------------------------------------
| np.logical_xor | np.bitwise_xor | ^
-------------------------------------------------------------------------------------
not | np.logical_not | np.invert | ~
```

Where **the logical operator does not work for NumPy arrays**, Pandas Series, and pandas DataFrames. The others work on these data structures (and plain Python objects) and work element-wise.

However, be careful with the bitwise invert on plain Python `bool`

s because the bool will be interpreted as integers in this context (for example `~False`

returns `-1`

and `~True`

returns `-2`

).

Note that you can also use `*`

to do `and`

:

```
In [12]: np.all([a > 20, a < 40], axis=0)
Out[12]:
array([[False, True, False, False, True],
[False, False, False, False, False],
[ True, True, False, False, False],
[False, True, False, False, False],
[False, True, False, False, False]])
In [13]: (a > 20) * (a < 40)
Out[13]:
array([[False, True, False, False, True],
[False, False, False, False, False],
[ True, True, False, False, False],
[False, True, False, False, False],
[False, True, False, False, False]])
```

I’m not claiming this is better than using `np.all`

or `|`

. But it does work.