Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
Question:
I want to filter my dataframe with an or condition to keep rows with a particular column’s values that are outside the range [-0.25, 0.25]. I tried:
df = df[(df['col'] < -0.25) or (df['col'] > 0.25)]
But I get the error:
Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
Answers:
The or and and Python statements require truth-values. For pandas, these are considered ambiguous, so you should use the "bitwise" | (or) or & (and) operations:
df = df[(df['col'] < -0.25) | (df['col'] > 0.25)]
These are overloaded for these kinds of data structures to yield the elementwise or or and.
Just to add some more explanation to this statement:
The exception is thrown when you want to get the bool of a pandas.Series:
>>> import pandas as pd
>>> x = pd.Series([1])
>>> bool(x)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
You hit a place where the operator implicitly converted the operands to bool (you used or, but it also happens for and, if, and while):
>>> x or x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> x and x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> if x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> while x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Besides these four statements, there are several Python functions that hide some bool calls (like any, all, filter, …). These are normally not problematic with pandas.Series, but for completeness I wanted to mention these.
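To illustrate why those built-ins are usually harmless, here is a minimal sketch (the Series s and its values are made up for illustration). The built-ins iterate over the Series and call bool on each element, not on the Series as a whole:

```python
import pandas as pd

# Hypothetical example data
s = pd.Series([0, 1, 2])

# Each built-in iterates over the elements and tests them one by one,
# so bool() is never called on the Series itself and no ValueError is raised.
builtin_any = any(s)
builtin_all = all(s)
truthy_items = list(filter(None, s))

print(builtin_any, builtin_all, truthy_items)
```

This only covers a one-dimensional Series; iterating over a DataFrame yields its column labels, so the same built-ins would answer a different question there.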
In your case, the exception isn’t really helpful, because it doesn’t mention the right alternatives. For and and or, if you want elementwise comparisons, you can use:

numpy.logical_or:
>>> import numpy as np
>>> np.logical_or(x, y)
or simply the | operator:
>>> x | y

numpy.logical_and:
>>> np.logical_and(x, y)
or simply the & operator:
>>> x & y
If you’re using the operators, then be sure to set your parentheses correctly because of operator precedence.
There are several logical NumPy functions which should work on pandas.Series.
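As a rough sketch of what those NumPy functions look like in practice (the two Boolean Series a and b are invented for the example), each one has an operator counterpart that gives the same elementwise result:

```python
import numpy as np
import pandas as pd

# Hypothetical Boolean Series covering all four input combinations
a = pd.Series([True, True, False, False])
b = pd.Series([True, False, True, False])

either = np.logical_or(a, b)        # same as a | b
both = np.logical_and(a, b)         # same as a & b
exactly_one = np.logical_xor(a, b)  # same as a ^ b
not_a = np.logical_not(a)           # same as ~a

print(pd.DataFrame({"or": either, "and": both, "xor": exactly_one, "not a": not_a}))
```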
The alternatives mentioned in the exception are more suited if you encountered it when doing if or while. I’ll briefly explain each of these:

If you want to check if your Series is empty:
>>> x = pd.Series([])
>>> x.empty
True
>>> x = pd.Series([1])
>>> x.empty
False
Python normally interprets the length of containers (like list, tuple, …) as truth-value if it has no explicit Boolean interpretation. So if you want the Python-like check, you could do: if x.size or if not x.empty instead of if x.

If your Series contains one and only one Boolean value:
>>> x = pd.Series([100])
>>> (x > 50).bool()
True
>>> (x < 50).bool()
False

If you want to check the first and only item of your Series (like .bool(), but it works even for non-Boolean contents):
>>> x = pd.Series([100])
>>> x.item()
100

If you want to check if all or any item is not-zero, not-empty or not-False:
>>> x = pd.Series([0, 1, 2])
>>> x.all()   # because one element is zero
False
>>> x.any()   # because one (or more) elements are non-zero
True
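Putting the alternatives above into an actual if chain might look like the sketch below. The helper name describe and the sample Series are my own inventions, and I deliberately avoid .bool() here because, as far as I know, recent pandas releases deprecate it:

```python
import pandas as pd

def describe(series):
    """Label a Series using explicit reductions instead of `if series:`."""
    if series.empty:
        return "empty"
    if series.size == 1:
        # .item() returns the single element as a Python scalar
        return f"single value: {series.item()}"
    if series.all():
        return "all truthy"
    if series.any():
        return "some truthy"
    return "none truthy"

# Hypothetical inputs, one per branch
labels = [
    describe(pd.Series([], dtype="float64")),
    describe(pd.Series([100])),
    describe(pd.Series([1, 2])),
    describe(pd.Series([0, 1, 2])),
    describe(pd.Series([0, 0])),
]
print(labels)
```

Each branch reduces the Series to one Boolean before the if statement sees it, which is exactly what the error message is asking for.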
For Boolean logic, use & and |.
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
>>> df
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
2  0.950088 -0.151357 -0.103219
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863
>>> df.loc[(df.C > 0.25) | (df.C < -0.25)]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863
To see what is happening, you get a column of Booleans for each comparison, e.g.,
df.C > 0.25
0 True
1 False
2 False
3 True
4 True
Name: C, dtype: bool
When you have multiple criteria, you will get multiple columns returned. This is why the join logic is ambiguous. Using and or or treats each column separately, so you first need to reduce that column to a single Boolean value. For example, to see if any value or all values in each of the columns is True.
# Any value in either column is True?
(df.C > 0.25).any() or (df.C < -0.25).any()
True
# All values in either column are True?
(df.C > 0.25).all() or (df.C < -0.25).all()
False
One convoluted way to achieve the same thing is to zip all of these columns together, and perform the appropriate logic.
>>> df[[any([a, b]) for a, b in zip(df.C > 0.25, df.C < -0.25)]]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863
For more details, refer to Boolean Indexing in the documentation.
Or, alternatively, you could use the operator module. More detailed information is in the Python documentation:
import operator
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
df.loc[operator.or_(df.C > 0.25, df.C < -0.25)]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863
This excellent answer explains very well what is happening and provides a solution. I would like to add another solution that might be suitable in similar cases: using the query method:
df = df.query("(col > 0.25) or (col < -0.25)")
See also Indexing and selecting data.
(Some tests with a dataframe I’m currently working with suggest that this method is a bit slower than using the bitwise operators on series of Booleans: 2 ms vs. 870 µs)
A word of warning: At least one situation where this is not straightforward is when column names happen to be Python expressions. I had columns named WT_38hph_IP_2, WT_38hph_input_2 and log2(WT_38hph_IP_2/WT_38hph_input_2) and wanted to perform the following query:
"(log2(WT_38hph_IP_2/WT_38hph_input_2) > 1) and (WT_38hph_IP_2 > 20)"
I obtained the following exception cascade:
KeyError: 'log2'
UndefinedVariableError: name 'log2' is not defined
ValueError: "log2" is not a supported function
I guess this happened because the query parser was trying to make something from the first two columns instead of identifying the expression with the name of the third column.
A possible workaround is proposed here.
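One way to sidestep the query parser entirely (not necessarily the workaround linked above) is plain bracket indexing, where the column name is just a string key and is never parsed as an expression. A minimal sketch, with made-up numbers standing in for the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical values; only the column names come from the situation described above
df = pd.DataFrame({
    "WT_38hph_IP_2": [10, 30, 50],
    "WT_38hph_input_2": [10, 10, 10],
})
ratio_col = "log2(WT_38hph_IP_2/WT_38hph_input_2)"
df[ratio_col] = np.log2(df["WT_38hph_IP_2"] / df["WT_38hph_input_2"])

# The awkward column name is used as a dictionary-style key, so nothing tries to evaluate it
selected = df[(df[ratio_col] > 1) & (df["WT_38hph_IP_2"] > 20)]
print(selected)
```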
Pandas uses bitwise & and |. Also, each condition should be wrapped inside ( ).
This works:
data_query = data[(data['year'] >= 2005) & (data['year'] <= 2010)]
But the same query without parentheses around each condition does not:
data_query = data[(data['year'] >= 2005 & data['year'] <= 2010)]
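A small sketch of why the second form fails, assuming a toy data frame with a year column (the values are invented). Because & binds more tightly than >= and <=, Python reads the unparenthesized version as a chained comparison around 2005 & data['year'], and chaining needs the truth value of a Series:

```python
import pandas as pd

# Hypothetical data
data = pd.DataFrame({"year": [2003, 2005, 2008, 2010, 2012]})

# Parenthesized: two Boolean Series combined elementwise
good = data[(data["year"] >= 2005) & (data["year"] <= 2010)]

# Unparenthesized: parsed as data["year"] >= (2005 & data["year"]) <= 2010,
# a chained comparison that implicitly calls bool() on a Series
try:
    data[data["year"] >= 2005 & data["year"] <= 2010]
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

print(good["year"].tolist())
```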
I encountered the same error and got stalled with a PySpark dataframe for a few days. I was able to resolve it by filling NA values with 0, since I was comparing integer values from two fields.
You need to use the bitwise operators | instead of or and & instead of and in pandas. You can’t simply use the bool statements from Python.
For more complex filtering, create a mask and apply the mask on the dataframe.
Put all your conditions in the mask and apply it. Suppose,
mask = (df["col1"]>=df["col2"]) & (df["col1"]<=df["col2"])
df_new = df[mask]
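For longer filters it can help to name each condition before combining them. This is only a sketch; the columns col1, col2 and flag and their values are assumptions made up for the example:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "col1": [1, 5, 3, 8],
    "col2": [2, 5, 1, 9],
    "flag": [True, True, False, True],
})

# Name each condition so the combined mask stays readable
not_above = df["col1"] <= df["col2"]
flagged = df["flag"]
not_five = df["col1"] != 5

mask = not_above & flagged & not_five
df_new = df[mask]
print(df_new)
```

Since every named condition is already a complete Boolean Series, no extra parentheses are needed when they are joined with &.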
One minor thing, which wasted my time.
Put the conditions (if comparing using " == ", " != ") in parentheses. Failing to do so also raises this exception.
This will work:
df[(some condition) conditional-operator (some condition)]
This will not:
df[some condition conditional-operator some condition]
I’ll try to give a benchmark of the three most common ways (also mentioned above):
from timeit import repeat
setup = """
import numpy as np;
import random;
x = np.linspace(0,100);
lb, ub = np.sort([random.random() * 100, random.random() * 100]).tolist()
"""
stmts = 'x[(x > lb) * (x <= ub)]', 'x[(x > lb) & (x <= ub)]', 'x[np.logical_and(x > lb, x <= ub)]'
for _ in range(3):
    for stmt in stmts:
        t = min(repeat(stmt, setup, number=100_000))
        print('%.4f' % t, stmt)
    print()
Result:
0.4808 x[(x > lb) * (x <= ub)]
0.4726 x[(x > lb) & (x <= ub)]
0.4904 x[np.logical_and(x > lb, x <= ub)]
0.4725 x[(x > lb) * (x <= ub)]
0.4806 x[(x > lb) & (x <= ub)]
0.5002 x[np.logical_and(x > lb, x <= ub)]
0.4781 x[(x > lb) * (x <= ub)]
0.4336 x[(x > lb) & (x <= ub)]
0.4974 x[np.logical_and(x > lb, x <= ub)]
But * is not supported on a pandas Series, and a NumPy array is much faster than a pandas DataFrame (the DataFrame is around 1000 times slower; compare the number argument used below with the one above):
from timeit import repeat
setup = """
import numpy as np;
import random;
import pandas as pd;
x = pd.DataFrame(np.linspace(0,100));
lb, ub = np.sort([random.random() * 100, random.random() * 100]).tolist()
"""
stmts = 'x[(x > lb) & (x <= ub)]', 'x[np.logical_and(x > lb, x <= ub)]'
for _ in range(3):
    for stmt in stmts:
        t = min(repeat(stmt, setup, number=100))
        print('%.4f' % t, stmt)
    print()
Result:
0.1964 x[(x > lb) & (x <= ub)]
0.1992 x[np.logical_and(x > lb, x <= ub)]
0.2018 x[(x > lb) & (x <= ub)]
0.1838 x[np.logical_and(x > lb, x <= ub)]
0.1871 x[(x > lb) & (x <= ub)]
0.1883 x[np.logical_and(x > lb, x <= ub)]
Note: adding one line of code, x = x.to_numpy(), costs about 20 µs.
For those who prefer %timeit:
import numpy as np
import pandas as pd
import random
lb, ub = np.sort([random.random() * 100, random.random() * 100]).tolist()
lb, ub
x = pd.DataFrame(np.linspace(0,100))
def asterik(x):
    x = x.to_numpy()
    return x[(x > lb) * (x <= ub)]
def and_symbol(x):
    x = x.to_numpy()
    return x[(x > lb) & (x <= ub)]
def numpy_logical(x):
    x = x.to_numpy()
    return x[np.logical_and(x > lb, x <= ub)]
for i in range(3):
    %timeit asterik(x)
    %timeit and_symbol(x)
    %timeit numpy_logical(x)
    print('\n')
Result:
23 µs ± 3.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
35.6 µs ± 9.53 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
31.3 µs ± 8.9 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
21.4 µs ± 3.35 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
21.9 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
21.7 µs ± 500 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
25.1 µs ± 3.71 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
36.8 µs ± 18.3 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
28.2 µs ± 5.97 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
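I was getting an error in this command:
if df != '':
    pass
But the error went away when I changed it to this:
if df is not '':
    pass
Note that is not compares object identity rather than contents, so this condition is always true for a DataFrame. To test whether the DataFrame has any rows, use if not df.empty: instead.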
If you have more than one value:
df['col'].all()
If it’s only a single value:
df['col'].item()
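A quick sketch of the difference between the two calls, using two invented Boolean Series. Note that .item() only works when the Series holds exactly one element and raises a ValueError otherwise:

```python
import pandas as pd

# Hypothetical data
many = pd.Series([True, True, False])
single = pd.Series([True])

all_true = many.all()        # reduces any number of values to one Boolean
only_value = single.item()   # extracts the one and only element

# .item() refuses to guess when there is more than one element
try:
    many.item()
    item_failed = False
except ValueError:
    item_failed = True

print(all_true, only_value, item_failed)
```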
This is quite a common question for beginners when making multiple conditions in Pandas. Generally speaking, there are two possible conditions causing this error:
Condition 1: Python Operator Precedence
The Boolean indexing section of Indexing and selecting data in the pandas documentation explains this:
Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses.
By default Python will evaluate an expression such as df['A'] > 2 & df['B'] < 3 as df['A'] > (2 & df['B']) < 3, while the desired evaluation order is (df['A'] > 2) & (df['B'] < 3).
# Wrong
df['col'] < -0.25 | df['col'] > 0.25
# Right
(df['col'] < -0.25) | (df['col'] > 0.25)
There are some possible ways to get rid of the parentheses, and I will cover this later.
Condition 2: Improper Operator/Statement
As is explained in the previous quotation, you need to use | for or, & for and, and ~ for not.
# Wrong
(df['col'] < -0.25) or (df['col'] > 0.25)
# Right
(df['col'] < -0.25) | (df['col'] > 0.25)
Another possible situation is that you are using a Boolean Series in an if statement.
# Wrong
if pd.Series([True, False]):
    pass
It’s clear that the Python if statement accepts a Boolean-like expression rather than a pandas Series. You should use pandas.Series.any or the methods listed in the error message to convert the Series to a value according to your need.
For example:
# Right
if df['col'].eq(0).all():
    # If you want all column values equal to zero
    print('do something')
# Right
if df['col'].eq(0).any():
    # If you want at least one column value equal to zero
    print('do something')
Let’s talk about ways to escape the parentheses in the first situation.

Use pandas mathematical functions
pandas has defined a lot of mathematical functions, including comparison, as follows:
pandas.Series.lt() for less than;
pandas.Series.gt() for greater than;
pandas.Series.le() for less than or equal;
pandas.Series.ge() for greater than or equal;
pandas.Series.ne() for not equal;
pandas.Series.eq() for equal.
As a result, you can use
df = df[(df['col'] < -0.25) | (df['col'] > 0.25)]
# is equal to
df = df[df['col'].lt(-0.25) | df['col'].gt(0.25)]

Use pandas.Series.between()
If you want to select rows in between two values, you can use pandas.Series.between:
df['col'].between(left, right) is equal to (left <= df['col']) & (df['col'] <= right);
df['col'].between(left, right, inclusive='left') is equal to (left <= df['col']) & (df['col'] < right);
df['col'].between(left, right, inclusive='right') is equal to (left < df['col']) & (df['col'] <= right);
df['col'].between(left, right, inclusive='neither') is equal to (left < df['col']) & (df['col'] < right).
df = df[(df['col'] > -0.25) & (df['col'] < 0.25)]
# is equal to
df = df[df['col'].between(-0.25, 0.25, inclusive='neither')]

Use pandas.DataFrame.query()
The document referenced before has a chapter, The query() Method, that explains this well.
pandas.DataFrame.query() can help you select a DataFrame with a condition string. Within the query string, you can use both bitwise operators (& and |) and their boolean cousins (and and or). Moreover, you can omit the parentheses, but I don’t recommend it for readability reasons.
df = df[(df['col'] < -0.25) | (df['col'] > 0.25)]
# is equal to
df = df.query('col < -0.25 or col > 0.25')

Use pandas.DataFrame.eval()
pandas.DataFrame.eval() evaluates a string describing operations on DataFrame columns. Thus, we can use this method to build our multiple conditions. The syntax is the same as with pandas.DataFrame.query().
df = df[(df['col'] < -0.25) | (df['col'] > 0.25)]
# is equal to
df = df[df.eval('col < -0.25 or col > 0.25')]
pandas.DataFrame.query() and pandas.DataFrame.eval() can do more things than I describe here. You are recommended to read their documentation and have fun with them.
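To check that the approaches in this list really select the same rows, here is a small self-contained sketch on an invented five-row frame. For between I use the default closed interval and invert it with ~, which for data without missing values should match the "outside [-0.25, 0.25]" filter from the question:

```python
import pandas as pd

# Hypothetical data, including both boundary values
df = pd.DataFrame({"col": [-0.5, -0.25, 0.0, 0.25, 0.5]})

by_operators = df[(df["col"] < -0.25) | (df["col"] > 0.25)]
by_methods = df[df["col"].lt(-0.25) | df["col"].gt(0.25)]
by_query = df.query("col < -0.25 or col > 0.25")
by_eval = df[df.eval("col < -0.25 or col > 0.25")]
# between() defaults to a closed interval; ~ inverts the mask to keep the outside
by_between = df[~df["col"].between(-0.25, 0.25)]

for name, result in [("operators", by_operators), ("methods", by_methods),
                     ("query", by_query), ("eval", by_eval), ("between", by_between)]:
    print(name, result["col"].tolist())
```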
I faced the same issue while working with a pandas dataframe.
I used numpy.logical_and.
Here I am trying to select the rows where person_id matches 41d7853 and degree_type is not Certification, like below:
display(df_degrees.loc[np.logical_and(df_degrees['person_id'] == '41d7853' , df_degrees['degree_type'] !='Certification')])
If I try to write code like the below:
display(df_degrees.loc[df_degrees['person_id'] == '41d7853' and df_degrees['degree_type'] !='Certification'])
We will get the error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I used numpy.logical_and and it worked for me.
To check a truth value, I use either of these solutions.
The first approach is, instead of checking a potential df as if df…, to check its type:
import pandas as pd
something = pd.DataFrame()
somethingSeries = pd.Series(dtype=object)
if isinstance(something, pd.DataFrame):
    print("this is a Pandas DataFrame")
if isinstance(somethingSeries, pd.Series):
    print("this is a Pandas Series")
The second approach is to extend the pd.DataFrame class as follows (with the magic method __bool__):
import pandas as pd
class MyDataFrame(pd.DataFrame):
    def __init__(self, *args, **kw):
        pd.DataFrame.__init__(self, *args, **kw)
    def __bool__(self):
        return True
Using these approaches, we can check whether the variable is really a DataFrame.
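If the real goal behind if df is "do I have a DataFrame or Series with something in it", the type check can be combined with .empty. A minimal sketch; the helper name has_rows is hypothetical and not part of pandas:

```python
import pandas as pd

def has_rows(obj):
    """True only for a pandas DataFrame or Series that contains at least one element."""
    return isinstance(obj, (pd.DataFrame, pd.Series)) and not obj.empty

# Hypothetical inputs
results = [
    has_rows(None),
    has_rows(pd.DataFrame()),
    has_rows(pd.DataFrame({"a": [1]})),
    has_rows(pd.Series([1.0])),
]
print(results)
```

This avoids subclassing and never asks pandas for the truth value of the object itself.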