Python Pandas: Boolean indexing on multiple columns

Question:

Despite there being at least two good tutorials on how to index a DataFrame in Python’s pandas library, I still can’t work out an elegant way of SELECTing on more than one column.

>>> d = pd.DataFrame({'x':[1, 2, 3, 4, 5], 'y':[4, 5, 6, 7, 8]})
>>> d
   x  y
0  1  4
1  2  5
2  3  6
3  4  7
4  5  8
>>> d[d['x']>2] # This works fine
   x  y
2  3  6
3  4  7
4  5  8
>>> d[d['x']>2 & d['y']>7] # I had expected this to work, but it doesn't
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I have found (what I think is) a rather inelegant way of doing it:

>>> d[d['x']>2][d['y']>7]

But it’s not pretty, and it scores fairly low for readability (I think).

Is there a better, more Python-tastic way?

Asked By: LondonRob

||

Answers:

There may still be a better way, but

In [56]: d[(d['x'] > 2) & (d['y'] > 7)]
Out[56]: 
   x  y
4  5  8

works.

Answered By: TomAugspurger

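It is an operator precedence issue: & binds more tightly than >, so without parentheses your expression is parsed as d['x'] > (2 & d['y']) > 7.

Add parentheses around each condition to make your multi-condition test work: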

d[(d['x']>2) & (d['y']>7)]

This section of the tutorial you mentioned shows an example with several boolean conditions, and parentheses are used there as well.
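
To see why the parentheses matter, here is a minimal sketch that rebuilds the DataFrame d from the question and runs both forms side by side. The names raised, mask, and result are illustrative and not from the original post, and it assumes a reasonably recent pandas, where the unparenthesized form raises ValueError.

```python
import pandas as pd

# Same data as in the question.
d = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [4, 5, 6, 7, 8]})

# Without parentheses, & binds more tightly than >, so Python reads
#   d['x'] > 2 & d['y'] > 7
# as the chained comparison
#   d['x'] > (2 & d['y']) > 7
# A chained comparison calls bool() on an intermediate Series, which raises.
try:
    d[d['x'] > 2 & d['y'] > 7]
    raised = False
except ValueError:
    raised = True

# With parentheses, each comparison produces a boolean Series first,
# and & then combines the two masks element by element.
mask = (d['x'] > 2) & (d['y'] > 7)
result = d[mask]

print(raised)
print(result)
```

Storing the combined condition in a variable such as mask is optional, but it can help readability once more than two conditions are involved.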

Answered By: Zeugma