Python Pandas: Boolean indexing on multiple columns
Question:
despite there being at least two good tutorials on how to index a DataFrame in Python’s pandas
library, I still can’t work out an elegant way of SELECT
ing on more than one column.
>>> d = pd.DataFrame({'x':[1, 2, 3, 4, 5], 'y':[4, 5, 6, 7, 8]})
>>> d
x y
0 1 4
1 2 5
2 3 6
3 4 7
4 5 8
>>> d[d['x']>2] # This works fine
x y
2 3 6
3 4 7
4 5 8
>>> d[d['x']>2 & d['y']>7] # I had expected this to work, but it doesn't
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I have found (what I think is) a rather inelegant way of doing it, like this
>>> d[d['x']>2][d['y']>7]
But it’s not pretty, and it scores fairly low for readability (I think).
Is there a better, more Python-tastic way?
Answers:
There may still be a better way, but
In [56]: d[d['x'] > 2] and d[d['y'] > 7]
Out[56]:
x y
4 5 8
works.
It is a precedence operator issue.
You should add extra parenthesis to make your multi condition test working:
d[(d['x']>2) & (d['y']>7)]
This section of the tutorial you mentioned shows an example with several boolean conditions and the parenthesis are used.
despite there being at least two good tutorials on how to index a DataFrame in Python’s pandas
library, I still can’t work out an elegant way of SELECT
ing on more than one column.
>>> d = pd.DataFrame({'x':[1, 2, 3, 4, 5], 'y':[4, 5, 6, 7, 8]})
>>> d
x y
0 1 4
1 2 5
2 3 6
3 4 7
4 5 8
>>> d[d['x']>2] # This works fine
x y
2 3 6
3 4 7
4 5 8
>>> d[d['x']>2 & d['y']>7] # I had expected this to work, but it doesn't
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I have found (what I think is) a rather inelegant way of doing it, like this
>>> d[d['x']>2][d['y']>7]
But it’s not pretty, and it scores fairly low for readability (I think).
Is there a better, more Python-tastic way?
There may still be a better way, but
In [56]: d[d['x'] > 2] and d[d['y'] > 7]
Out[56]:
x y
4 5 8
works.
It is a precedence operator issue.
You should add extra parenthesis to make your multi condition test working:
d[(d['x']>2) & (d['y']>7)]
This section of the tutorial you mentioned shows an example with several boolean conditions and the parenthesis are used.