I know how to create a mask to filter a dataframe when querying a single column:
import pandas as pd

index = pd.date_range('2013-1-1', periods=100, freq='30min')
data = pd.DataFrame(data=list(range(100)), columns=['value'], index=index)
data['value2'] = 'A'
# Label the first 10 rows 'B' (a single .loc call avoids chained assignment)
data.loc[data.index[:10], 'value2'] = 'B'
data

                     value value2
2013-01-01 00:00:00      0      B
2013-01-01 00:30:00      1      B
2013-01-01 01:00:00      2      B
2013-01-01 01:30:00      3      B
2013-01-01 02:00:00      4      B
2013-01-01 02:30:00      5      B
2013-01-01 03:00:00      6      B
I use a simple mask here:
mask = data['value'] > 4
data[mask]

                     value value2
2013-01-01 02:30:00      5      B
2013-01-01 03:00:00      6      B
2013-01-01 03:30:00      7      B
2013-01-01 04:00:00      8      B
2013-01-01 04:30:00      9      B
2013-01-01 05:00:00     10      A
My question is: how do I create a mask that uses multiple columns? For example, if I do this:
data[data['value2'] == 'A' ][data['value'] > 4]
This filters as I would expect, but how do I create a boolean mask from it, as in my other example? I have provided test data here, but I often want to create masks on other types of data, so I'm looking for any pointers, please.
Your boolean masks are boolean (obviously), so you can use boolean operations on them. The boolean operators include (but are not limited to) & and |, which combine your masks with an 'and' operation or an 'or' operation, respectively. In your specific case, you need an 'and' operation, so you simply write your mask like so:
mask = (data['value2'] == 'A') & (data['value'] > 4)
This ensures you are selecting those rows for which both conditions are simultaneously satisfied. By replacing the & with |, one can select those rows for which either of the two conditions is satisfied. You can select your result as usual with data[mask].
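As a minimal sketch of the above, the following rebuilds the question's test data (assuming the first ten rows are the ones labelled 'B', as the printed output suggests) and builds both an 'and' mask and an 'or' mask. The names mask_and, mask_or, both and either are only illustrative. Note the parentheses around each comparison: & and | bind more tightly than == and >, so leaving them out changes what gets compared.

```python
import pandas as pd

# Rebuild the question's test data: 100 half-hourly rows, first 10 labelled 'B'.
index = pd.date_range('2013-01-01', periods=100, freq='30min')
data = pd.DataFrame({'value': list(range(100))}, index=index)
data['value2'] = 'A'
data.loc[data.index[:10], 'value2'] = 'B'

# Element-wise 'and': a row is kept only if both conditions are True.
mask_and = (data['value2'] == 'A') & (data['value'] > 4)

# Element-wise 'or': a row is kept if at least one condition is True.
mask_or = (data['value2'] == 'A') | (data['value'] > 4)

# Each mask is an ordinary boolean Series, used exactly like a single-column mask.
both = data[mask_and]
either = data[mask_or]

print(len(both), len(either))
```

Because each mask is just a boolean Series aligned to the dataframe's index, it can be stored, inspected, or combined further before it is used for selection.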
Although the question that ayhan links in his comment already answers this one, I thought the OP was missing the idea of boolean operations on masks.
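To illustrate the "not limited to" part, here is a small sketch on a hypothetical four-row dataframe (not the question's data) showing the ~ operator, which inverts a mask, and why Python's own and keyword cannot be used in place of &. The variable names are made up for the example.

```python
import pandas as pd

# Hypothetical toy data, small enough to check by eye.
data = pd.DataFrame({'value': [3, 5, 7, 9],
                     'value2': ['A', 'B', 'A', 'B']})

is_a = data['value2'] == 'A'   # True, False, True, False
is_big = data['value'] > 4     # False, True, True, True

# ~ flips every element of a mask.
not_a = data[~is_a]

# Masks stored in variables can be combined later.
a_and_small = data[is_a & ~is_big]

# Python's 'and' asks for a single truth value of the whole Series,
# which pandas refuses to guess, so it raises ValueError.
try:
    is_a and is_big
    raised = False
except ValueError:
    raised = True

print(not_a)
print(a_and_small)
print(raised)
```

The same applies to or and not: for element-wise logic on masks, use |, & and ~ instead.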