Combine pandas DataFrame query() method with isin()

Question:

I want to use isin() with df.query() to select rows whose id is in a list, id_list. A similar question was asked before, but the answers used the usual df[df['id'].isin(id_list)] approach. Is there a way to do this with df.query() instead?

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': list('aabbccddeeff'), 'b': list('aaaabbbbcccc'),
                   'c': np.random.randint(5, size=12),
                   'd': np.random.randint(9, size=12)})

id_list = ["a", "b", "c"]

However, this raises an error:

df.query('a == id_list')
Asked By: chen

||

Answers:

Formatting the list into the query string appears to work:

>>> df.query('a == {0}'.format(id_list))
   a  b  c  d
0  a  a  4  1
1  a  a  0  7
2  b  a  2  1
3  b  a  0  1
4  c  b  4  0
5  c  b  4  2

Whether it is clearer than the isin() version is a matter of personal taste.
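
For what it's worth, the formatted string is just the list's repr spliced into the expression, and query() treats == against a list like a membership test. A minimal sketch checking that against the usual isin() mask; the fixed values in columns c and d are an assumption made here so the check is repeatable:

```python
import pandas as pd

# Same shape as the question's frame, but with fixed numbers instead of
# np.random.randint (an assumption, only to make the check repeatable).
df = pd.DataFrame({'a': list('aabbccddeeff'), 'b': list('aaaabbbbcccc'),
                   'c': [4, 0, 2, 0, 4, 4, 1, 3, 2, 0, 1, 3],
                   'd': [1, 7, 1, 1, 0, 2, 5, 8, 3, 6, 4, 2]})
id_list = ["a", "b", "c"]

# The query string that actually reaches pandas.
expr = 'a == {0}'.format(id_list)
print(expr)  # a == ['a', 'b', 'c']

via_query = df.query(expr)
via_isin = df[df['a'].isin(id_list)]

# Both selections should pick the same rows.
print(via_query.equals(via_isin))
```

Since the values end up inside the query string itself, this is probably best kept to plain strings and numbers whose repr is a valid literal.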

Answered By: Alexander

From the docs for query:

You can refer to variables in the environment by prefixing them with an ‘@’ character like @a + b.

In your case:

In [38]: df.query('a == @id_list')
Out[38]:
   a  b  c  d
0  a  a  3  4
1  a  a  4  5
2  b  a  2  3
3  b  a  1  5
4  c  b  2  4
5  c  b  1  2
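
The original df.query('a == id_list') fails because, without the prefix, query() tries to resolve id_list as a column or index name rather than a Python variable. A small sketch of both cases, assuming a pandas version where the failure surfaces as a NameError subclass (UndefinedVariableError); the frame below is a cut-down, made-up stand-in for the question's data:

```python
import pandas as pd

# Cut-down stand-in for the question's frame (values are illustrative).
df = pd.DataFrame({'a': list('aabbccddeeff'), 'c': list(range(12))})
id_list = ["a", "b", "c"]

# Without '@', id_list is looked up among the frame's columns and fails.
error_name = None
try:
    df.query('a == id_list')
except NameError as exc:
    error_name = type(exc).__name__
    print(error_name, exc)

# With '@', the local Python variable is used.
selected = df.query('a == @id_list')
print(selected)
```
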
Answered By: maxymoo

You can also include the list within the query string:

>>> df.query('a in ["a", "b", "c"]')

This is the same as:

>>> df.query('a in @id_list')
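
The in form also has a not in counterpart and can be combined with other conditions in the same string. A short sketch on a made-up frame (the column c and its values are assumptions for illustration only):

```python
import pandas as pd

# Illustrative frame: 'a' as in the question, 'c' is just 0..11.
df = pd.DataFrame({'a': list('aabbccddeeff'), 'c': list(range(12))})
id_list = ["a", "b", "c"]

inside = df.query('a in @id_list')        # rows whose 'a' is in the list
outside = df.query('a not in @id_list')   # the complement

# Membership test combined with an ordinary comparison.
combined = df.query('a in @id_list and c >= 2')

print(inside['a'].unique())
print(outside['a'].unique())
print(combined)
```
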
Answered By: Seiji Armstrong

You can also call isin inside query:

df.query('a.isin(@id_list).values')

# or alternatively
df.query('a.isin(["a", "b", "c"]).values')
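
A quick way to convince yourself this selects the same rows as the plain boolean mask. This is only a sketch on a made-up frame; it keeps the trailing .values exactly as written above, since whether it is needed may depend on the pandas version and engine:

```python
import pandas as pd

# Illustrative frame (values are made up for the check).
df = pd.DataFrame({'a': list('aabbccddeeff'), 'c': list(range(12))})
id_list = ["a", "b", "c"]

# isin called inside the query string; .values yields a boolean array.
via_method = df.query('a.isin(@id_list).values')

# The usual boolean-mask selection, for comparison.
via_mask = df[df['a'].isin(id_list)]

print(via_method.equals(via_mask))
```
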
Answered By: rachwa