Combine pandas DataFrame query() method with isin()
Question:
I want to use the isin() method with df.query() to select rows whose id is in a list, id_list. A similar question was asked before, but the answers used the typical df[df['id'].isin(id_list)] approach. I'm wondering if there is a way to use df.query() instead.
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': list('aabbccddeeff'), 'b': list('aaaabbbbcccc'),
                   'c': np.random.randint(5, size=12),
                   'd': np.random.randint(9, size=12)})
id_list = ["a", "b", "c"]
However, this raises an error:
df.query('a == id_list')
Answers:
This appears to work:
>>> df.query('a == {0}'.format(id_list))
   a  b  c  d
0  a  a  4  1
1  a  a  0  7
2  b  a  2  1
3  b  a  0  1
4  c  b  4  0
5  c  b  4  2
Whether it is any clearer is a matter of personal taste.
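To see why the formatted query works, it helps to look at the string that query() actually receives. The following is a minimal sketch that reuses the question's columns 'a' and 'b' and id_list, and drops the random columns (an assumption made only to keep the output reproducible):

```python
import pandas as pd

# Same 'a' and 'b' columns as in the question; random columns omitted.
df = pd.DataFrame({'a': list('aabbccddeeff'), 'b': list('aaaabbbbcccc')})
id_list = ["a", "b", "c"]

# str.format() pastes the repr of the list into the expression,
# so query() sees a list literal rather than a variable name.
expr = 'a == {0}'.format(id_list)
print(expr)  # a == ['a', 'b', 'c']

# Inside query(), comparing a column to a list with == acts as a membership test.
result = df.query(expr)
print(result['a'].tolist())  # ['a', 'a', 'b', 'b', 'c', 'c']
```

This only works as long as the repr of every list element is something the query parser can read back as a literal, which is true for plain strings and numbers but not necessarily for other objects.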
From the docs for query: "You can refer to variables in the environment by prefixing them with an '@' character like @a + b."
In your case:
In [38]: df.query('a == @id_list')
Out[38]:
   a  b  c  d
0  a  a  3  4
1  a  a  4  5
2  b  a  2  3
3  b  a  1  5
4  c  b  2  4
5  c  b  1  2
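As a quick check that the '@' form selects the same rows as the isin() approach mentioned in the question, here is a small sketch. It uses the question's 'a' and 'b' columns without the random ones (an assumption, so the comparison is deterministic):

```python
import pandas as pd

# Same 'a' and 'b' columns as in the question; random columns omitted.
df = pd.DataFrame({'a': list('aabbccddeeff'), 'b': list('aaaabbbbcccc')})
id_list = ["a", "b", "c"]

# '@id_list' tells query() to look up id_list in the calling Python scope
# instead of treating it as a column name.
via_query = df.query('a == @id_list')

# The boolean-mask version from the question, for comparison.
via_isin = df[df['a'].isin(id_list)]

print(via_query.equals(via_isin))  # True
```

Without the '@', query() tries to resolve id_list as a column or index name, which is why the original attempt fails.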
You can also include the list within the query string:
>>> df.query('a in ["a", "b", "c"]')
This is the same as:
>>> df.query('a in @id_list')
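The two spellings above can be compared directly. This sketch again assumes the question's 'a' and 'b' columns without the random ones. It also shows `not in`, which the query syntax supports for selecting the complement:

```python
import pandas as pd

# Same 'a' and 'b' columns as in the question; random columns omitted.
df = pd.DataFrame({'a': list('aabbccddeeff'), 'b': list('aaaabbbbcccc')})
id_list = ["a", "b", "c"]

# List written inside the query string vs. referenced with '@'.
inline = df.query('a in ["a", "b", "c"]')
by_ref = df.query('a in @id_list')
print(inline.equals(by_ref))  # True

# 'not in' keeps the rows whose value is absent from the list.
rest = df.query('a not in @id_list')
print(rest['a'].tolist())  # ['d', 'd', 'e', 'e', 'f', 'f']
```

The '@' form is usually the more convenient one when the list is built at runtime, since the query string does not need to be regenerated.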