# Use a list of values to select rows from a Pandas dataframe

## Question:

Let’s say I have the following Pandas dataframe:

```
df = DataFrame({'A': [5,6,3,4], 'B': [1,2,3,5]})
df
A B
0 5 1
1 6 2
2 3 3
3 4 5
```

I can subset based on a specific value:

```
x = df[df['A'] == 3]
x
A B
2 3 3
```

But how can I subset based on a list of values? – something like this:

```
list_of_values = [3, 6]
y = df[df['A'] in list_of_values]
```

To get:

```
A B
1 6 2
2 3 3
```

## Answers:

You can use the `isin`

method:

```
In [1]: df = pd.DataFrame({'A': [5,6,3,4], 'B': [1,2,3,5]})
In [2]: df
Out[2]:
A B
0 5 1
1 6 2
2 3 3
3 4 5
In [3]: df[df['A'].isin([3, 6])]
Out[3]:
A B
1 6 2
2 3 3
```

And to get the opposite use `~`

:

```
In [4]: df[~df['A'].isin([3, 6])]
Out[4]:
A B
0 5 1
3 4 5
```

You can use the method query:

```
df.query('A in [6, 3]')
# df.query('A == [6, 3]')
```

or

```
lst = [6, 3]
df.query('A in @lst')
# df.query('A == @lst')
```

Another method;

```
df.loc[df.apply(lambda x: x.A in [3,6], axis=1)]
```

Unlike the isin method, this is particularly useful in determining if the list contains a function of the column `A`

. For example, `f(A) = 2*A - 5`

as the function;

```
df.loc[df.apply(lambda x: 2*x.A-5 in [3,6], axis=1)]
```

It should be noted that this approach is slower than the `isin`

method.

You can store your values in a list as:

`lis = [3,6]`

then

`df1 = df[df['A'].isin(lis)]`

`list_of_values`

doesn’t have to be a `list`

; it can be `set`

, `tuple`

, `dictionary`

, numpy array, pandas Series, generator, `range`

etc. and `isin()`

and `query()`

will still work.

A note on `query()`

:

- You can also call
`isin()`

inside`query()`

:`list_of_values = [3, 6] df.query("A.isin(@list_of_values)")`

- You can pass a values to search over as a
`local_dict`

argument, which is useful if you don’t want to create the filtering list beforehand in a chain of function calls:`df.query("A == @lst", local_dict={'lst': [3, 6]})`

## Some common problems with selecting rows

#### 1. `list_of_values`

is a range

If you need to filter within a range, you can use `between()`

method or `query()`

.

```
list_of_values = [3, 4, 5, 6] # a range of values
df[df['A'].between(3, 6)] # or
df.query('3<=A<=6')
```

#### 2. Return `df`

in the order of `list_of_values`

In the OP, the values in `list_of_values`

don’t appear in that order in `df`

. If you want `df`

to return in the order they appear in `list_of_values`

, i.e. "sort" by `list_of_values`

, use `loc`

.

```
list_of_values = [3, 6]
df.set_index('A').loc[list_of_values].reset_index()
```

If you want to retain the old index, you can use the following.

```
list_of_values = [3, 6, 3]
df.reset_index().set_index('A').loc[list_of_values].reset_index().set_index('index').rename_axis(None)
```

#### 3. Don’t use `apply`

In general, `isin()`

and `query()`

are the best methods for this task; there’s no need for `apply()`

. For example, for function `f(A) = 2*A - 5`

on column `A`

, both `isin()`

and `query()`

work much more efficiently:

```
df[(2*df['A']-5).isin(list_of_values)] # or
df[df['A'].mul(2).sub(5).isin(list_of_values)] # or
df.query("A.mul(2).sub(5) in @list_of_values")
```

#### 4. Select rows not in `list_of_values`

To select rows not in `list_of_values`

, negate `isin()`

/`in`

:

```
df[~df['A'].isin(list_of_values)]
df.query("A not in @list_of_values") # df.query("A != @list_of_values")
```

#### 5. Select rows where multiple columns are in `list_of_values`

If you want to filter using both (or multiple) columns, there’s `any()`

and `all()`

to reduce columns (`axis=1`

) depending on the need.

- Select rows where at least one of
`A`

or`B`

is in`list_of_values`

:`df[df[['A','B']].isin(list_of_values).any(1)] df.query("A in @list_of_values or B in @list_of_values")`

- Select rows where both of
`A`

and`B`

are in`list_of_values`

:`df[df[['A','B']].isin(list_of_values).all(1)] df.query("A in @list_of_values and B in @list_of_values")`

Its trickier with f-Strings

```
list_of_values = [3,6]
df.query(f'A in {list_of_values}')
```

The above answers are correct, but if you still are not able to filter out rows as expected, make sure both DataFrames’ columns have the same `dtype`

.

```
source = source.astype({1: 'int64'})
to_rem = to_rem.astype({'some col': 'int64'})
works = source[~source[1].isin(to_rem['some col'])]
```

Took me long enough.

A non pandas solution that compares in terms of speed may be:

```
filtered_column = set(df.A) - set(list_list_of_values)
```