Multiconditional .isin() filter in pandas

Question:

Is it possible to filter a dataframe using multiples .isin() together?

df1
co | no | ds
A1 | yes| text1
A2 | no | text1
A3 | no | text3
df2
co | no | ds
A1 | yes| text1
A5 | no | text5
A6 | yes| text6

Filter if the entire row is in the other dataframe
df1 = df1[df1['co'].isin(df2['co'])] & df1[df1['no'].isin(df2['no'])] & df1[df1['ds'].isin(df2['ds'])]

df1
co | no | ds
A1 | yes| text1
Asked By: null92

||

Answers:

You can use an inner merge:

out = df1.merge(df2)

# or to limit to co/no/ds
cols = ['co', 'no', 'ds']
out = df1.merge(df2[cols])

Or, if you have potentially duplicates in df2, first drop_duplicates:

out = df1.merge(df2.drop_duplicates())

If you really want to use boolean indexing, then you can take advantage of numpy.logical_and.reduce to chain multiple conditions without needing to type all masks:

import numpy as np

cols = ['co', 'no', 'ds']
out = df1[np.logical_and.reduce([df1[c].isin(df2[c]) for c in cols])]
Answered By: mozway

Boolean arithmetics (&) on series are valid only on boolean series. Like what you give for each filtering.

So you just need to apply this & to the condition. Not to the indexation you get from these booleans series.

In other words

df1[df1['co'].isin(df2['co']) & df1['no'].isin(df2['no']) & df1['ds'].isin(df2['ds'])]

Almost the same line as your non-working one. Except that I apply & on the condition.
That is, instead of trying to df1[A] & df1[B] & df1[C] (which doesn’t mean much. You could try to intersect those two dataframes df1[A], df1[B] and df1[C]; that costs a bit; but not to apply a &between them) I dodf1[A & B & C]`.

(Note on mozway’s solution: that merge, indeed, is an elegant way to intersect the dataframes. But that is almost always slower than computing the boolean series index. Plus, in your case, your logic is already the one you need to compute such indexation. Timeit shows that merge solutions takes 3.59 ms per run, while this one takes 0.83 ms per run)

Answered By: chrslg

yes possible see the changes to your code

df1 = df1[(df1['co'].isin(df2['co']) & df1['no'].isin(df2['no']) &  df1['ds'].isin(df2['ds']))]
df1
Out[87]: 
   co   no     ds
0  A1  yes  text1
Answered By: Surjit Samra
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.