Pandas: filter rows where one column contains another column

Question

How can I filter rows where one column contains the value of another column?
For example, given a DataFrame with two columns A and B, can we keep only the rows where B contains A? The check should be per row: B must contain the A value from the same row, not just any value that appears somewhere in column A.

A      B
'lol'  'lolec'
'ram'  'rambo'
'ki'   'pio'

Result:
A      B
'lol'  'lolec'
'ram'  'rambo'

Asked By: wowbrowser search


Answer 1

You can use boolean indexing with a mask created by apply and the in operator if you need to compare columns A and B row by row:

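# If necessary, strip the surrounding ' from all values (column by column)
df = df.apply(lambda x: x.str.strip("'"))
# Element-wise alternative (applymap is deprecated since pandas 2.1 in favor of DataFrame.map):
#df = df.applymap(lambda x: x.strip("'"))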

print (df.apply(lambda x: x.A in x.B, axis=1))
0     True
1     True
2    False
dtype: bool

df = df[df.apply(lambda x: x.A in x.B, axis=1)]
print (df)
     A      B
0  lol  lolec
1  ram  rambo
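
For reference, the steps above can be put together into a self-contained sketch. The DataFrame construction is an assumption (the question only shows the table, with the quotes as part of the values); the rest mirrors the code above.

```python
import pandas as pd

# Assumed construction of the question's table, with the ' characters
# stored as part of each string.
df = pd.DataFrame({
    "A": ["'lol'", "'ram'", "'ki'"],
    "B": ["'lolec'", "'rambo'", "'pio'"],
})

# Strip the surrounding quotes from every column.
df = df.apply(lambda col: col.str.strip("'"))

# Row-wise test: is the string in A a substring of the string in B?
mask = df.apply(lambda row: row["A"] in row["B"], axis=1)

# Keep only the rows where the test is True.
result = df[mask]
print(result)
```

Without the strip step, 'lol' (with quotes) would not be a substring of 'lolec' (with quotes), because of the closing quote after lol.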

The two solutions differ. To see this, change the input DataFrame so that some values of A occur in B, but in a different row:

print (df)
     A      B
0  lol    pio
1  ram  rambo
2   ki  lolec

# Row-wise check: A must be contained in B of the same row
print (df[df.apply(lambda x: x.A in x.B, axis=1)])
     A      B
1  ram  rambo

# str.contains with joined values: B may contain any value of A
print (df[df['B'].str.contains("|".join(df['A']))])
     A      B
1  ram  rambo
2   ki  lolec

To improve performance, use a list comprehension:

df = df[[a in b for a, b in zip(df.A, df.B)]]
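
One caveat with the plain `a in b` test is that it raises a TypeError when B holds a missing value, since NaN is a float. A minimal sketch of a variant that treats such rows as non-matching follows; the helper name contains_rowwise and the sample data with missing values are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample with missing values in both columns.
df = pd.DataFrame({
    "A": ["lol", "ram", None, "ki"],
    "B": ["lolec", None, "pio", "pio"],
})

def contains_rowwise(a, b):
    # Only compare when both values are real strings;
    # anything else (None, NaN) counts as "no match".
    return isinstance(a, str) and isinstance(b, str) and a in b

# Same list-comprehension pattern as above, with the guarded test.
mask = [contains_rowwise(a, b) for a, b in zip(df["A"], df["B"])]
result = df[mask]
print(result)
```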

Answered By: jezrael

Answer 2

You can use str.contains to match any of the substrings: join the values of the other Series with the regex | character, which means OR:

df[df['B'].str.contains("|".join(df['A']))]
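
Two properties of this pattern are worth checking before relying on it: the joined string is interpreted as a regular expression, and a row is kept when B contains any value of A, not only the one in the same row. The sketch below illustrates both, escaping the values with re.escape; the sample data (including the value a.c) is an assumption chosen to make the difference visible.

```python
import re

import pandas as pd

# Hypothetical data: "a.c" contains a regex metacharacter, and
# "lol" appears inside B only in a different row.
df = pd.DataFrame({
    "A": ["lol", "ram", "ki", "a.c"],
    "B": ["pio", "rambo", "lolec", "abc"],
})

# Unescaped: "." matches any character, so "a.c" matches "abc".
raw_mask = df["B"].str.contains("|".join(df["A"]))

# Escaped: every value of A is matched literally.
pattern = "|".join(re.escape(a) for a in df["A"])
escaped_mask = df["B"].str.contains(pattern)

# Row-wise comparison for contrast: A and B from the same row only.
rowwise_mask = [a in b for a, b in zip(df["A"], df["B"])]

print(raw_mask.tolist())
print(escaped_mask.tolist())
print(rowwise_mask)
```

Row 2 stays in both str.contains results because lolec contains lol from row 0, while the row-wise test rejects it.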

Answered By: Nickil Maveli
