Check if a substring is in a string in Python

Question:

I have two DataFrames. I need to check whether each substring from the first DataFrame occurs in any string of the second DataFrame, and get a list of the words that are found in the second DataFrame.

First df (word):

word
apples
dog
cat
cheese

Second df (sentence):

sentence
apples grow on a tree
I love cheese

I tried this one:

tru=[]
for i in word['word']:
    if i in sentence['sentence'].values:    
        tru.append(i)

And this one:

tru=[]
for i in word['word']:
    if sentence['sentence'].str.contains(i):    
        tru.append(i)

I expect to get a list like ['apples',..., 'cheese']

Answers:

One possible way is to use Series.str.extractall:

import re
import pandas as pd

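# Sample data from the question, held as Series
df_word = pd.Series(["apples", "dog", "cat", "cheese"])
df_sentence = pd.Series(["apples grow on a tree", "I love cheese"])

# Build one alternation pattern, escaping any regex metacharacters in the words
pattern = f"({'|'.join(df_word.apply(re.escape))})"
# One row per match; the single capture group becomes column 0
matches = df_sentence.str.extractall(pattern)
matches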

Output:

              0
  match
0 0      apples
1 0      cheese

You can then convert the results to a list:

matches[0].unique().tolist()

Output:

['apples', 'cheese']
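
One caveat with a plain alternation pattern is that it matches substrings inside longer words. If only whole-word matches are wanted, the pattern can be wrapped in \b word-boundary anchors. The following is a minimal sketch of that variation; the sample sentences ("I love cheesecake", "my cat sleeps") and the names words, sentences, loose and strict are made up for illustration and are not from the question.

```python
import re
import pandas as pd

# Hypothetical sample data, chosen to show the difference
words = pd.Series(["apples", "dog", "cat", "cheese"])
sentences = pd.Series(["apples grow on a tree", "I love cheesecake", "my cat sleeps"])

alternation = "|".join(words.apply(re.escape))

# Plain alternation: also matches "cheese" inside "cheesecake"
loose = f"({alternation})"
# \b anchors restrict matches to whole words
strict = rf"\b({alternation})\b"

loose_hits = sentences.str.extractall(loose)[0].unique().tolist()
strict_hits = sentences.str.extractall(strict)[0].unique().tolist()

print(loose_hits)   # ['apples', 'cheese', 'cat']
print(strict_hits)  # ['apples', 'cat']
```

Whether substring or whole-word matching is the right behaviour depends on the data; the question as asked only needs substring matching.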
Answered By: Tim
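
As a side note on the two attempts in the question: the first one tests whether each word is equal to an entire sentence (membership in the array of values), so nothing matches. The second one fails because str.contains returns a boolean Series, and using a Series directly in an if statement raises a ValueError. A minimal sketch of the loop with that fixed, assuming the DataFrames are named word and sentence as in the question, reduces the Series to a single boolean with .any():

```python
import pandas as pd

# DataFrames rebuilt from the question's sample data
word = pd.DataFrame({"word": ["apples", "dog", "cat", "cheese"]})
sentence = pd.DataFrame({"sentence": ["apples grow on a tree", "I love cheese"]})

tru = []
for w in word["word"]:
    # str.contains gives one boolean per sentence; .any() collapses them
    # into a single True/False. regex=False treats w as a literal substring.
    if sentence["sentence"].str.contains(w, regex=False).any():
        tru.append(w)

print(tru)  # ['apples', 'cheese']
```

This scans the sentence column once per word, so for a long word list the single-pattern extractall approach is likely to do less work, but the loop keeps the result in the same order as the word column.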