Pandas extract phrases in string that occur in a list

Question:

I have a data frame with a column text that contains strings, as shown below:

text
my name is abc
xyz is a fruit
abc likes per

I also have a list of phrases, as shown below:

['abc', 'fruit', 'likes per']

I want to add a column terms to my data frame containing the phrases from the list that occur in each text string, so the result in this case would be:

text                terms
my name is abc      ['abc']
xyz is a fruit      ['fruit']
abc likes per       ['abc', 'likes per']

Can I do this without using regex?

Asked By: S_S


Answers:

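Use Series.str.findall with regex word boundaries (\b):

import re
import pandas as pd

# The first row uses "abcd" (not "abc") to show the effect of word boundaries
df = pd.DataFrame({"text": ["my name is abcd", "xyz is a fruit", "abc likes per"]})

L = ['abc', 'fruit', 'likes per']

# \b anchors each phrase at word boundaries; re.escape keeps regex metacharacters literal
pat = '|'.join(r"\b{}\b".format(re.escape(x)) for x in L)
df['terms'] = df['text'].str.findall(pat)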

If word boundaries are not important:

df['terms1'] = df['text'].str.findall('|'.join(L))
print(df)
              text             terms            terms1
0  my name is abcd                []             [abc]
1   xyz is a fruit           [fruit]           [fruit]
2    abc likes per  [abc, likes per]  [abc, likes per]
Answered By: jezrael
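A possible refinement of the regex approach above (a sketch, not part of the original answer): regex alternation tries its alternatives left to right and findall returns non-overlapping matches, so a shorter phrase listed first can shadow a longer phrase that starts at the same position. Sorting the phrases by length, longest first, avoids that. The phrase list containing "abc likes" is hypothetical, chosen only to show the effect, and build_pattern is a hypothetical helper name.

```python
import re

import pandas as pd

df = pd.DataFrame({"text": ["my name is abc", "xyz is a fruit", "abc likes per"]})

# Hypothetical phrase list where "abc" is a prefix of "abc likes".
phrases = ["abc", "abc likes", "fruit"]


def build_pattern(terms):
    """Join terms into one alternation, longest first, each wrapped in word boundaries."""
    # Alternation is tried left to right, so longer phrases must come first,
    # otherwise "abc" would match before "abc likes" at the same position.
    ordered = sorted(terms, key=len, reverse=True)
    # re.escape keeps characters such as "." or "+" in a phrase literal.
    return "|".join(r"\b{}\b".format(re.escape(t)) for t in ordered)


df["terms"] = df["text"].str.findall(build_pattern(phrases))
print(df)
```

With the unsorted list, the third row would yield ['abc'] instead of ['abc likes'].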

I hope this works for you: use apply to check which phrases from the list are present in each text string. No regex is needed.

import pandas as pd
df = pd.DataFrame(data={
    "text": ["my name is abc", "xyz is a fruit", "abc likes per"]
})
lst = ['abc', 'fruit', 'likes per']
df['terms'] = df['text'].apply(lambda x: [i for i in lst if i in x])
print(df)
Answered By: Muhammad Ali
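The apply answer uses a plain substring test, so 'abc' would also be found inside 'abcd'. If whole-phrase matching is wanted without regex, one option is to pad both the text and each phrase with spaces before testing containment. This is a sketch assuming words are separated only by whitespace (punctuation such as "fruit." would defeat it), and find_phrases is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame({"text": ["my name is abcd", "xyz is a fruit", "abc likes per"]})
lst = ["abc", "fruit", "likes per"]


def find_phrases(text, phrases):
    # Collapse runs of whitespace, then pad with spaces so a phrase only
    # matches whole tokens: " abc " is not a substring of " my name is abcd ".
    padded = " {} ".format(" ".join(text.split()))
    return [p for p in phrases if " {} ".format(p) in padded]


df["terms"] = df["text"].apply(lambda x: find_phrases(x, lst))
print(df)
```

For the "abcd" row this gives an empty list, matching the word-boundary result from the regex answer.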