Check if word contains substrings

Question:

Consider the following:

word = 'analphabetic'
df = pd.DataFrame({'substring': list('abcdefgh') + ['ab', 'phobic']})

substring is not necessarily a single letter!

I want to add a column named after word, where each row shows True/False depending on whether the substring in that row occurs in word. Can I do this with a built-in pandas method?

Desired output:

  substring  analphabetic
0         a          True
1         b          True
2         c          True
3         d         False
4         e          True
5         f         False
6         g         False
7         h          True
8        ab          True
9    phobic         False

The other way around can be done with pandas.Series.str.contains, e.g. df.substring.str.contains(word). For my case, I guess you could do something like:

df[word] = [i in word for i in df.substring]

But by the same logic, what the built-in str.contains() does could also be written as:

string = 'a'
df = pd.DataFrame({'words': ['these', 'are', 'some', 'random', 'words']})
df[string] = [string in i for i in df.words]

So I suspect there is also a built-in method for my case.
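
To make the comparison concrete, here is a minimal sketch checking that the list comprehension and str.contains agree on the example words. It assumes a literal (non-regex) match is intended, hence regex=False; the names builtin and manual are only for illustration.

```python
import pandas as pd

string = 'a'
df = pd.DataFrame({'words': ['these', 'are', 'some', 'random', 'words']})

# Built-in: does each word contain the literal string?
builtin = df['words'].str.contains(string, regex=False)

# Manual equivalent using the in operator
manual = [string in i for i in df['words']]

# Both approaches flag the same rows
assert builtin.tolist() == manual
```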

Asked By: T C Molenaar

Answers:

A possible solution (which should work for substrings longer than a single letter):

df['analphabetic'] = df['substring'].map(lambda x: x in word)

Output:

  substring  analphabetic
0         a          True
1         b          True
2         c          True
3         d         False
4         e          True
5         f         False
6         g         False
7         h          True
8        ab          True
9    phobic         False

Using list comprehension:

df['analphabetic'] = [x in word for x in df.substring]

Using apply:

df['analphabetic'] = df['substring'].apply(lambda x: x in word)
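
The three variants above can be run side by side. This is a minimal, self-contained sketch using the word and df from the question; the names via_map, via_apply and via_list are only for illustration.

```python
import pandas as pd

word = 'analphabetic'
df = pd.DataFrame({'substring': list('abcdefgh') + ['ab', 'phobic']})

# Three ways to test whether each substring occurs in word
via_map = df['substring'].map(lambda x: x in word)
via_apply = df['substring'].apply(lambda x: x in word)
via_list = [x in word for x in df['substring']]

# All three produce the same boolean values
assert via_map.tolist() == via_apply.tolist() == via_list

# Name the new column after the word itself
df[word] = via_map
print(df)
```

All three loop over the column in Python, so on small frames the choice is mostly a matter of taste.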
Answered By: PaulS

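Yes, you can use str.contains to find a substring in a pandas DataFrame column. Note that it checks whether each value in the column contains the given pattern, which is the reverse of what the question asks.

You can also use the in operator, which tests membership in Python strings and containers and returns a Boolean (either True or False). Applied to each row, it checks whether that row's value is contained in word.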
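
A minimal sketch of the two directions, reusing word and df from the question. The names contains_word and in_word are hypothetical, and regex=False assumes a literal match is wanted.

```python
import pandas as pd

word = 'analphabetic'
df = pd.DataFrame({'substring': list('abcdefgh') + ['ab', 'phobic']})

# str.contains: does each value in the column contain word?
contains_word = df['substring'].str.contains(word, regex=False)

# in operator: is each value in the column contained in word?
in_word = df['substring'].map(lambda s: s in word)

print(contains_word.any())  # no short substring contains the whole word
print(in_word.sum())        # number of substrings found inside word
```

Only the second direction answers the question as asked.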

Answered By: Adeyemi Michael