'and' operator in string.contains

Question:

I have a pandas Series to which I am applying a string search like this:

df['column_name'].str.contains('test1')

This gives me a True/False list depending on whether the string ‘test1’ is contained in column ‘column_name’ or not.

However, I am not able to test for two strings at once, where I need to check whether both strings are present. Something like:

  df['column_name'].str.contains('test1' and 'test2')

This does not seem to work. Any suggestions would be great.

Asked By: PagMax

Answers:

all( word in df['column_name'] for word in ['test1', 'test2'] )

This will test for an arbitrary number of words present in a string.
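
One caveat: on a pandas Series, `word in series` checks the index labels rather than the values, so for a row-by-row answer the same `all()` idea has to be applied to each cell. A minimal sketch, assuming a small made-up DataFrame (the sample data and the helper name `contains_all` are illustrative and not part of the original answer):

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({'column_name': ['test1 and test2', 'only test1', 'test2test1', None]})

words = ['test1', 'test2']

def contains_all(value, words):
    # Non-string cells (None, NaN, numbers) cannot contain the words.
    return isinstance(value, str) and all(word in value for word in words)

# Apply the check cell by cell to get one boolean per row.
mask = df['column_name'].apply(lambda value: contains_all(value, words))
print(mask.tolist())  # [True, False, True, False]
```

The resulting `mask` can be used like any other boolean Series, for example `df[mask]`.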

Answered By: user2255757

No, you have to create two conditions, combine them with &, and wrap each condition in parentheses because of operator precedence:

(df['column_name'].str.contains('test1')) & (df['column_name'].str.contains('test2'))

If you wanted to test for either word then the following would work:

df['column_name'].str.contains('test1|test2')
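
To see the two forms side by side, here is a small runnable sketch. The sample data is made up for illustration; only `str.contains`, `&` and the `|` regex alternation come from the answer itself.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({'column_name': ['test1 test2', 'test1', 'test2', 'neither']})

col = df['column_name']

# AND: one mask per word, combined element-wise with &.
both = col.str.contains('test1') & col.str.contains('test2')

# OR: a single regex with alternation.
either = col.str.contains('test1|test2')

print(both.tolist())    # [True, False, False, False]
print(either.tolist())  # [True, True, True, False]
```

If the column may hold missing values, passing `na=False` to `str.contains` keeps each mask strictly boolean.
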
Answered By: EdChum

The ‘and’ operator is a boolean logical operator. It does not concatenate strings, and it does not do what you expect it to do here.

>>> 'test1' and 'test2'
'test2'
>>> 'test1' or 'test2'
'test1'
>>> 10 and 20
20
>>> 10 and 0
0
>>> 0 or 20
20
>>> # => and so on...

This occurs because the and and or operators short-circuit: they act as ‘truth deciders’ but return one of their operands rather than a strict True or False. In essence, the return value is the last value to have been evaluated, whether it’s a string or otherwise. Look at this behavior:

>>> a = 'test1'
>>> b = 'test2'
>>> c = a and b
>>> c is a
False
>>> c is b
True

The last operand evaluated (here, b) is what gets assigned to c. What you’re looking for is a way to iterate over a list or set of strings and ensure that all of them are found. We use the all(iterable) function for this.

if all(df['column_name'].str.contains(s, na=False).any() for s in ['test1', 'test2']):
    print("All strings are contained in it.")
else:
    print("Not all strings are contained in it.")

Assuming both strings are present in the column, the following is an example of what you’d receive:

>>> x = [bool(df['column_name'].str.contains(s, na=False).any()) for s in ['test1', 'test2']]
>>> print(x)
[True, True] # => returns True for all()
>>> all(x)
True
>>> x[0] = bool(df['column_name'].str.contains('ThisIsNotInTheColumn', na=False).any())
>>> print(x)
[False, True]
>>> all(x)
False
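
To tie the `and` behavior back to pandas, the sketch below shows that the call from the question ends up searching for 'test2' only. The sample Series is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
s = pd.Series(['test1 only', 'test2 only', 'test1 test2'])

# Python evaluates the expression before pandas ever sees it.
pattern = 'test1' and 'test2'
print(pattern)  # test2

# So the call from the question is just a search for 'test2'.
same = s.str.contains('test1' and 'test2').equals(s.str.contains('test2'))
print(same)  # True
```
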
Answered By: Goodies

You want to know if test1 AND test2 are somewhere in the column.

So:

df['col_name'].str.contains('test1').any() & df['col_name'].str.contains('test2').any()
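
Note that this answers a column-level question (does each word occur somewhere in the column?), which is not the same as asking whether both words occur in the same row. A minimal sketch of the difference, using made-up data in which the two words never share a row:

```python
import pandas as pd

# Hypothetical sample data: the two words never appear in the same row.
df = pd.DataFrame({'col_name': ['test1 here', 'test2 there', 'nothing']})

has1 = df['col_name'].str.contains('test1')
has2 = df['col_name'].str.contains('test2')

# Both words occur somewhere in the column.
anywhere = bool(has1.any() and has2.any())

# Both words occur together in at least one row.
same_row = bool((has1 & has2).any())

print(anywhere)  # True
print(same_row)  # False
```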

Answered By: B. M.

Just use the reduce() function if you need to apply a list of strings as a filter:

from functools import reduce
import pandas as pd

df = pd.DataFrame({
    'column_name': [1,'test1_sdv_test2_vsd',3,4,5, 'test2test1'],
    'column_name_2': [3,6,3,2,7,8]
})

items = ['test1', 'test2'] # list of strings you want to apply as filter


def filter_series_by_list(s, items): 
    return reduce(lambda a, b: a & b, (s.str.contains(item, na=False) for item in items))


print(filter_series_by_list(df['column_name'], items))


RESULT:
0    False
1     True
2    False
3    False
4    False
5     True
Name: column_name, dtype: bool
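
The boolean Series can then be used to select the matching rows. The sketch below repeats the setup so it runs on its own; `regex=False` is an addition here (an assumption that the items should be matched as literal substrings), and the names `mask` and `matching` are illustrative.

```python
from functools import reduce
import pandas as pd

df = pd.DataFrame({
    'column_name': [1, 'test1_sdv_test2_vsd', 3, 4, 5, 'test2test1'],
    'column_name_2': [3, 6, 3, 2, 7, 8],
})
items = ['test1', 'test2']

# Same reduce-based mask as above; regex=False (an assumption) treats each
# item as a literal substring instead of a regular expression.
mask = reduce(
    lambda a, b: a & b,
    (df['column_name'].str.contains(item, na=False, regex=False) for item in items),
)

# Keep only the rows whose 'column_name' contains every item.
matching = df[mask]
print(matching)
```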