Pandas pull rows where any column contains certain strings

Question:

I’m trying to return rows where any of my columns contain any of the words in a word list. Let’s say word_list = ['Synthetic', 'Advanced or Advantage/Excellence']. I’ve tried the following code: df[df.apply(' '.join, 1).str.contains('|'.join(word_list))].

The problem is that some of my columns contain null values, so running that code raises the error TypeError: sequence item 0: expected str instance, int found. (Maybe pandas treats the null values as an "int" type?)
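A quick way to see what that error message reports, using plain Python and a hypothetical helper named join_error: str.join raises TypeError as soon as one item is not a str, and the message names that item's type. A NaN is a Python float, so it would be reported as "float found"; "int found" points at integer values in the row rather than at the nulls.

```python
def join_error(items):
    """Hypothetical helper: return the TypeError message that " ".join raises
    for items, or None if joining succeeds."""
    try:
        " ".join(items)
    except TypeError as exc:
        return str(exc)
    return None


print(join_error(["Synthetic", "oil"]))           # every item is a str, so no error
print(join_error([5, "Synthetic"]))               # item 0 is an int
print(join_error([float("nan"), "Synthetic"]))    # NaN is a float, not an int
print(join_error([str(x) for x in [5, float("nan"), "Synthetic"]]))  # casting to str first avoids the error
```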

Is there any way I can write my code so that pandas either ignores the null values or treats them as strings, so that my function works?
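A minimal sketch of both options applied to the original join approach, assuming a small hypothetical DataFrame: fillna("") makes missing values contribute nothing to the joined text, and astype(str) turns the remaining ints and floats into strings so ' '.join no longer fails. The re.escape call is an addition of mine, not part of the question; it makes any regex metacharacters in the words (such as "." or "(") match literally.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample data: strings, integers and missing values mixed together.
df = pd.DataFrame({
    "a": ["Synthetic rubber", "plain", np.nan],
    "b": [1, 2, 3],
    "c": [np.nan, "Advanced or Advantage/Excellence", "other"],
})
word_list = ["Synthetic", "Advanced or Advantage/Excellence"]

# Escape each entry so it is matched as literal text, then OR them together.
pattern = "|".join(map(re.escape, word_list))

# fillna("") "ignores" the nulls (without it, astype(str) would typically
# turn NaN into the text "nan", which could itself match a word).
# astype(str) converts the ints so every item passed to " ".join is a str.
joined = df.fillna("").astype(str).apply(" ".join, axis=1)

result = df[joined.str.contains(pattern)]
print(result)
```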

Asked By: Stanleyrr


Answers:

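The problem is that ' '.join only accepts strings, but some of your cells hold ints, so you end up trying to concatenate an int with a str. You can convert every value to a string first instead: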

df[df.apply(lambda x: x.astype(str).str.contains('|'.join(word_list), case=False).any(), axis=1)]

I’ve tested this with int, float, and NaN values in the columns, and it worked for me.
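A self-contained sketch of the answer's one-liner on hypothetical sample data that mixes strings, None, numbers, and NaN. It also shows a column-wise variant (my own rearrangement, not part of the answer) that casts the whole frame once and tests each column with .str.contains before reducing across columns with any(axis=1); it should select the same rows while calling the lambda once per column instead of once per row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mixing strings, None, floats and NaN.
df = pd.DataFrame({
    "name": ["Synthetic oil", "Mineral oil", None],
    "grade": [5, np.nan, 7],
    "note": [np.nan, "advanced or advantage/excellence", "standard"],
})
word_list = ["Synthetic", "Advanced or Advantage/Excellence"]
pattern = "|".join(word_list)

# Row-wise, as in the answer: cast each cell of the row to str, test every
# cell against the pattern (ignoring case), and keep the row if any matches.
mask = df.apply(
    lambda x: x.astype(str).str.contains(pattern, case=False).any(), axis=1
)

# Column-wise variant: cast the whole frame once, test each column,
# then reduce across the columns of each row.
mask_cols = df.astype(str).apply(
    lambda col: col.str.contains(pattern, case=False)
).any(axis=1)

print(df[mask])
```

Row 0 matches on "Synthetic", row 1 matches only because case=False, and row 2 contains none of the words.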

Answered By: Pedro Rocha