How to filter a Pandas DataFrame in Python to keep rows whose values contain all substrings in a passed list?

Question:

Any suggestions on how to keep only the rows whose values contain all of the substrings in the list, for any column?

import pandas as pd

df = pd.DataFrame(
    [
    [1, 'foollish', 'molish'], 
    [2, 'barnylishon', 'chacha'], 
    [3, 'bazon', 'gazon'],
    ], 
    columns=['id', 'value_1', 'value_2'])

print (df)
    
search_list = ['a','on']

print("Desired result for value_1 column:")

df_desire_result = pd.DataFrame(
    [
    [2, 'barnylishon', 'chacha'], 
    [3, 'bazon', 'gazon'], 
    ], 
    columns=['id', 'value_1', 'value_2'])

print (df_desire_result)
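The membership test being asked for can be expressed as a plain Python check per cell. A minimal sketch (the `contains_all` helper is hypothetical, introduced here only for illustration):

```python
search_list = ['a', 'on']

def contains_all(text, substrings):
    """True when every substring occurs somewhere in the text."""
    return all(s in text for s in substrings)

# 'barnylishon' and 'bazon' contain both 'a' and 'on'; 'foollish' lacks 'a'.
results = [contains_all(v, search_list)
           for v in ['foollish', 'barnylishon', 'bazon']]
```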

Answers:

From the statement "contains all substrings that are in the list, for any column", I figure that if any column in a row contains all the substrings in search_list, then that row should be retained and the remaining rows dropped.

Then IIUC:

# ignore the id column; test only the string columns
cols = df.columns.drop('id').tolist()
# keep a row if every substring in search_list appears in at least one of its columns
m = df[cols].apply(lambda x: all(any(x.str.contains(s)) for s in search_list), axis=1)
out = df[m]

print(out)

   id      value_1 value_2
1   2  barnylishon  chacha
2   3        bazon   gazon
Answered By: SomeDude
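If the goal is instead that one specific column must contain every substring (as the desired value_1 result suggests), the row mask can be restricted to that column. A sketch, passing `regex=False` on the assumption that the substrings are literal text rather than regex patterns:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 'foollish', 'molish'],
     [2, 'barnylishon', 'chacha'],
     [3, 'bazon', 'gazon']],
    columns=['id', 'value_1', 'value_2'])
search_list = ['a', 'on']

# Build one boolean Series per substring, all against value_1,
# then AND them together row-wise with all(axis=1).
mask = pd.concat(
    [df['value_1'].str.contains(s, regex=False) for s in search_list],
    axis=1).all(axis=1)
out = df[mask].reset_index(drop=True)
```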

You can use:

# craft regex pattern
import re
pattern = '|'.join(map(re.escape, search_list))
# 'a|on'

out = df.loc[(df
   # extract matching substrings from all 'value' cells
   .filter(like='value')
   .stack()
   .str.extractall(fr'({pattern})')[0]
   # ensure that each substring is matched at least once per row
   .groupby(level=0).nunique()
   .eq(len(search_list))
   .reindex(df.index, fill_value=False)
 )]

print(out)

Output:

   id      value_1 value_2
1   2  barnylishon  chacha
2   3        bazon   gazon
Answered By: mozway
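To see why the `nunique` check works, the pipeline's intermediate results can be reproduced step by step. A sketch on the same example data:

```python
import re
import pandas as pd

df = pd.DataFrame(
    [[1, 'foollish', 'molish'],
     [2, 'barnylishon', 'chacha'],
     [3, 'bazon', 'gazon']],
    columns=['id', 'value_1', 'value_2'])
search_list = ['a', 'on']
pattern = '|'.join(map(re.escape, search_list))  # 'a|on'

# Every regex match, indexed by (row, column, match number);
# rows with no matches at all (here row 0) simply disappear.
matches = df.filter(like='value').stack().str.extractall(fr'({pattern})')[0]

# Count the distinct substrings matched per original row;
# a row qualifies when this equals len(search_list).
counts = matches.groupby(level=0).nunique()
```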