Filtering a pandas column based on a string

Question:

I’m trying to filter a pandas column based on a string, but the problem is that each row holds a list of strings rather than a single string.

A small example of the column

tags
['get_mail_mail']
['app', 'oflline_hub', 'smart_home']
['get_mail_mail', 'smart_home']
['web']
[]
[]
['get_mail_mail']

I’m filtering with this:

df[df["tags"].str.contains("smart_home", case=False, na=False)]

but it returns an empty DataFrame.

Asked By: Tayzer Damasceno

||

Answers:

You can explode the lists into one row per tag, test each tag with str.contains, then aggregate back to one value per original row with groupby.any:

m = (df['tags'].explode()
     .str.contains('smart_home', case=False, na=False)
     .groupby(level=0).any()
    )

out = df[m]
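
For reference, here is a self-contained sketch of the same idea that rebuilds the sample column from the question. It also shows why the original attempt selects nothing: on list values, str.contains produces NaN, and na=False turns each NaN into False. The variable names naive and exploded are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"tags": [
    ["get_mail_mail"],
    ["app", "oflline_hub", "smart_home"],
    ["get_mail_mail", "smart_home"],
    ["web"],
    [],
    [],
    ["get_mail_mail"],
]})

# The original attempt: every cell is a list, not a string, so
# str.contains yields NaN and na=False maps it to False.
naive = df["tags"].str.contains("smart_home", case=False, na=False)

# explode() gives one row per tag and repeats the original index;
# empty lists become a single NaN row.
exploded = df["tags"].explode()

# Test each tag, then collapse back to one boolean per original row.
m = (exploded
     .str.contains("smart_home", case=False, na=False)
     .groupby(level=0).any())

out = df[m]
print(out)
```

Because the mask keeps the original index, it can be passed directly to df[...] or df.loc[...].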

Or join each list into a single string with a delimiter and use str.contains:

out = df[df['tags'].agg('|'.join).str.contains('smart_home')]
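
One caveat with the joined string: str.contains does substring matching and treats the pattern as a regular expression by default, so 'smart_home' would also match a longer tag such as 'smart_home_v2' (a made-up example). Below is a minimal sketch of one way to match whole tags only, assuming the separator never occurs inside a tag. It uses Series.str.join to build the string, and the names sep, joined and mask are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"tags": [
    ["get_mail_mail"],
    ["app", "oflline_hub", "smart_home"],
    ["get_mail_mail", "smart_home"],
    ["web"],
    [],
    [],
    ["get_mail_mail"],
]})

sep = "|"  # assumption: this character never appears inside a tag

# Wrap every tag in separators, e.g. "|app|oflline_hub|smart_home|".
# An empty list becomes "||".
joined = sep + df["tags"].str.join(sep) + sep

# regex=False makes the search literal, so "|" is not read as alternation.
mask = joined.str.contains(sep + "smart_home" + sep, regex=False)

out = df[mask]
print(out)
```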

Or use a list comprehension:

out = df[['smart_home' in tags for tags in df['tags']]]

Output:

                             tags
1  [app, oflline_hub, smart_home]
2     [get_mail_mail, smart_home]
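
Note that the list comprehension compares whole tags and is case-sensitive, while the two str.contains variants match substrings. If a case-insensitive whole-tag match is wanted, something along these lines might work. The helper has_tag is hypothetical, not part of pandas.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"tags": [
    ["get_mail_mail"],
    ["app", "oflline_hub", "smart_home"],
    ["get_mail_mail", "smart_home"],
    ["web"],
    [],
    [],
    ["get_mail_mail"],
]})


def has_tag(tags, wanted):
    """Return True if any tag equals `wanted`, ignoring case."""
    wanted = wanted.casefold()
    return any(str(tag).casefold() == wanted for tag in tags)


# One boolean per row; empty lists give False.
mask = [has_tag(tags, "Smart_Home") for tags in df["tags"]]

out = df[mask]
print(out)
```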
Answered By: mozway

You could try:

# list of substrings to search for
pattern = ["smart_home"]

# keep rows where any cell, converted to a string, contains any of the substrings
df.loc[df.apply(lambda x: any(m in str(v)
                              for v in x.values
                              for m in pattern),
                axis=1)]

Output

    tags
--  ------------------------------------
 1  ['app', 'oflline_hub', 'smart_home']
 2  ['get_mail_mail', 'smart_home']
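
The version above converts every cell of every column to a string before searching. If only the tags column matters, a narrower sketch could look like the following. The second pattern "web" and the names patterns and mask are illustrative assumptions.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"tags": [
    ["get_mail_mail"],
    ["app", "oflline_hub", "smart_home"],
    ["get_mail_mail", "smart_home"],
    ["web"],
    [],
    [],
    ["get_mail_mail"],
]})

# Substrings to look for; "web" is added here only for illustration.
patterns = ["smart_home", "web"]

# True when any tag in the row contains any of the substrings.
mask = df["tags"].map(
    lambda tags: any(p in tag for tag in tags for p in patterns)
)

out = df.loc[mask]
print(out)
```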
Answered By: Marco_CH