mark one dataframe if pattern found in another dataframe

Question

I have two dataframes and want to mark rows in the second one when their pattern exists in the first.
A: very large, with tens of thousands of rows

date      | items 
20100605  | apple is red 
20110606  | orange is orange 
20120607  | apple is green

B: shorter, with a few hundred rows

id   |  color
123  |  is Red
234  |  not orange
235  |  is green

The result should flag every row in B whose pattern is found in A, possibly by adding a column to B like this:

B:
id   |  color       | found
123  |  is Red      | true
234  |  not orange  | false
235  |  is green    | true

I was thinking of something like dfB['found'] = dfB['color'].isin(dfA['items']), but I don't see any way to ignore case. Also, this approach would overwrite the column and change existing True values to False, and I don't want to change rows that are already set to True. I also believe it is inefficient to loop over the large dataframe more than once. Running through A once and marking B would be better, but I'm not sure how to achieve that using isin(). Are there any other ways, especially ones that ignore the case of the pattern?

Asked By: pen

Answer 1

You can use something like this, where df is A and df2 is B:

df2['check'] = df2['color'].apply(lambda x: any(x.casefold() in i.casefold() for i in df['items']))
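As a minimal, runnable sketch of the same idea, the frames below are rebuilt from the sample data in the question (df is A, df2 is B). The helper name found_in_items and the list items_folded are illustrative assumptions, not part of the original answer. Casefolding A once up front avoids repeating that work for every row of B.

```python
import pandas as pd

# Sample frames reconstructed from the question; df is A and df2 is B.
df = pd.DataFrame({
    "date": [20100605, 20110606, 20120607],
    "items": ["apple is red", "orange is orange", "apple is green"],
})
df2 = pd.DataFrame({
    "id": [123, 234, 235],
    "color": ["is Red", "not orange", "is green"],
})

# Casefold A once, rather than once per row of B.
items_folded = [item.casefold() for item in df["items"]]


def found_in_items(pattern: str) -> bool:
    """Return True if pattern occurs, ignoring case, inside any item of A."""
    needle = pattern.casefold()
    return any(needle in item for item in items_folded)


df2["check"] = df2["color"].apply(found_in_items)
print(df2)
```

This still compares every row of B against every row of A in the worst case, so it is a readability improvement rather than a change in complexity.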

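Or you can use str.contains, building a regex alternation from the second and third words of each item in A. This assumes every item has at least three space-separated words and contains no regex metacharacters:

# get the second and third words of each item
words = df['items'].str.split(" ")
df2['check'] = df2['color'].str.contains('|'.join(words.str[1] + ' ' + words.str[2]), case=False)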
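The question also asks for a single pass over A that does not reset flags that are already True. One possible sketch, under assumptions that are not in the original answer: B's patterns are treated as literal text (escaped with re.escape), B already has a found column, and row 234 is hypothetically pre-flagged to show that it survives. Note that a regex alternation reports only non-overlapping matches, so patterns that overlap or contain one another may be missed, and casefold may not agree with re.IGNORECASE for some non-ASCII text.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "items": ["apple is red", "orange is orange", "apple is green"],
})
df2 = pd.DataFrame({
    "id": [123, 234, 235],
    "color": ["is Red", "not orange", "is green"],
    # Hypothetical starting state: 234 was flagged by an earlier pass.
    "found": [False, True, False],
})

# One alternation over B's patterns, escaped so they match literally.
pattern = "(" + "|".join(re.escape(c) for c in df2["color"]) + ")"

# Single pass over A: collect every pattern occurrence, casefolded.
hits = df["items"].str.extractall(pattern, flags=re.IGNORECASE)[0]
matched = set(hits.str.casefold())

# OR the new result into the existing column so earlier True flags survive.
df2["found"] = df2["found"] | df2["color"].str.casefold().isin(matched)
print(df2)
```

Because the update uses a logical OR, rerunning it against another batch of A can only turn False into True, never the reverse.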

Answered By: Bushmaster
