Find values from one pandas DataFrame in any column of another

Question:

I have two dataframes:
df1:

id  name   city   age  found_in_df2
2   john   ny     20
4   maria  lima   9
9   susan  cana   17

df2:

comment           question            response
i'm john          are you there       yes, i am
hello!            jajajajaja          go ahead!
please, go on     hello susan         no, i don't
maria             i'm 9 years         sure

I want to check whether each value in the ‘name’ column of df1 appears anywhere in df2 (it could be in any column, and even inside a phrase). If the value is found, I want to write ‘yes’ in the ‘found_in_df2’ column of df1.
Could someone help me with that?

Asked By: Rafa

Answers:

The following seems to solve the problem:

import pandas as pd

def find_value(data, phrase):
    for col in data.columns:
        if any(data[col].str.contains(phrase)):
            return 'yes'
    return 'no'


df1 = pd.DataFrame([[2, 'john', 'ny', 20], 
                    [4, 'maria', 'lima', 9], 
                    [9, 'susan', 'cana', 17]],
                   columns=['id', 'name', 'city', 'age'])

df2 = pd.DataFrame([["i'm john", "are you there", "yes, i am"], 
                    ["hello!", "jajajajaja", "go ahead!"], 
                    ["please, go on", "hello susan", "no, i don't"],
                    ["maria", "i'm 9 years", "sure"]],
                   columns=['comment', 'question', 'response'])

df1['found_in_df2'] = df1.name.apply(lambda x: find_value(df2, x))
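
One caveat worth noting: by default, str.contains treats the search term as a regular expression and returns NaN for missing cells. Below is a minimal sketch of a stricter variant, assuming names should be matched as plain text and that df2 may contain missing values. The name find_value_literal and the sample frame are hypothetical, used only for illustration.

```python
import pandas as pd

def find_value_literal(data, phrase):
    # regex=False matches the phrase as plain text;
    # na=False counts missing cells as "no match" instead of NaN
    for col in data.columns:
        if data[col].str.contains(phrase, regex=False, na=False).any():
            return 'yes'
    return 'no'

# Hypothetical sample with a missing cell and regex metacharacters
sample = pd.DataFrame({'comment': ["i'm john", None],
                       'question': ["c++ (really)", "hello"]})

print(find_value_literal(sample, 'john'))
print(find_value_literal(sample, 'c++'))
```

With the default regex=True, a term such as 'c++' would be parsed as a pattern rather than as literal text, so literal matching is the safer choice when the names come from user data.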
Answered By: SunStorm44

One approach could be as follows.

df1['found_in_df2'] = (df1.name.apply(lambda x: any(df2.stack().str.contains(x)))
                       .map({True: 'yes', False: 'no'}))

print(df1)

   id   name  city  age found_in_df2
0   2   john    ny   20          yes
1   4  maria  lima    9          yes
2   9  susan  cana   17          yes
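
Note that substring matching also flags a name that only occurs inside a longer word. If whole-word matches are wanted, a sketch along these lines may help, assuming word boundaries (\b) are an acceptable definition of "word". The helper appears_as_word and the sample data are hypothetical.

```python
import re
import pandas as pd

# Hypothetical sample: 'ann' occurs only inside 'joanna'
names = pd.Series(['ann', 'susan'])
text = pd.Series(['hello joanna', 'hello susan'])

def appears_as_word(name, cells):
    # Escape the name so it is matched literally, then require word boundaries
    pattern = rf'\b{re.escape(name)}\b'
    return cells.str.contains(pattern, regex=True, na=False).any()

flags = names.apply(lambda n: appears_as_word(n, text)).map({True: 'yes', False: 'no'})
print(flags.tolist())
```

For a DataFrame such as df2, the stacked cells (df2.stack()) could be passed in place of text.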
Answered By: ouroboros1

Use:

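# Build one pattern from the df1 names, collect every name that occurs
# anywhere in df2, then flag the rows of df1 whose name was found.
found = df2.stack().str.findall('|'.join(df1['name'])).explode()
df1['match'] = df1['name'].isin(found).map({True: 'yes', False: 'no'})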
Answered By: ansev