Join two pandas tables on multiple strings

Question:

I have two pandas DataFrames, df1 and df2. df1 contains multiple emails per customer, and I want to match it against df2 to see how many customers did a test with the company, by checking whether any of the emails in df1 is present in df2.

I tried using .str.split(";", expand=True) to split the email IDs and then pd.merge to join on each of the resulting columns, but it's too lengthy. Posting here to find a better solution.

df1

myid      emails                                                                      price

1001     [email protected];[email protected]                                        1
1002     [email protected]                                                           2
1003     [email protected];[email protected];[email protected];[email protected]          8           
1004     [email protected];[email protected]                                                  7
1005     [email protected]                                                            9

df2

tr_id      latest_em                                     test

101     [email protected]; [email protected]                  12                            
102     [email protected]                                     13            
103     [email protected]                                16
104     [email protected]                              18                               
105     [email protected];[email protected]                     10                                           

Expected output:

myid      emails                      price   tr_id   latest_em                      test
1004     [email protected];[email protected]  7      102     [email protected]                   13
1004     [email protected];[email protected]  7      105     [email protected];[email protected]   10
1005     [email protected]            9      103     [email protected]              16
Asked By: Ash

||

Answers:

You can split each email column on the semicolon, explode it to one row per address, then merge on that address:

(df1
 .assign(key=df1['emails'].str.split(r';\s*')).explode('key')
 .merge(df2.assign(key=df2['latest_em'].str.split(r';\s*')).explode('key'),
        on='key'
       )
 .drop(columns='key')
)

output:

   myid                        emails  price  tr_id                      latest_em  test
0  1004  [email protected];[email protected]      7    102                  [email protected]    13
1  1004  [email protected];[email protected]      7    105  [email protected];[email protected]    10
2  1005            [email protected]      9    103             [email protected]    16
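
Since the addresses in the post are obfuscated, here is a minimal self-contained sketch of the same split/explode/merge idea on invented data. The sample addresses, the helper name with_key, and the n_customers count are assumptions added for illustration; they are not part of the original frames.

```python
import pandas as pd

# Hypothetical sample data: these addresses are invented purely
# for illustration and do not come from the post.
df1 = pd.DataFrame({
    "myid": [1001, 1002, 1003],
    "emails": ["a@x.com;b@x.com", "c@x.com", "d@x.com;e@x.com"],
    "price": [1, 2, 8],
})
df2 = pd.DataFrame({
    "tr_id": [101, 102, 103],
    "latest_em": ["b@x.com; z@x.com", "q@x.com", "e@x.com"],
    "test": [12, 13, 16],
})


def with_key(df, col):
    """Return one row per address in `col`, held in a temporary 'key' column."""
    # Split on ';' followed by optional whitespace, then give each
    # address its own row while repeating the other columns.
    keys = df[col].str.split(r";\s*")
    return df.assign(key=keys).explode("key")


# Inner merge keeps only rows whose individual addresses match.
matched = (
    with_key(df1, "emails")
    .merge(with_key(df2, "latest_em"), on="key")
    .drop(columns="key")
)

# Number of distinct customers with at least one matching address.
n_customers = matched["myid"].nunique()
print(matched)
print(n_customers)
```

If the same customer matches the same tr_id through more than one address, the merge yields one row per matching address, so a .drop_duplicates() on the result may be wanted. Counting with nunique() on myid is unaffected by those repeats.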
Answered By: mozway