Join two pandas tables on multiple strings
Question:
I have two pandas dataframe : df1 and df2. df1 contains multiple emails of the customer and I want to match it with df2 to see how many customer did a test with the company by looking at if any of the emails is present in df1 is in df2.
I tried to do .str.split(";", expand=True)
to split the email ids and use pd.merge
to join on multiple email ids but it’s too lengthy. Posting it here to find a better solution.
df1
myid emails price
1001 [email protected];[email protected] 1
1002 [email protected] 2
1003 [email protected];[email protected];[email protected];[email protected] 8
1004 [email protected];[email protected] 7
1005 [email protected] 9
df2
tr_id latest_em test
101 [email protected]; [email protected] 12
102 [email protected] 13
103 [email protected] 16
104 [email protected] 18
105 [email protected];[email protected] 10
Expected Output :
myid emails price tr_id latest_em test
1004 [email protected];[email protected] 7 102 [email protected] 13
1004 [email protected];[email protected] 7 105 [email protected];[email protected] 10
1005 [email protected] 9 103 [email protected] 16
Answers:
You can split
, explode
, then merge
:
(df1
.assign(key=df1['emails'].str.split(';s*')).explode('key')
.merge(df2.assign(key=df2['latest_em'].str.split(';s*')).explode('key'),
on='key'
)
.drop(columns='key')
)
output:
myid emails price tr_id latest_em test
0 1004 [email protected];[email protected] 7 102 [email protected] 13
1 1004 [email protected];[email protected] 7 105 [email protected];[email protected] 10
2 1005 [email protected] 9 103 [email protected] 16
I have two pandas dataframe : df1 and df2. df1 contains multiple emails of the customer and I want to match it with df2 to see how many customer did a test with the company by looking at if any of the emails is present in df1 is in df2.
I tried to do .str.split(";", expand=True)
to split the email ids and use pd.merge
to join on multiple email ids but it’s too lengthy. Posting it here to find a better solution.
df1
myid emails price
1001 [email protected];[email protected] 1
1002 [email protected] 2
1003 [email protected];[email protected];[email protected];[email protected] 8
1004 [email protected];[email protected] 7
1005 [email protected] 9
df2
tr_id latest_em test
101 [email protected]; [email protected] 12
102 [email protected] 13
103 [email protected] 16
104 [email protected] 18
105 [email protected];[email protected] 10
Expected Output :
myid emails price tr_id latest_em test
1004 [email protected];[email protected] 7 102 [email protected] 13
1004 [email protected];[email protected] 7 105 [email protected];[email protected] 10
1005 [email protected] 9 103 [email protected] 16
You can split
, explode
, then merge
:
(df1
.assign(key=df1['emails'].str.split(';s*')).explode('key')
.merge(df2.assign(key=df2['latest_em'].str.split(';s*')).explode('key'),
on='key'
)
.drop(columns='key')
)
output:
myid emails price tr_id latest_em test
0 1004 [email protected];[email protected] 7 102 [email protected] 13
1 1004 [email protected];[email protected] 7 105 [email protected];[email protected] 10
2 1005 [email protected] 9 103 [email protected] 16