Merging pandas dataframes on potentially different join keys
Question:
I have a dataframe A with columns like so:
ACCOUNT_NAME
SFDC_ACCOUNT_NAME
COMPANY_NAME
Acme Inc
Acme, Inc.
Acme
Donut Heaven
None
Doughnut Heaven
Super Foods
Sooper Foods
None
I want to merge on additional columns but I am not sure if this additional data was captured using ACCOUNT_NAME
, SFDC_ACCOUNT_NAME
, or COMPANY_NAME
. This data looks like the table below. There is a join key column that could represent either ACCOUNT_NAME
, SFDC_ACCOUNT_NAME
, or COMPANY_NAME
.
CAPTURED_COMPANY_NAME
value1
value2
Acme Inc
2
3
Sooper Foods
6
7
Doughnut Heaven
5
8
I want the final table to look like this:
ACCOUNT_NAME
SFDC_ACCOUNT_NAME
COMPANY_NAME
value1
value2
Acme Inc
Acme, Inc.
Acme
2
3
Donut Heaven
None
Doughnut Heaven
5
8
Super Foods
Sooper Foods
None
6
7
I could merge this second dataset onto the first three times (one for each join key) but then of course columns value1
and value2
would be repeated three times. What is the best way to achieve this?
Answers:
Given:
# df
ACCOUNT_NAME SFDC_ACCOUNT_NAME COMPANY_NAME
0 Acme Inc Acme, Inc. Acme
1 Donut Heaven NaN Doughnut Heaven
2 Super Foods Sooper Foods NaN
# df1
CAPTURED_COMPANY_NAME value1 value2
0 Acme Inc 2 3
1 Sooper Foods 6 7
2 Doughnut Heaven 5 8
Doing:
# Merge each individually, and concat the results:
out = pd.concat([df.merge(df1, left_on=x, right_on='CAPTURED_COMPANY_NAME') for x in df.columns], ignore_index=True)
# Left Merge:
# out = df.merge(pd.concat([df.merge(df1, left_on=x, right_on='CAPTURED_COMPANY_NAME') for x in df.columns], ignore_index=True), how='left')
print(out)
Output:
ACCOUNT_NAME SFDC_ACCOUNT_NAME COMPANY_NAME CAPTURED_COMPANY_NAME value1 value2
0 Acme Inc Acme, Inc. Acme Acme Inc 2 3
1 Super Foods Sooper Foods NaN Sooper Foods 6 7
2 Donut Heaven NaN Doughnut Heaven Doughnut Heaven 5 8
I have a dataframe A with columns like so:
ACCOUNT_NAME | SFDC_ACCOUNT_NAME | COMPANY_NAME |
---|---|---|
Acme Inc | Acme, Inc. | Acme |
Donut Heaven | None | Doughnut Heaven |
Super Foods | Sooper Foods | None |
I want to merge on additional columns but I am not sure if this additional data was captured using ACCOUNT_NAME
, SFDC_ACCOUNT_NAME
, or COMPANY_NAME
. This data looks like the table below. There is a join key column that could represent either ACCOUNT_NAME
, SFDC_ACCOUNT_NAME
, or COMPANY_NAME
.
CAPTURED_COMPANY_NAME | value1 | value2 |
---|---|---|
Acme Inc | 2 | 3 |
Sooper Foods | 6 | 7 |
Doughnut Heaven | 5 | 8 |
I want the final table to look like this:
ACCOUNT_NAME | SFDC_ACCOUNT_NAME | COMPANY_NAME | value1 | value2 |
---|---|---|---|---|
Acme Inc | Acme, Inc. | Acme | 2 | 3 |
Donut Heaven | None | Doughnut Heaven | 5 | 8 |
Super Foods | Sooper Foods | None | 6 | 7 |
I could merge this second dataset onto the first three times (one for each join key) but then of course columns value1
and value2
would be repeated three times. What is the best way to achieve this?
Given:
# df
ACCOUNT_NAME SFDC_ACCOUNT_NAME COMPANY_NAME
0 Acme Inc Acme, Inc. Acme
1 Donut Heaven NaN Doughnut Heaven
2 Super Foods Sooper Foods NaN
# df1
CAPTURED_COMPANY_NAME value1 value2
0 Acme Inc 2 3
1 Sooper Foods 6 7
2 Doughnut Heaven 5 8
Doing:
# Merge each individually, and concat the results:
out = pd.concat([df.merge(df1, left_on=x, right_on='CAPTURED_COMPANY_NAME') for x in df.columns], ignore_index=True)
# Left Merge:
# out = df.merge(pd.concat([df.merge(df1, left_on=x, right_on='CAPTURED_COMPANY_NAME') for x in df.columns], ignore_index=True), how='left')
print(out)
Output:
ACCOUNT_NAME SFDC_ACCOUNT_NAME COMPANY_NAME CAPTURED_COMPANY_NAME value1 value2
0 Acme Inc Acme, Inc. Acme Acme Inc 2 3
1 Super Foods Sooper Foods NaN Sooper Foods 6 7
2 Donut Heaven NaN Doughnut Heaven Doughnut Heaven 5 8