How to perform a pandas merge on two tables where the key may be in either one of two columns?

Question:

Here is the situation:
I have two pandas data frames:

TABLE 1:

name  alias  col3
str   str    str

TABLE 2:

name_or_alias  col2
str            str

  • The values in table1.name and table1.alias are all unique: no value appears twice, whether within one column or across the two.
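That premise can be checked directly. A minimal sketch on small hypothetical frames (the values are made up for illustration): stacking both key columns into one Series and checking `is_unique` confirms that every key identifies at most one row of table1.

```python
import pandas as pd

# Hypothetical example data shaped like the two tables in the question
table1 = pd.DataFrame({
    "name": ["alpha", "beta", "gamma"],
    "alias": ["a", "b", "g"],
    "col3": ["x1", "x2", "x3"],
})
table2 = pd.DataFrame({
    "name_or_alias": ["alpha", "b", "g", "delta"],
    "col2": ["y1", "y2", "y3", "y4"],
})

# Stack both key columns into one Series; the premise holds if no value repeats
all_keys = pd.concat([table1["name"], table1["alias"]], ignore_index=True)
print(all_keys.is_unique)
```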

I need to do a left join on table2, but the problem is that the matching key may be in either table1.name OR table1.alias.

So, if I do:

table2.merge(table1, how='left', left_on='name_or_alias', right_on='name'),

I will only get some of the matches. If I do:

table2.merge(table1, how='left', left_on='name_or_alias', right_on='alias'),

I will also only get some of the matches. I need to figure out how to do a sort of IF statement where I first check one column for a match and then check the other. I tried looking for a way to merge on either of two columns in pandas, but I could not find one.
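The check-name-first, then-alias idea can be sketched without an explicit IF by translating each alias into its name before a single merge. This is a minimal sketch on hypothetical data; the values and the helper names `alias_to_name` and `_key` are made up for illustration:

```python
import pandas as pd

# Hypothetical example data shaped like the two tables in the question
table1 = pd.DataFrame({
    "name": ["alpha", "beta", "gamma"],
    "alias": ["a", "b", "g"],
    "col3": ["x1", "x2", "x3"],
})
table2 = pd.DataFrame({
    "name_or_alias": ["alpha", "b", "g", "delta"],
    "col2": ["y1", "y2", "y3", "y4"],
})

keys = table2["name_or_alias"]
# Hypothetical helper: a Series indexed by alias whose values are the names
alias_to_name = pd.Series(table1["name"].values, index=table1["alias"])

# The "IF": keep the key if it is already a name, else translate alias -> name
# (keys found in neither column become NaN)
key = keys.where(keys.isin(table1["name"]), keys.map(alias_to_name))

# A single left merge on name now covers both cases; unmatched rows get NaN
out = (table2.assign(_key=key)
             .merge(table1, how="left", left_on="_key", right_on="name")
             .drop(columns="_key"))
print(out)
```

Rows of table2 whose key matches neither column keep their own values and get NaN in name, alias and col3, as a left join should.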

Asked By: strawman_00


Answers:

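Use one left merge per key column (with df1 as table 1 and df2 as table 2), then combine the two results: wherever the merge on name found no match, take the values from the merge on alias. Because no value appears in both name and alias, each row matches in at most one of the two merges:

import pandas as pd

# One left merge per key column; both keep df2's rows in the same order
by_name = df2.merge(df1, how='left', left_on='name_or_alias', right_on='name')
by_alias = df2.merge(df1, how='left', left_on='name_or_alias', right_on='alias')
# Fill the rows that did not match on name (NaN) with the alias match
out = by_name.combine_first(by_alias)
print(out)

# Output
  name_or_alias col2 name alias col3
0           str  str  str   str  str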
Answered By: Corralien
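A different sketch, not taken from the answer above: stack table1 twice, once keyed by name and once keyed by alias, so that a single left merge covers both columns. Passing `validate="many_to_one"` makes pandas raise a `MergeError` if a key appears more than once in the stacked lookup, i.e. if the uniqueness premise from the question does not hold. The data and the helper column `key` are hypothetical:

```python
import pandas as pd

# Hypothetical example data shaped like the two tables in the question
table1 = pd.DataFrame({
    "name": ["alpha", "beta", "gamma"],
    "alias": ["a", "b", "g"],
    "col3": ["x1", "x2", "x3"],
})
table2 = pd.DataFrame({
    "name_or_alias": ["alpha", "b", "g", "delta"],
    "col2": ["y1", "y2", "y3", "y4"],
})

# Stack table1 twice: once keyed by name, once keyed by alias
lookup = pd.concat(
    [table1.assign(key=table1["name"]), table1.assign(key=table1["alias"])],
    ignore_index=True,
)

# One left merge; validate= raises MergeError if any key repeats in lookup
out = (table2.merge(lookup, how="left", left_on="name_or_alias",
                    right_on="key", validate="many_to_one")
             .drop(columns="key"))
print(out)
```

Stacking doubles the size of the lookup but needs only one merge, and the `validate` check enforces the uniqueness premise instead of silently assuming it.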