Join two Pandas dataframes with new column containing combined matching results

Question:

Apologies if this has been answered already, but I wasn’t able to find a similar post.

I’ve got two Pandas dataframes that I’d like to merge. Dataframe1 contains data which has failed validation. Dataframe2 contains the detail for each row where the errors have occurred (ErrorColumn).

As you can see in Dataframe2, there can be multiple errors for a single row. I need to consolidate the errors, then append them as a new column (ErrorColumn) in Dataframe1.

Example below:

Dataframe 1:

ErrorRow  MaterialID  Description  UnitCost  Quantity  Critical  Location
3         nan         Part 1       nan       100       false     West
4         nan         Part 2       12        nan       true      East
7         56779       Part 3       25        nan       false     West

Dataframe 2:

ErrorRow ErrorColumn
3 MaterialID
3 UnitCost
4 MaterialID
4 Quantity
7 Quantity

Result:

ErrorRow  MaterialID  Description  UnitCost  Quantity  Critical  Location  ErrorColumn
3         nan         Part 1       nan       100       false     West      MaterialID, UnitCost
4         nan         Part 2       12        nan       true      East      MaterialID, Quantity
7         56779       Part 3       25        nan       false     West      Quantity

Any assistance is appreciated. I’m new to Python, so there’s likely a simple solution that I have yet to find or learn.

Asked By: TW3


Answers:

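You can use pandas.DataFrame.merge with GroupBy.agg: group df2 by ErrorRow, join the error column names into one string per row, then merge that result onto df1.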

out = df1.merge(df2.groupby("ErrorRow", as_index=False).agg(", ".join), on="ErrorRow")
# or, if you need a set of column names instead of a joined string, use GroupBy.agg(set)

# Output:

print(out.to_string())
   ErrorRow  MaterialID Description  UnitCost  Quantity  Critical Location           ErrorColumn
0         3         NaN      Part 1       NaN     100.0     False     West  MaterialID, UnitCost
1         4         NaN      Part 2      12.0       NaN      True     East  MaterialID, Quantity
2         7     56779.0      Part 3      25.0       NaN     False     West              Quantity
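
For anyone who wants to reproduce this, here is a minimal, self-contained sketch that rebuilds the two example frames from the question and applies the same groupby-then-merge idea. Two details are my own assumptions rather than part of the answer above: the intermediate name `errors`, and `how="left"`, which I chose so that rows in df1 without any entry in df2 would be kept (with NaN in ErrorColumn) instead of being dropped by the default inner merge.

```python
import numpy as np
import pandas as pd

# Example data rebuilt from the question.
df1 = pd.DataFrame({
    "ErrorRow": [3, 4, 7],
    "MaterialID": [np.nan, np.nan, 56779],
    "Description": ["Part 1", "Part 2", "Part 3"],
    "UnitCost": [np.nan, 12, 25],
    "Quantity": [100, np.nan, np.nan],
    "Critical": [False, True, False],
    "Location": ["West", "East", "West"],
})
df2 = pd.DataFrame({
    "ErrorRow": [3, 3, 4, 4, 7],
    "ErrorColumn": ["MaterialID", "UnitCost", "MaterialID", "Quantity", "Quantity"],
})

# Collapse df2 to one row per ErrorRow, joining the error names into a string.
# "errors" is just an illustrative name.
errors = df2.groupby("ErrorRow")["ErrorColumn"].agg(", ".join).reset_index()

# how="left" is an assumption: it keeps every row of df1, even unmatched ones.
out = df1.merge(errors, on="ErrorRow", how="left")

print(out.to_string())
```

With the sample data every row in df1 has at least one error, so the left merge and the default inner merge should give the same rows here; the difference only shows up when df1 contains rows that df2 does not mention.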
Answered By: abokey