Join two Pandas dataframes with new column containing combined matching results

Question

Apologies if this has been answered already, but I wasn’t able to find a similar post.

I’ve got two Pandas dataframes that I’d like to merge. Dataframe1 contains the rows that failed validation. Dataframe2 lists, for each failing row (ErrorRow), the column in which an error occurred (ErrorColumn).

As you can see in Dataframe2, there can be multiple errors for a single row. I need to consolidate the errors, then append them as a new column (ErrorColumn) in Dataframe1.

Example below

Dataframe 1:

ErrorRow	MaterialID	Description	UnitCost	Quantity	Critical	Location
3	nan	Part 1	nan	100	false	West
4	nan	Part 2	12	nan	true	East
7	56779	Part 3	25	nan	false	West

Dataframe 2:

ErrorRow	ErrorColumn
3	MaterialID
3	UnitCost
4	MaterialID
4	Quantity
7	Quantity

Result:

ErrorRow	MaterialID	Description	UnitCost	Quantity	Critical	Location	ErrorColumn
3	nan	Part 1	nan	100	false	West	MaterialID, UnitCost
4	nan	Part 2	12	nan	true	East	MaterialID, Quantity
7	56779	Part 3	25	nan	false	West	Quantity

Any assistance is appreciated. I’m new to Python, so there’s likely a simple solution that I have yet to find or learn.

Asked By: TW3


Answer 1

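You can use pandas.DataFrame.merge with GroupBy.agg: first group Dataframe2 by ErrorRow and join each group's ErrorColumn values into a single string, then merge that result into Dataframe1 on ErrorRow.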

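# Collapse df2 to one row per ErrorRow, then merge on ErrorRow (inner join by default)
out = df1.merge(df2.groupby("ErrorRow", as_index=False).agg(", ".join), on="ErrorRow")
# Or, if a set of column names is needed instead of a joined string, use GroupBy.agg(set)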

# Output:

print(out.to_string())

   ErrorRow  MaterialID Description  UnitCost  Quantity  Critical Location           ErrorColumn
0         3         NaN      Part 1       NaN     100.0     False     West  MaterialID, UnitCost
1         4         NaN      Part 2      12.0       NaN      True     East  MaterialID, Quantity
2         7     56779.0      Part 3      25.0       NaN     False     West              Quantity
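
For anyone who wants to try this end to end, here is a minimal, self-contained sketch that rebuilds the sample frames from the question and splits the one-liner into two steps. The dtypes, the intermediate name `errors`, and the use of `how="left"` are assumptions for illustration and are not part of the answer above. A left merge keeps any row of df1 that has no entry in df2, leaving NaN in ErrorColumn, whereas the default inner merge would drop such a row.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; the dtypes are assumed.
df1 = pd.DataFrame({
    "ErrorRow": [3, 4, 7],
    "MaterialID": [np.nan, np.nan, 56779],
    "Description": ["Part 1", "Part 2", "Part 3"],
    "UnitCost": [np.nan, 12, 25],
    "Quantity": [100, np.nan, np.nan],
    "Critical": [False, True, False],
    "Location": ["West", "East", "West"],
})
df2 = pd.DataFrame({
    "ErrorRow": [3, 3, 4, 4, 7],
    "ErrorColumn": ["MaterialID", "UnitCost", "MaterialID", "Quantity", "Quantity"],
})

# Step 1: collapse df2 to one row per ErrorRow, joining the column names.
# `errors` is a hypothetical intermediate name used only in this sketch.
errors = df2.groupby("ErrorRow")["ErrorColumn"].agg(", ".join).reset_index()

# Step 2: attach the consolidated errors to df1.
# how="left" is an assumption: it keeps df1 rows that have no match in df2.
out = df1.merge(errors, on="ErrorRow", how="left")

print(out.to_string())
```

With the sample data every row of df1 has at least one error, so the left and inner merges give the same three rows here.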

Answered By: abokey
