Compare two Dataframe columns and retrive the mismatched columns in a new column
Question:
I am trying to compare two dataframes and create one column in the first dataframe says that mismatched columns. attaching the Dataframe images and expected output images below. Could I get some help here
enter image description here
attaching the expected outputs here.Need one column says that names of all mismatched columns
Answers:
Assuming aligned indices, you can use:
df1['mismatch'] = df1.ne(df2).dot(df1.columns+',').str[:-1]
Alternatively:
out = (df1.ne(df2).reset_index().melt('index')
.query('value').groupby('index')
['variable'].agg(','.join)
)
df1['mismatch'] = out
Example output:
A B C mismatch
0 1 2 3 A,B
1 4 5 6 A
2 7 8 9
Used inputs:
# df1
A B C
0 1 2 3
1 4 5 6
2 7 8 9
# df2
A B C
0 0 0 3
1 0 5 6
2 7 8 9
Applying two lambda loops first to check if cell values equal or not second to find out the column names with true value and join them.
Alternatively, instead of second lambda, You can even use .dot()
. as @mozway has done.
Code:
df1.apply(lambda x: x!=df2.iloc[x.index][x.name]).apply(lambda x: ','.join(x.index[x]), axis=1)
I am trying to compare two dataframes and create one column in the first dataframe says that mismatched columns. attaching the Dataframe images and expected output images below. Could I get some help here
enter image description here
attaching the expected outputs here.Need one column says that names of all mismatched columns
Assuming aligned indices, you can use:
df1['mismatch'] = df1.ne(df2).dot(df1.columns+',').str[:-1]
Alternatively:
out = (df1.ne(df2).reset_index().melt('index')
.query('value').groupby('index')
['variable'].agg(','.join)
)
df1['mismatch'] = out
Example output:
A B C mismatch
0 1 2 3 A,B
1 4 5 6 A
2 7 8 9
Used inputs:
# df1
A B C
0 1 2 3
1 4 5 6
2 7 8 9
# df2
A B C
0 0 0 3
1 0 5 6
2 7 8 9
Applying two lambda loops first to check if cell values equal or not second to find out the column names with true value and join them.
Alternatively, instead of second lambda, You can even use .dot()
. as @mozway has done.
Code:
df1.apply(lambda x: x!=df2.iloc[x.index][x.name]).apply(lambda x: ','.join(x.index[x]), axis=1)