Cross-check whether two DataFrames have different values and print any differences

Question:

I have two DataFrames and, for each id, I want to check whether the values differ between them. If they do, I need to print those values.

Example:

df1 =
      |id |check_column1|
      |1  |abc          |
      |1  |bcd          |
      |2  |xyz          |
      |2  |mno          |
      |2  |mmm          |
df2 =
      |id |check_column2|
      |1  |bcd          |
      |1  |abc          |
      |2  |xyz          |
      |2  |mno          |
      |2  |kkk          |

Here the output should be just |2|mmm|kkk|, but I am getting the whole table as output since the indexes are different.

This is what I did:

output = pd.merge(df1, df2, on=['id'], how='inner')

event4 = output[output.apply(lambda x: x['check_column1'] != x['check_column2'], axis=1)]
Asked By: vidathri

Answers:

The idea is to sort the values per id in both DataFrames and join them on id plus a helper counter created by GroupBy.cumcount. The rows that do not match can then be filtered out:

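import pandas as pd

# Sort so that equal values within each id end up at the same position.
df1 = df1.sort_values(['id','check_column1'])
df2 = df2.sort_values(['id','check_column2'])

# Join on id plus a per-id row counter (0, 1, 2, ...) from cumcount.
df = pd.merge(df1, df2, left_on=['id', df1.groupby('id').cumcount()],
                        right_on=['id', df2.groupby('id').cumcount()])

# Keep only the pairs whose values differ.
output = df[df['check_column1'] != df['check_column2']]
print (output)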
   id  key_1 check_column1 check_column2
2   2      0           mmm           kkk
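
Pairing rows by sorted position, as above, assumes the differing values land at the same position within each id. If one of them sorts elsewhere (say, a value starting with 'z' in place of 'kkk'), the positions shift and matching values may be reported as different. A possible variation, given here only as a sketch and not as part of the answer above, is an outer merge on the values themselves with indicator=True. The column names 'value' and 'n' are hypothetical helpers introduced for the illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                    'check_column1': ['abc', 'bcd', 'xyz', 'mno', 'mmm']})
df2 = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                    'check_column2': ['bcd', 'abc', 'xyz', 'mno', 'kkk']})

# Give both value columns a common name ('value' is a hypothetical helper name).
left = df1.rename(columns={'check_column1': 'value'})
right = df2.rename(columns={'check_column2': 'value'})

# Number repeated (id, value) pairs so duplicates are matched one to one.
left['n'] = left.groupby(['id', 'value']).cumcount()
right['n'] = right.groupby(['id', 'value']).cumcount()

# The outer merge keeps unmatched rows; '_merge' records which side they came from.
merged = left.merge(right, on=['id', 'value', 'n'], how='outer', indicator=True)
only_df1 = merged.loc[merged['_merge'] == 'left_only', ['id', 'value']]
only_df2 = merged.loc[merged['_merge'] == 'right_only', ['id', 'value']]

print(only_df1)
print(only_df2)
```

For the sample data this should leave the pair (2, mmm) on the df1 side and (2, kkk) on the df2 side. Unlike the positional join, it does not pair the two differing values into a single row, so the two results have to be read together.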
Answered By: jezrael
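
# Rows are compared position by position, so both frames need the same index
# and row order (for example, sorted by id and value with the index reset).
# The comparison already returns booleans, so np.where(..., True, False) is not needed.
mask = (df1['id'] != df2['id']) | (df1['check_column1'] != df2['check_column2'])

output = df2[mask]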
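
The row-by-row comparison above only gives the wanted result when row i of df1 corresponds to row i of df2. With the sample data as posted, the first two rows hold 'abc'/'bcd' in opposite order, so they would be flagged too. A minimal sketch of one way to line the rows up first, assuming both frames hold the same number of rows per id (the names a, b and diff are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                    'check_column1': ['abc', 'bcd', 'xyz', 'mno', 'mmm']})
df2 = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                    'check_column2': ['bcd', 'abc', 'xyz', 'mno', 'kkk']})

# Sort within each id and reset the index so row i of a lines up with row i of b.
a = df1.sort_values(['id', 'check_column1']).reset_index(drop=True)
b = df2.sort_values(['id', 'check_column2']).reset_index(drop=True)

# Boolean mask: True where the id or the value differs at the same position.
mask = (a['id'] != b['id']) | (a['check_column1'] != b['check_column2'])

# Put the differing values side by side: id, check_column1, check_column2.
diff = pd.concat([a[mask], b.loc[mask, ['check_column2']]], axis=1)
print(diff)
```

For the sample data, diff should contain the single row 2, mmm, kkk.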
Answered By: Ouroboroski

You can use np.where to achieve this.

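import numpy as np
import pandas as pd

df1 = pd.DataFrame({'id':[1,1,2,2,2],'check_column1':['abc','bcd','xyz','mno','mmm']})
df2 = pd.DataFrame({'id':[1,1,2,2,2],'check_column2':['bcd','abc','xyz','mno','kkk']})
# Sort both frames so rows line up by position, then pick the positions that differ.
a = df1.sort_values(['id','check_column1']).reset_index(drop=True)
b = df2.sort_values(['id','check_column2']).reset_index(drop=True)
rows = np.where(a['check_column1'] != b['check_column2'])[0]
event4 = np.vstack([a.loc[rows, ['id','check_column1']], b.loc[rows, ['id','check_column2']]])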

Output:

array([[2, 'mmm'],
       [2, 'kkk']], dtype=object)
Answered By: TRBot
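
As background to the question, the merge on id alone is likely what returns almost the whole table, rather than the index: it pairs every df1 row with every df2 row that shares the id, so most pairs naturally hold different values. A short sketch with the sample data (the names pairs and mismatched are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                    'check_column1': ['abc', 'bcd', 'xyz', 'mno', 'mmm']})
df2 = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                    'check_column2': ['bcd', 'abc', 'xyz', 'mno', 'kkk']})

# Merging on id alone yields every combination of rows within each id:
# 2 * 2 rows for id 1 plus 3 * 3 rows for id 2.
pairs = pd.merge(df1, df2, on='id', how='inner')
print(len(pairs))

# Filtering for unequal values keeps every cross pairing, not just mmm/kkk.
mismatched = pairs[pairs['check_column1'] != pairs['check_column2']]
print(len(mismatched))
```

With this data the merge produces 13 rows and the filter keeps 9 of them, which is why an extra key such as the cumcount counter is needed to pair rows one to one.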