Remove rows from one dataframe that is present in another dataframe depending on specific columns

Question:

I have two dataframes as shown below:

df1 =
      A     B    val1     val2   val3
     
0     0    45     20      200      100       
1     3    100    10      80       50      
2     4    30     30      60       10  
3     6    150    50      34       11
df2 =
      A     B    val1     val2   val3
     
0     4    30     80      145      90      
1     0    45     10      80       50      
2     1    78     10      90       18  

I want to remove the rows from df1 that are present in df2 only depending on the columns A and B. From df1, the values 0 & 45 and 4 & 30 are present in df2 in columns A and B. So I want the entire row from df1 to be removed. The final dataframe should be as follows:

final = 
      A     B    val1     val2   val3
     
       
0     3    100    10      80       50        
1     6    150    50      34       11

The index should be reset to 0 and 1 in the final dataframe
Is there a way to do this?

Asked By: GGEng

||

Answers:

Filter columns names for df2 and add indicator parameter to DataFrame.merge with left join, test column _merge and filter in boolean indexing, last add DataFrame.reset_index:

df = (df1[df1.merge(df2[['A','B']], indicator=True, how='left')['_merge'].eq('left_only')]
         .reset_index(drop=True))
print (df)
   A    B  val1  val2  val3
1  3  100    10    80    50
3  6  150    50    34    11

Or create MultiIndex and use Index.isin with invert mask:

df = (df1[~df1.set_index(['A','B']).index.isin(df2.set_index(['A','B']).index)]
           .reset_index(drop=True)))
print (df)
   A    B  val1  val2  val3
1  3  100    10    80    50
3  6  150    50    34    11
Answered By: jezrael
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.