Remove rows from one dataframe that is present in another dataframe depending on specific columns
Question:
I have two dataframes as shown below:
df1 =
A B val1 val2 val3
0 0 45 20 200 100
1 3 100 10 80 50
2 4 30 30 60 10
3 6 150 50 34 11
df2 =
A B val1 val2 val3
0 4 30 80 145 90
1 0 45 10 80 50
2 1 78 10 90 18
I want to remove the rows from df1
that are present in df2
only depending on the columns A
and B
. From df1
, the values 0 & 45 and 4 & 30 are present in df2
in columns A
and B
. So I want the entire row from df1
to be removed. The final dataframe should be as follows:
final =
A B val1 val2 val3
0 3 100 10 80 50
1 6 150 50 34 11
The index should be reset to 0 and 1 in the final dataframe
Is there a way to do this?
Answers:
Filter columns names for df2
and add indicator
parameter to DataFrame.merge
with left join, test column _merge
and filter in boolean indexing
, last add DataFrame.reset_index
:
df = (df1[df1.merge(df2[['A','B']], indicator=True, how='left')['_merge'].eq('left_only')]
.reset_index(drop=True))
print (df)
A B val1 val2 val3
1 3 100 10 80 50
3 6 150 50 34 11
Or create MultiIndex
and use Index.isin
with invert mask:
df = (df1[~df1.set_index(['A','B']).index.isin(df2.set_index(['A','B']).index)]
.reset_index(drop=True)))
print (df)
A B val1 val2 val3
1 3 100 10 80 50
3 6 150 50 34 11
I have two dataframes as shown below:
df1 =
A B val1 val2 val3
0 0 45 20 200 100
1 3 100 10 80 50
2 4 30 30 60 10
3 6 150 50 34 11
df2 =
A B val1 val2 val3
0 4 30 80 145 90
1 0 45 10 80 50
2 1 78 10 90 18
I want to remove the rows from df1
that are present in df2
only depending on the columns A
and B
. From df1
, the values 0 & 45 and 4 & 30 are present in df2
in columns A
and B
. So I want the entire row from df1
to be removed. The final dataframe should be as follows:
final =
A B val1 val2 val3
0 3 100 10 80 50
1 6 150 50 34 11
The index should be reset to 0 and 1 in the final dataframe
Is there a way to do this?
Filter columns names for df2
and add indicator
parameter to DataFrame.merge
with left join, test column _merge
and filter in boolean indexing
, last add DataFrame.reset_index
:
df = (df1[df1.merge(df2[['A','B']], indicator=True, how='left')['_merge'].eq('left_only')]
.reset_index(drop=True))
print (df)
A B val1 val2 val3
1 3 100 10 80 50
3 6 150 50 34 11
Or create MultiIndex
and use Index.isin
with invert mask:
df = (df1[~df1.set_index(['A','B']).index.isin(df2.set_index(['A','B']).index)]
.reset_index(drop=True)))
print (df)
A B val1 val2 val3
1 3 100 10 80 50
3 6 150 50 34 11