Removing rows from a dataframe that occur in another dataframe
Question:
I want to remove the rows of one dataframe that also occur in another dataframe.
Below is a simple example with the expected result.
df1
A | B |
---|---|
Z | 1 |
X | 2 |
C | 3 |
V | 4 |
df2
A | B |
---|---|
DD | 66 |
Z | 1 |
X | 2 |
CC | 55 |
Expected output: df2 with the rows that also occur in df1 dropped.
new df2:
A | B |
---|---|
DD | 66 |
CC | 55 |
Edit: a row should only be dropped when both A and B match.
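To see why matching on both columns matters, here is a minimal sketch comparing a filter on column A alone with a filter on the (A, B) pair. It rebuilds df1 and df2 from the tables above and adds one hypothetical row ('Z', 99) to df2 that is not part of the question's data; the names only_a, both and pairs_in_df1 are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['Z', 'X', 'C', 'V'], 'B': [1, 2, 3, 4]})
# df2 from the question plus one HYPOTHETICAL row ('Z', 99): its A value
# appears in df1, but the (A, B) pair does not
df2 = pd.DataFrame({'A': ['DD', 'Z', 'X', 'CC', 'Z'], 'B': [66, 1, 2, 55, 99]})

# Filtering on column A alone also drops the hypothetical ('Z', 99) row
only_a = df2[~df2['A'].isin(df1['A'])]

# Filtering on the (A, B) pair keeps it, because that pair is not in df1
pairs_in_df1 = set(zip(df1['A'], df1['B']))
both = df2[[pair not in pairs_in_df1 for pair in zip(df2['A'], df2['B'])]]

print(only_a)
print(both)
```

With the question's original four rows both filters happen to agree; they only diverge once a row shares an A value with df1 but has a different B.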
Answers:
IIUC, you can use an anti-join: a left merge with indicator=True, keeping only the rows flagged 'left_only' (present in df2 but not in df1):
```python
(df2
 .merge(df1, how='left', indicator=True)  # if the frames share other, unrelated columns, add on=['A', 'B']
 .loc[lambda d: d.pop('_merge').eq('left_only')]  # keep left-only rows; pop also removes the helper column
)
```
output:
```
    A   B
0  DD  66
3  CC  55
```
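A self-contained sketch of the same anti-join on the question's data, assuming pandas is installed. Compared with the one-liner above, it merges on ['A', 'B'] explicitly, deduplicates the key pairs taken from df1, and removes the helper _merge column with drop(columns=...) instead of pop; these tweaks are additions, not part of the original answer.

```python
import pandas as pd

# Example data from the question
df1 = pd.DataFrame({'A': ['Z', 'X', 'C', 'V'], 'B': [1, 2, 3, 4]})
df2 = pd.DataFrame({'A': ['DD', 'Z', 'X', 'CC'], 'B': [66, 1, 2, 55]})

keys = ['A', 'B']
new_df2 = (
    df2
    # Left merge on the key pair; indicator=True adds a categorical '_merge'
    # column that is 'both' when the (A, B) pair also exists in df1
    .merge(df1[keys].drop_duplicates(), on=keys, how='left', indicator=True)
    # Keep the rows that exist only in df2
    .loc[lambda d: d['_merge'] == 'left_only']
    # Remove the helper column
    .drop(columns='_merge')
)
print(new_df2)
```

Subsetting df1 to the key columns keeps any extra df1 columns out of the result, and drop_duplicates() stops repeated pairs in df1 from multiplying matched rows. Note that merge returns a fresh RangeIndex, so the index labels line up with df2 here only because df2 has a default index.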
Alternatively, using pandasql:
```python
df2.sql("select * from self where not exists (select 1 from df1 where df1.A=self.A and df1.B=self.B)", df1=df1)
```
output:
```
    A   B
0  DD  66
3  CC  55
```
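For the same NOT EXISTS semantics without an SQL layer, one plain-pandas alternative (not one of the original answers) is to compare (A, B) pairs through a MultiIndex with Index.isin. A minimal sketch, assuming pandas is installed; unlike the merge approach, it keeps df2's original index labels.

```python
import pandas as pd

# Example data from the question
df1 = pd.DataFrame({'A': ['Z', 'X', 'C', 'V'], 'B': [1, 2, 3, 4]})
df2 = pd.DataFrame({'A': ['DD', 'Z', 'X', 'CC'], 'B': [66, 1, 2, 55]})

keys = ['A', 'B']
# Turn each frame's (A, B) columns into a MultiIndex of tuples and test
# membership row by row; the result is a boolean array aligned with df2
in_df1 = df2.set_index(keys).index.isin(df1.set_index(keys).index)

# Keep df2 rows whose (A, B) pair does not exist in df1
new_df2 = df2[~in_df1]
print(new_df2)
```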