Get the rows that are not shared between two data frames

Question:

I have two data frames with the exact same column, but one of them have 1000 rows (df1), and one of them 500 (df2). The rows of df2 are also found in the data frame df1, but I want the rows that are not.

For example, lets say this is df1:

          Gender       Age
1           F           43
3           F           56
33          M           76
476         F           30
810         M           29

and df2:

          Gender       Age
3           F           56
476         F           30

I want a new data frame, df3, that have the unshared rows:

          Gender       Age
1           F           43
33          M           76
810         M           29

How can I do that ?

Asked By: Programming Noob

||

Answers:

Use pd.Index.difference:

df3 = df1.loc[df1.index.difference(df2.index)]
Answered By: SomeDude

This has many ways.
I know 3 ways for it.

first way:

df = df1[~df1.index.isin(df2.index)]

second way:

left merge 2 dataframes and then filter rows that just in df1

third way:

Add a column to both dataframes that define the source and then concat two dataframes with axis=1

then countt each index in data frame and filter all index that seen 1 time and has a record with source=df1

finally:
Use from first way. It is very very faster

Answered By: Alireza

You can concatenate two tables and delete any rows that have duplicates:

df3 = pd.concat([df1, df2]).drop_duplicates(keep=False)

The keep parameter ask you if you want to keep the row that has duplicates or not. If keep = True, it will keep 1 copy of the rows. If false, delete all rows that have duplicates.

Answered By: Tomboyloychoy
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.