Compare values of two different DataFrames

Question

I have two DataFrames with the same columns: one holds historic data and the other holds 'new' data. The new data may sometimes contain rows that are already in the historic data. So if the value of 'comment_id' in the new data is already present in the historic data, I want to do nothing. Otherwise, I want to add that row to the historic data.

I tried doing this:

historic_comments = [x for x in filtered_comments if filtered_comments['comment_id'] not in historic_comments['comment_id']]

But got error:

TypeError: unhashable type: 'Series'

Asked By: Palalfredo

Answer 1

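Use a boolean mask with isin:

import pandas as pd
# True for rows of filtered_comments whose comment_id is not yet in historic_comments
m = ~filtered_comments['comment_id'].isin(historic_comments['comment_id'])
out = pd.concat([historic_comments, filtered_comments[m]], axis=0, ignore_index=True)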

Output:

>>> out  # new historic_comments dataframe
  comment_id
0    bonjour
1      hello
2      world
3        new

>>> filtered_comments
  comment_id
0      hello
1        new
2      world

>>> historic_comments
  comment_id
0    bonjour
1      hello
2      world
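
To reproduce the output above end to end, here is a minimal self-contained sketch. The two sample frames are rebuilt from the printed output, so their construction is an assumption rather than the asker's real data.

```python
import pandas as pd

# Sample frames rebuilt from the printed output above (assumed, not the real data)
historic_comments = pd.DataFrame({"comment_id": ["bonjour", "hello", "world"]})
filtered_comments = pd.DataFrame({"comment_id": ["hello", "new", "world"]})

# isin compares element-wise, so the result is one boolean per row of filtered_comments;
# ~ inverts it so that True marks ids NOT yet present in historic_comments
m = ~filtered_comments["comment_id"].isin(historic_comments["comment_id"])

# Append only the unseen rows and renumber the index from 0
out = pd.concat([historic_comments, filtered_comments[m]], axis=0, ignore_index=True)
print(out)
```

Unlike the list comprehension in the question, this never applies Python's `in` operator to a whole Series, which is what raised the TypeError.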

Answered By: Corralien

Answer 2

Assuming historic_df is the old DataFrame and new_df is the new one, I think you can do this:

historic_df = pd.concat(
    [historic_df, new_df.loc[~new_df["comment_id"].isin(historic_df["comment_id"])]],
    ignore_index=True,
)
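
If this update runs repeatedly, the same expression can be wrapped in a small helper. This is only a sketch: the function name `append_new_rows`, the `key` parameter, and the sample data are hypothetical and not part of the original answer.

```python
import pandas as pd


def append_new_rows(historic_df, new_df, key="comment_id"):
    """Return historic_df plus the rows of new_df whose key is not already present.

    Hypothetical helper; the name and the `key` parameter are illustrative only.
    """
    is_new = ~new_df[key].isin(historic_df[key])
    return pd.concat([historic_df, new_df.loc[is_new]], ignore_index=True)


# Made-up sample data: comment_id 3 already exists, comment_id 4 is new
historic_df = pd.DataFrame({"comment_id": [1, 2, 3], "text": ["a", "b", "c"]})
new_df = pd.DataFrame({"comment_id": [3, 4], "text": ["c", "d"]})

historic_df = append_new_rows(historic_df, new_df)
print(historic_df)
```

Note that isin only checks new_df against historic_df. If new_df itself can contain the same comment_id more than once, each copy would be appended, so calling new_df.drop_duplicates(subset="comment_id") first is one way to guard against that.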

Answered By: SomeDude
