Compare values of two different DataFrames

Question:

I have two DataFrames with the same columns: one holds historic data and the other holds 'new' data. The new data may sometimes contain rows that are already in the historic data. If the value of 'comment_id' in a new row is already present in the historic data, I want to do nothing; otherwise, I want to add that row to the historic data.

I tried doing this:

historic_comments = [x for x in filtered_comments if filtered_comments['comment_id'] not in historic_comments['comment_id']]

But got error:

TypeError: unhashable type: 'Series'

Asked By: Palalfredo


Answers:

Use a boolean mask with isin:

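# True for rows of filtered_comments whose comment_id is not yet in historic_comments
m = ~filtered_comments['comment_id'].isin(historic_comments['comment_id'])
# Append only those rows to the historic data and renumber the index
out = pd.concat([historic_comments, filtered_comments[m]], axis=0, ignore_index=True)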

Output:

>>> out  # new historic_comments dataframe
  comment_id
0    bonjour
1      hello
2      world
3        new

>>> filtered_comments
  comment_id
0      hello
1        new
2      world

>>> historic_comments
  comment_id
0    bonjour
1      hello
2      world
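
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frames shown above and applies the same isin mask. The helper name append_new_comments and its key parameter are illustrative only; they are not part of pandas or of the original snippet.

```python
import pandas as pd


def append_new_comments(historic, new, key="comment_id"):
    """Return historic plus the rows of new whose key is not already in historic.

    Hypothetical helper for illustration; neither input frame is modified.
    """
    # True where the key of a new row does not appear in the historic data
    is_new = ~new[key].isin(historic[key])
    # Stack the unseen rows under the historic ones and renumber the index
    return pd.concat([historic, new[is_new]], ignore_index=True)


# Sample data matching the frames printed above
historic_comments = pd.DataFrame({"comment_id": ["bonjour", "hello", "world"]})
filtered_comments = pd.DataFrame({"comment_id": ["hello", "new", "world"]})

updated = append_new_comments(historic_comments, filtered_comments)
print(updated["comment_id"].tolist())  # ['bonjour', 'hello', 'world', 'new']
```

Note that the mask only compares against the historic data, so if the new data itself contained the same unseen comment_id twice, both rows would be appended.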
Answered By: Corralien

I think you can do the following, assuming historic_df is the old DataFrame and new_df is the new one:

historic_df = pd.concat(
    [historic_df, new_df.loc[~new_df["comment_id"].isin(historic_df["comment_id"])]],
    ignore_index=True,
)
Answered By: SomeDude
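
As for why the attempt in the question raised the TypeError: the in operator on a Series checks the index labels rather than the values, and it needs a hashable key on its left-hand side, which a whole Series is not. A small sketch of that behaviour, using a made-up Series with the default integer index:

```python
import pandas as pd

# Illustrative data with the default index labels 0, 1, 2
historic = pd.Series(["bonjour", "hello", "world"], name="comment_id")

# `in` looks at the index labels, not at the stored values
label_found = 0 in historic        # 0 is an index label
value_found = "hello" in historic  # "hello" is a value, not a label

# A whole Series on the left-hand side cannot be hashed for the lookup
try:
    historic in historic
    raised = False
except TypeError:
    raised = True

# isin performs the element-wise membership test against the values
mask = historic.isin(["hello", "new"])

print(label_found, value_found, raised)  # True False True
print(mask.tolist())                     # [False, True, False]
```

This is why both answers switch to isin, which returns one boolean per row instead of a single membership result.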