Merge data with timestamp without overriding old data

Question:

I have data A that looks like this:

timestamp,some_value
389434893,abc
348973493,dac
128197291,fgd

I have another dataset, B, which is a newer version of A with additional rows:

timestamp,some_value
389434893,wwwwwwe # timestamp DID NOT CHANGE
348973493,wwwwags # timestamp DID NOT CHANGE
128197291,wwaswww # timestamp DID NOT CHANGE
982379283,ggg

Both A and B are pandas DataFrames.
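For reference, the two frames can be rebuilt from the CSV text above roughly like this. This is a sketch for experimenting; the trailing "# ..." remarks in the sample are annotations, not data, so they are left out.

```python
import io

import pandas as pd

# A: the original data.
A = pd.read_csv(io.StringIO(
    'timestamp,some_value\n'
    '389434893,abc\n'
    '348973493,dac\n'
    '128197291,fgd\n'
))

# B: the newer version, with changed values for A's timestamps plus one new row.
B = pd.read_csv(io.StringIO(
    'timestamp,some_value\n'
    '389434893,wwwwwwe\n'
    '348973493,wwwwags\n'
    '128197291,wwaswww\n'
    '982379283,ggg\n'
))
print(A)
print(B)
```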

I want to merge A with B without changing the rows that already exist in A, even if some_value has changed in B. The result R should look like this:

timestamp,some_value
389434893,abc # copied from A
348973493,dac # copied from A
128197291,fgd # copied from A
982379283,ggg # new row from B

Order is guaranteed.

What pandas methods should I use to achieve this?

Asked By: comonadd


Answers:

To merge two pandas DataFrames A and B without overwriting the old rows from A, you can use the pd.merge function with the following parameters:

import pandas as pd

R = pd.merge(A, B, on='timestamp', how='outer', suffixes=('_A', '_B'))

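This performs an outer merge on timestamp, so R contains every timestamp that appears in A, in B, or in both. Because both frames have a some_value column, the suffixes parameter renames the two copies to some_value_A and some_value_B in R. For a timestamp that exists in only one frame, the other frame's column is filled with NaN; here, the new row 982379283 has NaN in some_value_A. Note that pandas documents the outer merge as sorting the join keys, so R may not keep B's row order.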
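To see the intermediate result, here is a sketch on the question's sample data. It adds indicator=True, a standard pd.merge option that appends a _merge column recording whether each timestamp came from the left frame only, the right frame only, or both.

```python
import pandas as pd

A = pd.DataFrame({'timestamp': [389434893, 348973493, 128197291],
                  'some_value': ['abc', 'dac', 'fgd']})
B = pd.DataFrame({'timestamp': [389434893, 348973493, 128197291, 982379283],
                  'some_value': ['wwwwwwe', 'wwwwags', 'wwaswww', 'ggg']})

# Outer join on the key; the shared column is split into some_value_A / some_value_B.
R = pd.merge(A, B, on='timestamp', how='outer',
             suffixes=('_A', '_B'), indicator=True)
print(R)

# Rows present only in B have NaN in some_value_A and _merge == 'right_only'.
new_rows = R[R['_merge'] == 'right_only']
print(new_rows)
```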

To keep the values from A and take a value from B only for the new rows, fill the gaps in some_value_A from some_value_B:

R['some_value_A'] = R['some_value_A'].fillna(R['some_value_B'])

A null in some_value_A means the timestamp was not present in A, so only those new rows receive B's value; every row that came from A keeps its original value.

Finally, you can drop the some_value_B column and rename some_value_A to some_value to get the desired result:

R = R.drop(columns=['some_value_B'])
R = R.rename(columns={'some_value_A': 'some_value'})
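Putting the steps together, here is a minimal end-to-end sketch on the question's sample data. The final left merge against B's timestamp column is an addition to the steps above: it restores B's row order (pandas documents that a left merge preserves the left frame's key order), since the outer merge may have reordered the rows.

```python
import pandas as pd

A = pd.DataFrame({'timestamp': [389434893, 348973493, 128197291],
                  'some_value': ['abc', 'dac', 'fgd']})
B = pd.DataFrame({'timestamp': [389434893, 348973493, 128197291, 982379283],
                  'some_value': ['wwwwwwe', 'wwwwags', 'wwaswww', 'ggg']})

# 1. Outer merge on the key; the shared column gets the _A / _B suffixes.
R = pd.merge(A, B, on='timestamp', how='outer', suffixes=('_A', '_B'))

# 2. Keep A's value wherever it exists; use B's value only for new timestamps.
R['some_value_A'] = R['some_value_A'].fillna(R['some_value_B'])

# 3. Drop B's column and restore the original column name.
R = R.drop(columns=['some_value_B']).rename(columns={'some_value_A': 'some_value'})

# 4. Reorder rows to follow B's timestamps (a left merge keeps the left key order).
R = B[['timestamp']].merge(R, on='timestamp', how='left')
print(R)
```

If B always contains every row of A, a shorter alternative is pd.concat([A, B]).drop_duplicates(subset='timestamp', keep='first'): the first occurrence of each timestamp is A's row, and the concatenation order (A's rows, then B's new rows) is preserved.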
Answered By: testing0s