DF1 is a subset of DF2; DF3 = DF2 - DF1 should store the rows of DF2 that are not in DF1

Question:

DataFrame 1 (df1):

         date  L120_active_cohort_logins  L120_active_cohort  percentage_L120_active_cohort_logins
0  2022-09-01                      32679              195345                             16.728865
1  2022-09-02                      32938              196457                             16.766010
2  2022-09-03                      40746              197586                             20.621906
3  2022-09-04                      33979              198799                             17.092138

DataFrame 2 (df2):

         date  L120_active_cohort_logins  L120_active_cohort  percentage_L120_active_cohort_logins
0  2022-09-01                      32677              195345                             16.728864
1  2022-09-02                      32938              196457                             16.766010
2  2022-09-03                      40746              197586                             20.621906
3  2022-09-04                      33979              198799                             17.092138

Result: df3 = df2 - df1
I want the rows of df2 that do not match any row of df1 to be stored in df3.

Expected output (df3):

         date  L120_active_cohort_logins  L120_active_cohort  percentage_L120_active_cohort_logins
0  2022-09-01                      32677              195345                             16.728864
Asked By: vaibhav


Answers:

Reference: "Comparing two dataframes and getting the differences"

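import pandas as pd
new_df = pd.concat([df1, df2]).drop_duplicates(keep=False)  # rows present in only one of the frames
df3 = new_df[~new_df.index.duplicated(keep='last')]  # on an index clash, keep the row from df2
print(df3)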
Answered By: Stackpy
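A self-contained sketch of the concat/drop_duplicates answer above, using the question's sample data (rebuilt here by copying df1 and changing row 0, so df2 differs only in that row). It assumes, as in the question, that the mismatched rows of df1 and df2 share an index label, which is what the index.duplicated step relies on.

```python
import pandas as pd

# Sample data from the question: df2 equals df1 except for row 0.
df1 = pd.DataFrame({
    "date": ["2022-09-01", "2022-09-02", "2022-09-03", "2022-09-04"],
    "L120_active_cohort_logins": [32679, 32938, 40746, 33979],
    "L120_active_cohort": [195345, 196457, 197586, 198799],
    "percentage_L120_active_cohort_logins": [16.728865, 16.766010, 20.621906, 17.092138],
})
df2 = df1.copy()
df2.loc[0, "L120_active_cohort_logins"] = 32677
df2.loc[0, "percentage_L120_active_cohort_logins"] = 16.728864

# Step 1: stack both frames and drop every row that occurs in both.
# keep=False removes all copies of a duplicated row, so only unmatched rows remain:
# df1's row 0 and df2's row 0, both still carrying index label 0.
new_df = pd.concat([df1, df2]).drop_duplicates(keep=False)

# Step 2: where an index label repeats, keep its last occurrence (the row from df2).
df3 = new_df[~new_df.index.duplicated(keep="last")]
print(df3)
```

Two caveats of this approach: a row that exists only in df1 under its own index label is not removed by the second step, and keep=False also discards rows that are duplicated within df2 itself, even if they are missing from df1.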

Since you only need to keep the difference, you can do it this way:

# Full example
import pandas as pd

df1 = pd.DataFrame({
    'date': ['2022-09-01', '2022-09-02', '2022-09-03', '2022-09-04'],
    'L120_active_cohort_logins': [32679, 32938, 40746, 33979],
    'L120_active_cohort': [195345, 196457, 197586, 198799],
    'percentage_L120_active_cohort_logins': [16.728865, 16.76601, 20.621906, 17.092138],
})

df2 = pd.DataFrame({
    'date': ['2022-09-01', '2022-09-02', '2022-09-03', '2022-09-04'],
    'L120_active_cohort_logins': [32677, 32938, 40746, 33979],
    'L120_active_cohort': [195345, 196457, 197586, 198799],
    'percentage_L120_active_cohort_logins': [16.728864, 16.76601, 20.621906, 17.092138],
})

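# Outer-merge on all shared columns, then drop the index labels (0..3) of an index-aligned inner merge.
# Caution: this depends on df2's unmatched row landing last in the outer merge; newer pandas
# versions sort outer-merge keys, which can change which row is left over.
df3 = pd.merge(df1, df2, how='outer').drop(pd.merge(df1, df2, left_index=True, right_index=True, how='inner').index)
print(df3)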
Answered By: I_Al-thamary
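The answer above picks the surviving row by index label, so its result depends on the row order the outer merge produces. A minimal sketch of an order-independent alternative, swapping in the indicator option of DataFrame.merge: merge on all shared columns and keep the rows tagged right_only. It assumes two rows match only when every column is exactly equal (floats included).

```python
import pandas as pd

# Sample data from the question: df2 equals df1 except for row 0.
df1 = pd.DataFrame({
    "date": ["2022-09-01", "2022-09-02", "2022-09-03", "2022-09-04"],
    "L120_active_cohort_logins": [32679, 32938, 40746, 33979],
    "L120_active_cohort": [195345, 196457, 197586, 198799],
    "percentage_L120_active_cohort_logins": [16.728865, 16.766010, 20.621906, 17.092138],
})
df2 = df1.copy()
df2.loc[0, "L120_active_cohort_logins"] = 32677
df2.loc[0, "percentage_L120_active_cohort_logins"] = 16.728864

# With `on` omitted, merge joins on every column the frames share.
# indicator=True adds a "_merge" column holding "left_only", "right_only" or "both".
merged = df1.merge(df2, how="outer", indicator=True)

# "right_only" rows exist in df2 but have no exact match in df1.
df3 = (
    merged[merged["_merge"] == "right_only"]
    .drop(columns="_merge")
    .reset_index(drop=True)
)
print(df3)
```

Because the selection is driven by the _merge tag rather than by index position, the outcome does not change if the merge reorders its rows.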

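This worked for me (an outer merge on L120_active_cohort_logins with an indicator column):

    import pandas as pd

    # Outer-merge on the logins column; the 'Exist' column records where each row came from
    pd_df1 = pd.merge(df1, df2, on="L120_active_cohort_logins", how='outer', indicator='Exist')
    pd_df1 = pd_df1.loc[pd_df1['Exist'] != 'both']
    # Keep rows found only in df2; its non-key columns carry the '_y' suffix, so rename them back
    final_df = pd_df1[pd_df1['Exist'] == 'right_only'][['date_y', 'L120_active_cohort_logins', 'L120_active_cohort_y', 'percentage_L120_active_cohort_logins_y']]
    columns = ['date', 'L120_active_cohort_logins', 'L120_active_cohort', 'percentage_L120_active_cohort_logins']
    final_df.columns = columns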
Answered By: vaibhav
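The answer above matches rows on L120_active_cohort_logins alone, so a df2 row that differs from df1 only in another column would not be reported, and the _x/_y suffixes have to be renamed afterwards. A sketch of a variant that compares whole rows with MultiIndex.isin, assuming exact equality (floats included) is the intended notion of a match; key_cols is an illustrative name, not part of the original answer.

```python
import pandas as pd

# Sample data from the question: df2 equals df1 except for row 0.
df1 = pd.DataFrame({
    "date": ["2022-09-01", "2022-09-02", "2022-09-03", "2022-09-04"],
    "L120_active_cohort_logins": [32679, 32938, 40746, 33979],
    "L120_active_cohort": [195345, 196457, 197586, 198799],
    "percentage_L120_active_cohort_logins": [16.728865, 16.766010, 20.621906, 17.092138],
})
df2 = df1.copy()
df2.loc[0, "L120_active_cohort_logins"] = 32677
df2.loc[0, "percentage_L120_active_cohort_logins"] = 16.728864

# Hypothetical choice: treat every column as part of the row's identity.
key_cols = list(df2.columns)

# Build a MultiIndex of row tuples for each frame and flag the df2 rows present in df1.
in_df1 = df2.set_index(key_cols).index.isin(df1.set_index(key_cols).index)

# Boolean selection keeps df2's own index labels and column names, so no renaming is needed.
df3 = df2[~in_df1]
print(df3)
```

Narrowing key_cols to a subset of columns turns this back into a key-based comparison like the merge on L120_active_cohort_logins.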