Pandas: subsetting a dataframe to rows after a specific date and appending them to another dataframe

Question:

I have a dataframe (df1) with a date column in dd/mm/yyyy format.

I have a second dataframe (df2) with the same structure that shares some data with df1. I want to add to df1 the rows from df2 that are dated after the most recent date in df1.

My approach was to find the max date in df1, subset df2 to the dates after it, and append that subset to df1:

maxdate = df.loc[pd.to_datetime(df['DATE'],dayfirst=True).idxmax(), 'DATE']
# in this instance it is 11/09/2022 (dd/mm/yyyy)

df3 = df2.loc[pd.to_datetime(df2['DATE']) > maxdate] # this is to be my subset to append to df1

Some of df1 is shown below:

            DATE      TIME            X            Y             Z
3692  23/08/2022  16:55:00  734154.2872  9551189.353  2.845237e+03
3693  23/08/2022  16:55:00  734199.2516  9551070.666  2.842993e+03
3694  23/08/2022  05:02:00  734669.6130  9551361.865  2.845012e+03
3695  24/08/2022  17:25:00  734215.9910  9551068.295  2.842111e+03
3696  24/08/2022  17:25:00  734684.8444  9551383.618  2.846049e+03
3697  27/08/2022  17:20:00  734214.1851  9551061.242  2.841501e+03
3698  28/08/2022  17:00:00  734669.6130  9551361.865  2.845012e+03
3699  30/08/2022  05:25:00  734176.3412  9551168.550  2.844325e+03
3700  01/09/2022  17:18:00  734686.1061  9551385.420  2.846083e+03
3701  01/09/2022  17:18:00  734667.0922  9551358.264  2.844812e+03
3702  01/09/2022  17:18:00  734164.7047  9551178.039  2.844962e+03
3703  02/09/2022  17:16:00  734151.9079  9551185.951  2.845472e+03
3704  03/09/2022  17:15:00  734141.2542  9551197.062  2.844747e+03
3705  04/09/2022  17:08:00  734687.3678  9551387.222  2.846116e+03
3706  04/09/2022  17:08:00  734665.8319  9551356.464  2.844713e+03
3707  05/09/2022  05:08:00  734704.3326  9551376.581  2.842331e+03
3708  07/09/2022  16:58:00  734687.3678  9551387.222  2.846116e+03
3709  08/09/2022  16:55:00  734663.3109  9551352.864  2.844512e+03
3710  10/09/2022  17:03:00  734689.8913  9551390.826  2.846184e+03
3711  11/09/2022  17:13:00  734691.1530  9551392.628  9.551393e+06

Some of df2 is shown below:

         DATE      TIME            X             Y             Z
134  23/08/22  16:55:00  734154.2872  9551189.3534      2845.237
135  23/08/22  16:55:00  734199.2516  9551070.6664     2842.9929
136  23/08/22   5:02:00   734669.613  9551361.8645     2845.0122
138  24/08/22  17:25:00   734215.991  9551068.2954     2842.1106
139  24/08/22  17:25:00  734684.8444   9551383.618     2846.0492
147  27/08/22  17:20:00  734214.1851  9551061.2423      2841.501
149  28/08/22  17:00:00   734669.613  9551361.8645     2845.0122
151  29/08/22  17:30:00            -             -             -
153  30/08/22   5:25:00  734176.3412  9551168.5498      2844.325
180  11/09/22  17:13:00   734691.153  9551392.6276  9551392.6276

However, df3 ends up including rows with dates before maxdate.

I suspect this is related to the date format I have.

Any help is appreciated.

Asked By: Spooked


Answers:

You need to convert the values to pandas datetimes; otherwise the comparison is based on the string values and not on the dates. It is also not clear whether 11 or 09 is the day in the sample max date 11/09/2022. If 11 is the day, you also need to pass dayfirst=True to pd.to_datetime, because by default an ambiguous date like this one is parsed month-first:

>>> maxdate = pd.to_datetime('11/09/2022', dayfirst=True)
# Timestamp('2022-09-11 00:00:00')

>>> df3 = df2.loc[pd.to_datetime(df2['DATE'], dayfirst=True) > maxdate]
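
To make the two pitfalls concrete, here is a small sketch with made-up values (they are not taken from the question's dataframes, and the variable names are only illustrative). It shows that comparing the raw strings is lexicographic, and that leaving out dayfirst=True swaps day and month for an ambiguous date.

```python
import pandas as pd

# Hypothetical values, chosen only to illustrate the two pitfalls.
older = "23/08/2022"   # 23 August 2022
newer = "11/09/2022"   # 11 September 2022

# Pitfall 1: strings compare character by character, so "2" > "1"
# makes the older date look "greater" than the newer one.
string_says_older_is_later = older > newer

# Pitfall 2: without dayfirst=True an ambiguous date is read month-first.
month_first = pd.to_datetime(newer)               # 9 November 2022
day_first = pd.to_datetime(newer, dayfirst=True)  # 11 September 2022

# With both sides parsed day-first, the comparison is chronological.
parsed_says_older_is_later = pd.to_datetime(older, dayfirst=True) > day_first

print(string_says_older_is_later)            # True
print(month_first.date(), day_first.date())  # 2022-11-09 2022-09-11
print(parsed_says_older_is_later)            # False
```

The same reasoning applies to a whole column: parse both dataframes with the same day-first convention before comparing.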

Here is the execution for the sample data you have added to the question:

# Getting the max date from first dataframe
max_date=pd.to_datetime(df1['DATE'],dayfirst=True).max()
max_date
Timestamp('2022-09-11 00:00:00')

# Filtering second dataframe based on maximum date
df2[pd.to_datetime(df2['DATE'], dayfirst=True)>max_date]
Empty DataFrame
Columns: [DATE, TIME, X, Y, Z]
Index: []
# The result is an empty dataframe for the sample data because no record matches the condition

# Records for maximum date:
df2[pd.to_datetime(df2['DATE'], dayfirst=True)==max_date]
         DATE      TIME           X             Y             Z
180  11/09/22  17:13:00  734691.153  9551392.6276  9551392.6276


# Records for dates older than the maximum date:
df2[pd.to_datetime(df2['DATE'], dayfirst=True)<max_date]
         DATE      TIME            X             Y          Z
134  23/08/22  16:55:00  734154.2872  9551189.3534   2845.237
135  23/08/22  16:55:00  734199.2516  9551070.6664  2842.9929
136  23/08/22   5:02:00   734669.613  9551361.8645  2845.0122
138  24/08/22  17:25:00   734215.991  9551068.2954  2842.1106
139  24/08/22  17:25:00  734684.8444   9551383.618  2846.0492
147  27/08/22  17:20:00  734214.1851  9551061.2423   2841.501
149  28/08/22  17:00:00   734669.613  9551361.8645  2845.0122
151  29/08/22  17:30:00            -             -          -
153  30/08/22   5:25:00  734176.3412  9551168.5498   2844.325
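
The snippets above stop at the filter. What follows is a minimal sketch of the remaining append step, using two tiny invented frames rather than the real data. It assumes df1 always uses four-digit years and df2 two-digit years, as in the samples, and it swaps in explicit format strings in place of dayfirst=True so that nothing is left to inference. The names dates1, dates2, newer and combined are made up for this example.

```python
import pandas as pd

# Tiny stand-ins for df1 and df2; the rows are invented for illustration.
df1 = pd.DataFrame({
    "DATE": ["10/09/2022", "11/09/2022"],
    "TIME": ["17:03:00", "17:13:00"],
    "X": [734689.8913, 734691.1530],
})
df2 = pd.DataFrame({
    "DATE": ["11/09/22", "12/09/22", "13/09/22"],
    "TIME": ["17:13:00", "17:05:00", "16:58:00"],
    "X": [734691.153, 734692.5, 734693.75],
})

# Explicit formats avoid any day/month guessing:
# df1 has four-digit years (%Y), df2 has two-digit years (%y).
dates1 = pd.to_datetime(df1["DATE"], format="%d/%m/%Y")
dates2 = pd.to_datetime(df2["DATE"], format="%d/%m/%y")

max_date = dates1.max()

# Keep only the df2 rows that are strictly newer than anything in df1.
is_newer = dates2 > max_date
newer = df2.loc[is_newer].copy()

# Rewrite the DATE strings so they match df1's dd/mm/yyyy style.
newer["DATE"] = dates2[is_newer].dt.strftime("%d/%m/%Y")

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
combined = pd.concat([df1, newer], ignore_index=True)
print(combined)
```

Note that this compares dates only and ignores the TIME column, so rows in df2 that fall on the max date itself are not appended, whatever their time.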
Answered By: ThePyGuy