Subsetting a dataframe greater than a specific date Pandas and append to another dataframe
Question:
I have a dataframe (df1) with a date column with the dd/mm/yyyy date format.
I have second dataframe (df2) with the same structure, however, has some shared data. I want to add the data from df2 to df1 for the data is after the most recent date in df1.
My approach was to find the maxdate in df1 and then look for dates in df2 subset and append to df1
maxdate = df.loc[pd.to_datetime(df['DATE'],dayfirst=True).idxmax(), 'DATE']
# in this instance it is 11/09/2022 (dd/mm/yyyy)
df3 = df2.loc[pd.to_datetime(df2['DATE']) > maxdate] #this is to by my subset to append to df1
some of the df1 below
DATE TIME X Y Z
3692 23/08/2022 16:55:00 734154.2872 9551189.353 2.845237e+03
3693 23/08/2022 16:55:00 734199.2516 9551070.666 2.842993e+03
3694 23/08/2022 05:02:00 734669.6130 9551361.865 2.845012e+03
3695 24/08/2022 17:25:00 734215.9910 9551068.295 2.842111e+03
3696 24/08/2022 17:25:00 734684.8444 9551383.618 2.846049e+03
3697 27/08/2022 17:20:00 734214.1851 9551061.242 2.841501e+03
3698 28/08/2022 17:00:00 734669.6130 9551361.865 2.845012e+03
3699 30/08/2022 05:25:00 734176.3412 9551168.550 2.844325e+03
3700 01/09/2022 17:18:00 734686.1061 9551385.420 2.846083e+03
3701 01/09/2022 17:18:00 734667.0922 9551358.264 2.844812e+03
3702 01/09/2022 17:18:00 734164.7047 9551178.039 2.844962e+03
3703 02/09/2022 17:16:00 734151.9079 9551185.951 2.845472e+03
3704 03/09/2022 17:15:00 734141.2542 9551197.062 2.844747e+03
3705 04/09/2022 17:08:00 734687.3678 9551387.222 2.846116e+03
3706 04/09/2022 17:08:00 734665.8319 9551356.464 2.844713e+03
3707 05/09/2022 05:08:00 734704.3326 9551376.581 2.842331e+03
3708 07/09/2022 16:58:00 734687.3678 9551387.222 2.846116e+03
3709 08/09/2022 16:55:00 734663.3109 9551352.864 2.844512e+03
3710 10/09/2022 17:03:00 734689.8913 9551390.826 2.846184e+03
3711 11/09/2022 17:13:00 734691.1530 9551392.628 9.551393e+06
some of df2 below
DATE TIME X Y Z
134 23/08/22 16:55:00 734154.2872 9551189.3534 2845.237
135 23/08/22 16:55:00 734199.2516 9551070.6664 2842.9929
136 23/08/22 5:02:00 734669.613 9551361.8645 2845.0122
138 24/08/22 17:25:00 734215.991 9551068.2954 2842.1106
139 24/08/22 17:25:00 734684.8444 9551383.618 2846.0492
147 27/08/22 17:20:00 734214.1851 9551061.2423 2841.501
149 28/08/22 17:00:00 734669.613 9551361.8645 2845.0122
151 29/08/22 17:30:00 - - -
153 30/08/22 5:25:00 734176.3412 9551168.5498 2844.325
180 11/09/22 17:13:00 734691.153 9551392.6276 9551392.6276
However df3 is sub setting the dataframe that includes dates before the "maxdate"
I feel it is related to the date format that I have.
any help appreciated.
Answers:
You need to convert the values to pandas DateTime, else the comparison will be based on string values and not the dates, also its not clear if 11 is the day or 09 is the day in sample max date 11/09/2022
, if 11
is the day, you also need to pass dayfirst=True
to pd.to_datetime
:
>>> maxdate=pd.to_datetime('11/09/2022')
# Timestamp('2022-11-09 00:00:00')
>>> df2 = df.loc[pd.to_datetime(df['DATE'], dayfirst=True) > maxdate]
Here is the execution for the sample data you have added to the question:
# Getting the max date from first dataframe
max_date=pd.to_datetime(df1['DATE'],dayfirst=True).max()
max_date
Timestamp('2022-09-11 00:00:00')
# Filtering second dataframe based on maximum date
df2[pd.to_datetime(df2['DATE'], dayfirst=True)>max_date]
Empty DataFrame
Columns: [DATE, TIME, X, Y, Z]
Index: []
# Result is empty dataframe for the sample data cause no record matches condition
# Records for maximum date:
df2[pd.to_datetime(df2['DATE'], dayfirst=True)==max_date]
DATE TIME X Y Z
180 11/09/22 17:13:00 734691.153 9551392.6276 9551392.6276
# Records for dates older than the maximum date:
df2[pd.to_datetime(df2['DATE'], dayfirst=True)<max_date]
DATE TIME X Y Z
134 23/08/22 16:55:00 734154.2872 9551189.3534 2845.237
135 23/08/22 16:55:00 734199.2516 9551070.6664 2842.9929
136 23/08/22 5:02:00 734669.613 9551361.8645 2845.0122
138 24/08/22 17:25:00 734215.991 9551068.2954 2842.1106
139 24/08/22 17:25:00 734684.8444 9551383.618 2846.0492
147 27/08/22 17:20:00 734214.1851 9551061.2423 2841.501
149 28/08/22 17:00:00 734669.613 9551361.8645 2845.0122
151 29/08/22 17:30:00 - - -
153 30/08/22 5:25:00 734176.3412 9551168.5498 2844.325
I have a dataframe (df1) with a date column with the dd/mm/yyyy date format.
I have second dataframe (df2) with the same structure, however, has some shared data. I want to add the data from df2 to df1 for the data is after the most recent date in df1.
My approach was to find the maxdate in df1 and then look for dates in df2 subset and append to df1
maxdate = df.loc[pd.to_datetime(df['DATE'],dayfirst=True).idxmax(), 'DATE']
# in this instance it is 11/09/2022 (dd/mm/yyyy)
df3 = df2.loc[pd.to_datetime(df2['DATE']) > maxdate] #this is to by my subset to append to df1
some of the df1 below
DATE TIME X Y Z
3692 23/08/2022 16:55:00 734154.2872 9551189.353 2.845237e+03
3693 23/08/2022 16:55:00 734199.2516 9551070.666 2.842993e+03
3694 23/08/2022 05:02:00 734669.6130 9551361.865 2.845012e+03
3695 24/08/2022 17:25:00 734215.9910 9551068.295 2.842111e+03
3696 24/08/2022 17:25:00 734684.8444 9551383.618 2.846049e+03
3697 27/08/2022 17:20:00 734214.1851 9551061.242 2.841501e+03
3698 28/08/2022 17:00:00 734669.6130 9551361.865 2.845012e+03
3699 30/08/2022 05:25:00 734176.3412 9551168.550 2.844325e+03
3700 01/09/2022 17:18:00 734686.1061 9551385.420 2.846083e+03
3701 01/09/2022 17:18:00 734667.0922 9551358.264 2.844812e+03
3702 01/09/2022 17:18:00 734164.7047 9551178.039 2.844962e+03
3703 02/09/2022 17:16:00 734151.9079 9551185.951 2.845472e+03
3704 03/09/2022 17:15:00 734141.2542 9551197.062 2.844747e+03
3705 04/09/2022 17:08:00 734687.3678 9551387.222 2.846116e+03
3706 04/09/2022 17:08:00 734665.8319 9551356.464 2.844713e+03
3707 05/09/2022 05:08:00 734704.3326 9551376.581 2.842331e+03
3708 07/09/2022 16:58:00 734687.3678 9551387.222 2.846116e+03
3709 08/09/2022 16:55:00 734663.3109 9551352.864 2.844512e+03
3710 10/09/2022 17:03:00 734689.8913 9551390.826 2.846184e+03
3711 11/09/2022 17:13:00 734691.1530 9551392.628 9.551393e+06
some of df2 below
DATE TIME X Y Z
134 23/08/22 16:55:00 734154.2872 9551189.3534 2845.237
135 23/08/22 16:55:00 734199.2516 9551070.6664 2842.9929
136 23/08/22 5:02:00 734669.613 9551361.8645 2845.0122
138 24/08/22 17:25:00 734215.991 9551068.2954 2842.1106
139 24/08/22 17:25:00 734684.8444 9551383.618 2846.0492
147 27/08/22 17:20:00 734214.1851 9551061.2423 2841.501
149 28/08/22 17:00:00 734669.613 9551361.8645 2845.0122
151 29/08/22 17:30:00 - - -
153 30/08/22 5:25:00 734176.3412 9551168.5498 2844.325
180 11/09/22 17:13:00 734691.153 9551392.6276 9551392.6276
However df3 is sub setting the dataframe that includes dates before the "maxdate"
I feel it is related to the date format that I have.
any help appreciated.
You need to convert the values to pandas DateTime, else the comparison will be based on string values and not the dates, also its not clear if 11 is the day or 09 is the day in sample max date 11/09/2022
, if 11
is the day, you also need to pass dayfirst=True
to pd.to_datetime
:
>>> maxdate=pd.to_datetime('11/09/2022')
# Timestamp('2022-11-09 00:00:00')
>>> df2 = df.loc[pd.to_datetime(df['DATE'], dayfirst=True) > maxdate]
Here is the execution for the sample data you have added to the question:
# Getting the max date from first dataframe
max_date=pd.to_datetime(df1['DATE'],dayfirst=True).max()
max_date
Timestamp('2022-09-11 00:00:00')
# Filtering second dataframe based on maximum date
df2[pd.to_datetime(df2['DATE'], dayfirst=True)>max_date]
Empty DataFrame
Columns: [DATE, TIME, X, Y, Z]
Index: []
# Result is empty dataframe for the sample data cause no record matches condition
# Records for maximum date:
df2[pd.to_datetime(df2['DATE'], dayfirst=True)==max_date]
DATE TIME X Y Z
180 11/09/22 17:13:00 734691.153 9551392.6276 9551392.6276
# Records for dates older than the maximum date:
df2[pd.to_datetime(df2['DATE'], dayfirst=True)<max_date]
DATE TIME X Y Z
134 23/08/22 16:55:00 734154.2872 9551189.3534 2845.237
135 23/08/22 16:55:00 734199.2516 9551070.6664 2842.9929
136 23/08/22 5:02:00 734669.613 9551361.8645 2845.0122
138 24/08/22 17:25:00 734215.991 9551068.2954 2842.1106
139 24/08/22 17:25:00 734684.8444 9551383.618 2846.0492
147 27/08/22 17:20:00 734214.1851 9551061.2423 2841.501
149 28/08/22 17:00:00 734669.613 9551361.8645 2845.0122
151 29/08/22 17:30:00 - - -
153 30/08/22 5:25:00 734176.3412 9551168.5498 2844.325