Find Yesterday's High Price by Merging Two DF's on Datetime and Date Columns

Question:

I’m trying to merge two df’s, one df has a datetime column, and the other has just a date column. My application for this is to find yesterday’s high price using an OHLC dataset. I’ve attached some starter code below, but I’ll describe what I’m looking for.

Given this intraday dataset:

                  time  current_intraday_high
0  2022-02-11 09:00:00                      1
1  2022-02-11 10:00:00                      2
2  2022-02-11 11:00:00                      3
3  2022-02-11 12:00:00                      4
4  2022-02-11 13:00:00                      5
5  2022-02-14 09:00:00                      6
6  2022-02-14 10:00:00                      7
7  2022-02-14 11:00:00                      8
8  2022-02-14 12:00:00                      9
9  2022-02-14 13:00:00                     10
10 2022-02-15 09:00:00                     11
11 2022-02-15 10:00:00                     12
12 2022-02-15 11:00:00                     13
13 2022-02-15 12:00:00                     14
14 2022-02-15 13:00:00                     15
15 2022-02-16 09:00:00                     16
16 2022-02-16 10:00:00                     17
17 2022-02-16 11:00:00                     18
18 2022-02-16 12:00:00                     19
19 2022-02-16 13:00:00                     20

…and this daily dataframe:

        time  daily_high
0 2022-02-11           5
1 2022-02-14          10
2 2022-02-15          15
3 2022-02-16          20

…how can I merge them together, and have each row of the intraday dataframe contain the previous (business) day’s high price, like so:

                  time  current_intraday_high  yesterdays_high
0  2022-02-11 09:00:00                      1              NaN
1  2022-02-11 10:00:00                      2              NaN
2  2022-02-11 11:00:00                      3              NaN
3  2022-02-11 12:00:00                      4              NaN
4  2022-02-11 13:00:00                      5              NaN
5  2022-02-14 09:00:00                      6              5.0
6  2022-02-14 10:00:00                      7              5.0
7  2022-02-14 11:00:00                      8              5.0
8  2022-02-14 12:00:00                      9              5.0
9  2022-02-14 13:00:00                     10              5.0
10 2022-02-15 09:00:00                     11             10.0
11 2022-02-15 10:00:00                     12             10.0
12 2022-02-15 11:00:00                     13             10.0
13 2022-02-15 12:00:00                     14             10.0
14 2022-02-15 13:00:00                     15             10.0
15 2022-02-16 09:00:00                     16             15.0
16 2022-02-16 10:00:00                     17             15.0
17 2022-02-16 11:00:00                     18             15.0
18 2022-02-16 12:00:00                     19             15.0
19 2022-02-16 13:00:00                     20             15.0

(Note the NaN‘s at the top because we don’t have any data for Feb 10, 2022 from the intraday dataset, and see how each row contains the intraday data, plus the PREVIOUS day’s max "high" price.)

Minimal reproducible example code below:

import pandas as pd


###################################################
# CREATE MOCK INTRADAY DATAFRAME
###################################################
intraday_date_time = [
    "2022-02-11 09:00:00",
                "2022-02-11 10:00:00",
                "2022-02-11 11:00:00",
                "2022-02-11 12:00:00",
                "2022-02-11 13:00:00",
    "2022-02-14 09:00:00",
                "2022-02-14 10:00:00",
                "2022-02-14 11:00:00",
                "2022-02-14 12:00:00",
                "2022-02-14 13:00:00",
    "2022-02-15 09:00:00",
                "2022-02-15 10:00:00",
                "2022-02-15 11:00:00",
                "2022-02-15 12:00:00",
                "2022-02-15 13:00:00",
    "2022-02-16 09:00:00",
                "2022-02-16 10:00:00",
                "2022-02-16 11:00:00",
                "2022-02-16 12:00:00",
                "2022-02-16 13:00:00",
]
intraday_date_time = pd.to_datetime(intraday_date_time)
intraday_df = pd.DataFrame(
    {
        "time": intraday_date_time,
        "current_intraday_high": [x for x in range(1, 21)],
    },
)
print(intraday_df)
# intraday_df.to_csv('intradayTEST.csv', index=True)


###################################################
# AGGREGATE/UPSAMPLE TO DAILY DATAFRAME
###################################################
# Aggregate to business days using intraday_df
agg_dict = {'current_intraday_high': 'max'}
daily_df = intraday_df.set_index('time').resample('B').agg(agg_dict).reset_index()
daily_df.rename(columns={"current_intraday_high": "daily_high"}, inplace=True)
print(daily_df)
# daily_df.to_csv('dailyTEST.csv', index=True)


###################################################
# MERGE THE TWO DATAFRAMES
###################################################
# Need to merge the daily dataset to the intraday dataset, such that,
# any row on the newly merged/joined/concat'd dataset will have:
# 1. The current intraday datetime in the 'time' column
# 2. The current 'intraday_high' value
# 3. The PREVIOUS DAY's 'daily_high' value
# This doesn't work as the daily_df just gets appended to the bottom
# of the intraday_df due to the datetimes/dates merging
merged_df = pd.merge(intraday_df, daily_df, how='outer', on='time')
print(merged_df)
Asked By: wildcat89

||

Answers:

Given your already existing code, you can map the shifted values:

intraday_df['yesterdays_high'] = (intraday_df['time']
 .dt.date
 .map(daily_df['daily_high'] 
      .set_axis(daily_df['time'].shift(-1)))
 )

If you don’t have all days and really want to map the real previous business day:

intraday_df['yesterdays_high'] = (intraday_df['time']
 .dt.date
 .map(daily_df['daily_high']
      .set_axis(daily_df['time'].add(pd.offsets.BusinessDay())))
)

Output:

                  time  current_intraday_high  yesterdays_high
0  2022-02-11 09:00:00                      1              NaN
1  2022-02-11 10:00:00                      2              NaN
2  2022-02-11 11:00:00                      3              NaN
3  2022-02-11 12:00:00                      4              NaN
4  2022-02-11 13:00:00                      5              NaN
5  2022-02-14 09:00:00                      6              5.0
6  2022-02-14 10:00:00                      7              5.0
7  2022-02-14 11:00:00                      8              5.0
8  2022-02-14 12:00:00                      9              5.0
9  2022-02-14 13:00:00                     10              5.0
10 2022-02-15 09:00:00                     11             10.0
11 2022-02-15 10:00:00                     12             10.0
12 2022-02-15 11:00:00                     13             10.0
13 2022-02-15 12:00:00                     14             10.0
14 2022-02-15 13:00:00                     15             10.0
15 2022-02-16 09:00:00                     16             15.0
16 2022-02-16 10:00:00                     17             15.0
17 2022-02-16 11:00:00                     18             15.0
18 2022-02-16 12:00:00                     19             15.0
19 2022-02-16 13:00:00                     20             15.0
Answered By: mozway

pd.merge_asof allows you to easily do a merge like this.

yesterdays_high = (intraday_df.resample('B', on='time')['current_intraday_high'].max()
                              .shift()
                              .rename('yesterdays_high')
                              .reset_index())

merged_df = pd.merge_asof(intraday_df, yesterdays_high)
print(merged_df)

Output:

                  time  current_intraday_high  yesterdays_high
0  2022-02-11 09:00:00                      1              NaN
1  2022-02-11 10:00:00                      2              NaN
2  2022-02-11 11:00:00                      3              NaN
3  2022-02-11 12:00:00                      4              NaN
4  2022-02-11 13:00:00                      5              NaN
5  2022-02-14 09:00:00                      6              5.0
6  2022-02-14 10:00:00                      7              5.0
7  2022-02-14 11:00:00                      8              5.0
8  2022-02-14 12:00:00                      9              5.0
9  2022-02-14 13:00:00                     10              5.0
10 2022-02-15 09:00:00                     11             10.0
11 2022-02-15 10:00:00                     12             10.0
12 2022-02-15 11:00:00                     13             10.0
13 2022-02-15 12:00:00                     14             10.0
14 2022-02-15 13:00:00                     15             10.0
15 2022-02-16 09:00:00                     16             15.0
16 2022-02-16 10:00:00                     17             15.0
17 2022-02-16 11:00:00                     18             15.0
18 2022-02-16 12:00:00                     19             15.0
19 2022-02-16 13:00:00                     20             15.0
Answered By: BeRT2me

We can use .dt.date as an index to join two frames together on the same days. As of previous day hight_price, we can apply shift on daily_df:

intra_date = intraday_df['time'].dt.date
daily_date = daily_df['time'].dt.date

answer = intraday_df.set_index(intra_date).join(
    daily_df.set_index(daily_date)['daily_high'].shift()
).reset_index(drop=True)
Answered By: Vitalizzare
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.