Check and compare if first 3 row value of dataframe for each day is greater than the value on corresponding column

Question:

I have a 5-minute time series dataframe with the titles Open and Pivot. The Pivot column value is the same throughout the day. I need to extract the first three 5-minute Open data and compare each of the values to the value on the pivot column and see if it is larger or not. If it is larger, then a 3rd column titled ‘Result’ would print 1 for the whole day’s length and 0 if otherwise.

Here’s how the dataframe looks like:

df image

As an example the first 3 values for the day of 2022-09-26 are 18803.900391, 18801.226562 and 18807.296875. Since two of the values are less than the value on the pivot column 18806.938151for the corresponding day, the result column would print 0 for the whole day.

A rough idea:

I was loosely thinking of something like this, but I know this is completely wrong:

for i in range(len(df)):
    df['result'] = df['Open'].iloc[i:i+3] > df['pivot'].iloc[i]
    i=i+288 #number of 5 min candles in a day to skip to next one

I can’t seen to find a way to iterate through this. Any idea or recommendation would help! Thankyou in advance!

Here’s my complete code to get the dataframe:

import yfinance as yf
import numpy as np
import pandas as pd
import datetime

df = yf.download(tickers='BTC-USD', period = '30d', interval = '5m')
df = df.reset_index()

#resetting df to start at midnight of next day
min_date = df.Datetime.min()
NextDay_Date = (min_date + datetime.timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
df = df[df.Datetime >= NextDay_Date].copy()
df = df.set_index('Datetime')

# resampled daily pivots merged to smaller timeframe
day_df = (df.resample('D')
               .agg({'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'}))
day_df['pivot'] = (df['High']+ df['Low'] + df['Close'])/3

day_df = day_df.reset_index()
day_df['Datetime'] = day_df['Datetime'].dt.strftime('%Y-%m-%d %H:%M:%S')
day_df['Datetime'] = pd.to_datetime(day_df['Datetime'])
day_df = day_df.set_index('Datetime')
day_df.drop(['Open', 'High', 'Low', 'Close'], axis=1, inplace=True)

#merging both dataframes
df = df.join(day_df)
df['pivot'].fillna(method='ffill', inplace=True)

df.drop(['High','Low','Close','Adj Close', 'Volume'], axis=1, inplace=True)

df
Asked By: Royce Anton Jose

||

Answers:

Here’s a basic solution:

df_s = df.reset_index()
df_s['result'] = 0 

start_day = -1

for i in range(len(df_s)):
    if df_s['Datetime'][i].day != start_day :
        start_day = df_s['Datetime'][i].day
        res = np.all(df_s['Open'][i:i+3] > df_s['pivot'][i])
    
    df_s['result'][i] = res

Since I don’t like multi-indexed dataframes, my first step was to treat as it an int indexed df where Datetime is just another column. Now, we keep track of 2 things – start_day and result and iterate as follows:

  • For each new start_day check if the first three Open values are greater than the corresponding pivot value, and store to result. i.e, only when the row entry has a different Datetime value, update the start_day and result values
  • then result value must be applied to each row

Now, some of the less obvious reasons why I call this a basic solution:

  • in case there are only 2 entries for a certain start_date, my code doesn’t check the dates (of the 3 entries) before comparing the Open and pivot values
  • the code assumes the df is sorted by date and takes into account only the day and not the month, so in case the dataset actually has missing entries for certain months, it may run into the same problem as above
  • A SettingwithCopy warning is raised by pandas

But I hope this should be enough to answer the OP’s question

Answered By: MtC
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.