how to get result of expanding/resample function in original dataframe using python

Question:

have a dataframe with 1 minute timestamp of open, high, low, close, volume for a token.
using expanding or resample function, one can get a new dataframe based on the timeinterval. in my case its 1 day time interval.

i am looking to get the above output in the original dataframe. please assist in the same.

original dataframe:

enter image description here

desired dataframe:

enter image description here

Here "date_1d" is the time interval for my use case. i used expanding function but as the value changes in "date_1d" column, expanding function works on the whole dataframe

df["high_1d"] =  df["high"].expanding().max()

df["low_1d"] =  df["low"].expanding().min()

df["volume_1d"] =  df["volume"].expanding().min()

then the next challenge was how to find Open and Close based on "date_1d" column

Please assist or ask more questions, if not clear on my desired output.

Fyi – data is huge for 5 years 1 minute data for 100 tokens

thanks in advance

Sukhwant

Asked By: sukhwant singh

||

Answers:

I’m not sure if I understand it right but for me it looks like you want to groupbyeach day and calculate first last min max for them.
Is the column date_1d already there ?

If not:

df["date_1d"] = df["date"].dt.strftime('%Y%m%d')

For the calculations:

df["open_1d"] = df.groupby("date_1d")['open'].transform('first')
df["high_1d"] =  df.groupby("date_1d")['high'].transform('max')
df["low_1d"] =  df.groupby("date_1d")['low'].transform('min')
df["close_1d"] = df.groupby("date_1d")['close'].transform('last')

EDIT:
Have a look in your code if this works like you expect it (till we have some of your data I can only guess, sorry 😀 )

df['high_1d'] = df.groupby('date_1d')['high'].expanding().max().values

It groups the data per "date_1d" but in the group only consider row by row (and the above rows)

Answered By: Rabinzel

EDIT: Found a neat solution using transform method. Erased the need for a "Day" Column as df.groupby is made using index.date attribute.

import pandas as pd
import yfinance as yf

df = yf.download("AAPL", interval="1m",
                 start=datetime.date.today()-datetime.timedelta(days=6))
df['Open_1d'] = df["Open"].groupby(
    df.index.day).transform('first')
df['Close_1d'] = df["Close"].groupby(
    df.index.day).transform('last')
df['High_1d'] = df['High'].groupby(
    df.index.day).expanding().max().droplevel(level=0)
df['Low_1d'] = df['Low'].groupby(
    df.index.day).expanding().min().droplevel(level=0)
df['Volume_1d'] = df['Volume'].groupby(
    df['Day']).expanding().sum().droplevel(level=0)
df

Happy Coding!

Answered By: Daniel Gonçalves
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.