error when using time in rolling function pandas

Question:

I am trying to calculate mean i.e moving average every 10sec of data; lets say 1 to 10sec, and 11sec to 20sec etc.

Is below right for this? I am getting error when using "60sec" in rolling function, I think it may be due to the "ltt" column which is of type string, I am converting it to datetime, but still the error is coming.

How to resolve this error? Also how to do the averaging for samples collected every 10sec. This is streaming data coming in, but for testing purpose, I am using the static data in record1.

import pandas as pd
import numpy as np

records1 = [
{'ltt': 'Mon Nov  7 12:12:05 2022', 'last': 258},
{'ltt': 'Mon Nov  7 12:12:05 2022', 'last': 259},
{'ltt': 'Mon Nov  7 12:12:07 2022', 'last': 259},
{'ltt': 'Mon Nov  7 12:12:08 2022', 'last': 260},
{'ltt': 'Mon Nov  7 12:12:09 2022', 'last': 259},
{'ltt': 'Mon Nov  7 12:12:10 2022', 'last': 259},
{'ltt': 'Mon Nov  7 12:12:11 2022', 'last': 261},
{'ltt': 'Mon Nov  7 12:12:12 2022', 'last': 262},
{'ltt': 'Mon Nov  7 12:12:12 2022', 'last': 260},
{'ltt': 'Mon Nov  7 12:12:14 2022', 'last': 258},
{'ltt': 'Mon Nov  7 12:12:15 2022', 'last': 259},
{'ltt': 'Mon Nov  7 12:12:16 2022', 'last': 259},
{'ltt': 'Mon Nov  7 12:12:17 2022', 'last': 260},
{'ltt': 'Mon Nov  7 12:12:18 2022', 'last': 258},
{'ltt': 'Mon Nov  7 12:12:19 2022', 'last': 259},
{'ltt': 'Mon Nov  7 12:12:20 2022', 'last': 260},
{'ltt': 'Mon Nov  7 12:12:21 2022', 'last': 260},
{'ltt': 'Mon Nov  7 12:12:22 2022', 'last': 258},
{'ltt': 'Mon Nov  7 12:12:23 2022', 'last': 259},
{'ltt': 'Mon Nov  7 12:12:24 2022', 'last': 260}
]

datalist = []

def strategy1(record):
    global datalist

    datalist.append(record)
    pandas_df = pd.DataFrame(datalist)
    pandas_df['ltt'] = pd.to_datetime(pandas_df['ltt'], format="%a %b %d %H:%M:%S %Y")

    pandas_df['hour'] = pandas_df['ltt'].dt.hour
    pandas_df['minute'] = pandas_df['ltt'].dt.minute
    pandas_df['second'] = pandas_df['ltt'].dt.second

    pandas_df['max'] = pandas_df.groupby('second')['last'].transform("max")

    pandas_df["ma_1min"] = (
        pandas_df.sort_values("ltt")
            .groupby(["hour", "minute"])["last"]
            .transform(lambda x: x.rolling('10sec', min_periods=1).mean())
    )

    print(pandas_df)
Asked By: vrreddy1234

||

Answers:

i don’t know how to exactly implement this in your code but i had a kind of similar problem where i had to group each day into 4 hour timeslots. so an approach might be something like this:

pandas_df.groupby([pandas_df['ltt'].dt.hour, pandas_df['ltt'].dt.minute, (pandas_df['ltt'].dt.second / 10).astype(int)]).last.agg('mean')

this should basically give you 6 groups ([0s-9s -> 0], [10s-19s -> 1], etc. for the 3rd groupby index) for every minute of data.

Answered By: maxxel_
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.