Loop through time series data and collect specific timeframe window keeping the runtime O(N)
Question:
I am trying to loop through a time series data frame. For a specific time, I need to go back 5 minutes and 10 minutes and check whether a condition is met, making sure I do NOT over-count the data (because of multicollinearity). Below is the code that I wrote. I would love for it to run in O(N) without needing two loops. I was thinking of saving the index somehow to save space, but I need help here.
Thanks in advance
Sorry this is not a great question
Answers:
I’m not entirely sure what’s going on in your code, but I believe it could be replaced with a single forward pass, assuming your data is sorted by time.
    import numpy as np

    # Sentinels earlier than the first timestamp, so no row matches before a fill is seen.
    first_time = fillData['time'].iloc[0]
    last_short = first_time - np.timedelta64(9, 'ms')
    last_long = first_time - np.timedelta64(9, 'ms')
    for row in fillData.itertuples():
        if row.fill == 1:
            # A fill starts (or restarts) both windows.
            last_short = row.time + np.timedelta64(250, 'ms')
            last_long = row.time + np.timedelta64(3, 'm')
            # fillData.at[row.Index, '250 mili'] = 1  # you might want this line
        else:
            if row.time <= last_short:
                fillData.at[row.Index, '250 mili'] = 1
            elif row.time <= last_long:  # if/elif: your choice
                fillData.at[row.Index, '6 min'] = 1
Or you could probably use the rolling method of the DataFrame.
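The forward pass above can also be written without an explicit Python loop. The following is only a sketch under assumptions: the toy data is made up, and the column names and the 250 ms / 3 minute windows are copied from the loop above. It forward-fills the timestamp of the most recent fill and compares the elapsed time against each window, which is still a single O(N) pass over sorted data.

```python
import pandas as pd

# Hypothetical toy data; the 'time' and 'fill' column names follow the loop above.
fillData = pd.DataFrame({
    "time": pd.to_datetime([
        "2021-01-01 00:00:00.000",
        "2021-01-01 00:00:00.100",
        "2021-01-01 00:00:00.400",
        "2021-01-01 00:02:00.000",
        "2021-01-01 00:05:00.000",
    ]),
    "fill": [1, 0, 0, 0, 0],
})

is_fill = fillData["fill"].eq(1)

# Timestamp of the most recent fill at or before each row (NaT before the first fill).
last_fill = fillData["time"].where(is_fill).ffill()
elapsed = fillData["time"] - last_fill

short_window = pd.Timedelta(milliseconds=250)
long_window = pd.Timedelta(minutes=3)

# Fill rows themselves stay 0, matching the loop with its commented-out line.
# Comparisons against NaT evaluate to False, so rows before the first fill stay 0 too.
in_short = ~is_fill & (elapsed <= short_window)
in_long = ~is_fill & (elapsed > short_window) & (elapsed <= long_window)

fillData["250 mili"] = in_short.astype(int)
fillData["6 min"] = in_long.astype(int)
print(fillData)
```

The two masks are mutually exclusive by construction, so a row inside the short window is never counted again in the long one.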
Does this do what you want:
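    import pandas as pd

    # Rolling over a time offset needs a sorted DatetimeIndex.
    fillData.set_index('time', drop=True, inplace=True)
    condition = fillData['fill'].eq(1).astype(int)
    # The max of a 0/1 series over a window is 1 if any row in the window is a fill.
    fillData['500 milli'] = (condition.rolling(pd.Timedelta('500ms'))
                                      .max()
                                      .astype(int))
    fillData['6 minutes'] = (condition.rolling(pd.Timedelta('6min'))
                                      .max()
                                      .astype(int))
    # Rows already flagged by the short window are not counted again in the long one.
    fillData.loc[fillData['500 milli'].eq(1), '6 minutes'] = 0
    fillData.reset_index(drop=False, inplace=True)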
I’m not sure how fillData is sorted. My assumption is that the sorting is ascending (in time). Otherwise you have to reverse it.
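If the order is in doubt, it can be checked before running either approach. This is a minimal sketch with made-up toy data, assuming the same 'time' and 'fill' columns; it only sorts when the timestamps are not already ascending.

```python
import pandas as pd

# Hypothetical unsorted toy data with the same 'time' and 'fill' columns.
fillData = pd.DataFrame({
    "time": pd.to_datetime([
        "2021-01-01 00:00:01",
        "2021-01-01 00:00:00",
        "2021-01-01 00:00:02",
    ]),
    "fill": [0, 1, 0],
})

# True only if every timestamp is >= the one before it.
was_sorted = fillData["time"].is_monotonic_increasing
if not was_sorted:
    # A stable sort keeps the original order of rows that share a timestamp.
    fillData = fillData.sort_values("time", kind="stable").reset_index(drop=True)
```

Note that the check itself is O(N), but the sort is O(N log N), so the overall O(N) bound only holds when the data already arrives in time order.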