Python pandas rolling sum positive number with duplicate timestamps

Question:

I have a DataFrame with two columns, as shown below.
One column is a datetime and the other is purely numeric.
I’d like to sum all positive numbers from the last 5 minutes. I tried

df['positive'] = df['number'].rolling('5T').sum()

but it didn’t work. Instead, I get a

ValueError: window must be an integer 0 or greater

error.

datetime            number
2023-09-01 16:30:01 -2.0
2023-09-01 16:30:01 0.0
2023-09-01 16:30:02 1.0
2023-09-01 16:30:04 0.0
2023-09-01 16:30:05 -1.0
... 
2023-09-29 15:09:36 3.0
2023-09-29 15:09:45 0.0
2023-09-29 15:09:45 1.0
2023-09-29 15:09:47 -1.0
2023-09-29 15:09:51 1.0

Can anyone show me the correct way to code this in Python?
Thank you very much in advance.

Asked By: jad


Answers:

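You can replace negative numbers with zero, then apply a rolling window and compute the sum. A time-based window only works on a datetime-like index (or with the on= argument of rolling), which is why the attempt in the question raises the ValueError, so set the datetime column as the index first: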

import pandas as pd

# Make sure the datetime column has a datetime dtype
df['datetime'] = pd.to_datetime(df['datetime'])

# Set datetime as index (time-based windows need a DatetimeIndex)
df.set_index('datetime', inplace=True)

# Replace negative numbers with zero
df['positive'] = df['number'].clip(lower=0)

# Compute rolling sum over the last 5 minutes
# ('5min' is equivalent to '5T'; the 'T' alias is deprecated since pandas 2.2)
df['rolling_sum'] = df['positive'].rolling('5min').sum()
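
A minimal end-to-end sketch of this approach, assuming the first five rows from the question as sample data (the real DataFrame would be loaded elsewhere). The duplicate timestamps are not a problem for a time-based window as long as the index is sorted:

```python
import pandas as pd

# Sample rows copied from the question (first five only)
df = pd.DataFrame({
    "datetime": [
        "2023-09-01 16:30:01",
        "2023-09-01 16:30:01",
        "2023-09-01 16:30:02",
        "2023-09-01 16:30:04",
        "2023-09-01 16:30:05",
    ],
    "number": [-2.0, 0.0, 1.0, 0.0, -1.0],
})

# A time-based window needs a sorted DatetimeIndex
df["datetime"] = pd.to_datetime(df["datetime"])
df = df.set_index("datetime").sort_index()

# Zero out the negatives, then sum over a trailing 5-minute window
df["rolling_sum"] = df["number"].clip(lower=0).rolling("5min").sum()

print(df)
```

With these rows, only the 1.0 at 16:30:02 contributes, so rolling_sum is 0.0 for the first two rows and 1.0 afterwards.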
Answered By: Yuri R

You don’t necessarily have to set the negative (or positive) numbers to 0.
You can define your own function and then run apply.

For example:

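# Set datetime as index (the column must already have a datetime dtype)
df.set_index('datetime', inplace=True)

def sum_only_positive(x):
    # x holds the values that fall inside the current window
    return sum(a for a in x if a > 0)

# Apply to the 'number' column only, so the result is a Series
df['positive'] = df['number'].rolling('5min').apply(sum_only_positive)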

If you want to sum only the negatives, for example, you would have:

def sum_only_negative(x):
    # Compare each element, not the whole window, against zero
    return sum(a for a in x if a < 0)

df['negative'] = df['number'].rolling('5min').apply(sum_only_negative)
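
A self-contained sketch of the custom-function approach, again assuming the first five rows from the question as sample data. Passing raw=True is an addition here rather than part of the answer's code: it hands each window to the function as a NumPy array, so the filter can use boolean indexing instead of a Python-level loop.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (first five only)
df = pd.DataFrame({
    "datetime": [
        "2023-09-01 16:30:01",
        "2023-09-01 16:30:01",
        "2023-09-01 16:30:02",
        "2023-09-01 16:30:04",
        "2023-09-01 16:30:05",
    ],
    "number": [-2.0, 0.0, 1.0, 0.0, -1.0],
})
df["datetime"] = pd.to_datetime(df["datetime"])
df = df.set_index("datetime").sort_index()


def sum_only_positive(window: np.ndarray) -> float:
    # Keep only the strictly positive values in the window
    return window[window > 0].sum()


def sum_only_negative(window: np.ndarray) -> float:
    # Keep only the strictly negative values in the window
    return window[window < 0].sum()


# Roll over the 'number' column only, so each result is a Series
roller = df["number"].rolling("5min")
df["positive"] = roller.apply(sum_only_positive, raw=True)
df["negative"] = roller.apply(sum_only_negative, raw=True)

print(df)
```

A Python callback in apply is typically slower than the built-in rolling sum on large frames, so this form is mainly useful when the filter is more involved than a simple clip.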
Answered By: atteggiani