Python pandas rolling sum positive number with duplicate timestamps
Question:
I have a DataFrame with two columns, like the one below.
One column is datetime and the other is purely numeric.
I’d like to sum all positive numbers over the last 5 minutes. I tried
df['positive'] = df['number'].rolling('5T').sum()
but it didn’t work. Somehow, I’m getting a
ValueError: window must be an integer 0 or greater
error.
datetime number
2023-09-01 16:30:01 -2.0
2023-09-01 16:30:01 0.0
2023-09-01 16:30:02 1.0
2023-09-01 16:30:04 0.0
2023-09-01 16:30:05 -1.0
...
2023-09-29 15:09:36 3.0
2023-09-29 15:09:45 0.0
2023-09-29 15:09:45 1.0
2023-09-29 15:09:47 -1.0
2023-09-29 15:09:51 1.0
Can anyone show me the correct way to code this in Python?
Thank you very much in advance.
Answers:
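You can replace negative numbers with zero, apply a rolling window, and compute the sum. A time-based window such as '5T' needs a datetime-like index (or the on= argument), which is why the original call raised the ValueError.
import pandas as pd
# Parse the datetime column (df is assumed to already hold the data above)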
df['datetime'] = pd.to_datetime(df['datetime'])
# Set datetime as index
df.set_index('datetime', inplace=True)
# Replace negative numbers with zero
df['positive'] = df['number'].clip(lower=0)
# Compute rolling sum over the last 5 minutes
df['rolling_sum'] = df['positive'].rolling('5T').sum()
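As a self-contained check of this approach, here is a minimal sketch that rebuilds only the first five rows shown in the question (the rest of the data is elided there, so it is not reproduced). It uses '5min', which should be equivalent to '5T'; newer pandas releases warn about the 'T' alias. The sort_index() call is an added precaution, assuming the rows might not already be in time order.

```python
import pandas as pd

# First five rows copied from the question; note the duplicate timestamp.
df = pd.DataFrame({
    "datetime": [
        "2023-09-01 16:30:01",
        "2023-09-01 16:30:01",
        "2023-09-01 16:30:02",
        "2023-09-01 16:30:04",
        "2023-09-01 16:30:05",
    ],
    "number": [-2.0, 0.0, 1.0, 0.0, -1.0],
})

# A time-based window needs a sorted, datetime-like index.
df["datetime"] = pd.to_datetime(df["datetime"])
df = df.set_index("datetime").sort_index()

# Zero out the negatives, then sum over the trailing 5 minutes.
df["positive"] = df["number"].clip(lower=0)
df["rolling_sum"] = df["positive"].rolling("5min").sum()

print(df)
```

Duplicate timestamps are not a problem here: the index only has to be monotonic, not unique.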
You don’t necessarily have to set the negative (or positive) numbers to 0.
You can define your own function and pass it to apply.
For example:
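# Parse the datetime column and set it as the index
df['datetime'] = pd.to_datetime(df['datetime'])
df.set_index('datetime', inplace=True)
def sum_only_positive(x):
    # x holds the values that fall inside the current window
    return sum(a for a in x if a > 0)
df['positive'] = df['number'].rolling('5T').apply(sum_only_positive)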
If you want to sum only the negatives instead, you would have:
def sum_only_negative(x):
    # Keep only the values below zero in the current window
    return sum(a for a in x if a < 0)
df['negative'] = df['number'].rolling('5T').apply(sum_only_negative)
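For larger frames, the apply-based approach can be slow with a pure-Python loop. The sketch below is one possible variant, not necessarily what the answer intended: it passes raw=True so each window arrives as a NumPy array and filters it with a boolean mask. The sample rows are the first five from the question, and '5min' is used in place of '5T'.

```python
import pandas as pd

# First five rows from the question, indexed by their timestamps.
df = pd.DataFrame(
    {"number": [-2.0, 0.0, 1.0, 0.0, -1.0]},
    index=pd.to_datetime([
        "2023-09-01 16:30:01",
        "2023-09-01 16:30:01",
        "2023-09-01 16:30:02",
        "2023-09-01 16:30:04",
        "2023-09-01 16:30:05",
    ]),
)

def sum_only_positive(window):
    # With raw=True, window is a NumPy array of the values in the window.
    return window[window > 0].sum()

def sum_only_negative(window):
    return window[window < 0].sum()

rolled = df["number"].rolling("5min")
df["positive"] = rolled.apply(sum_only_positive, raw=True)
df["negative"] = rolled.apply(sum_only_negative, raw=True)

print(df)
```

On this sample the positive column matches what clip(lower=0) followed by a rolling sum produces, so either route should be interchangeable when only one sign is needed.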