Rolling mean and standard deviation without zeros

Question:

I have a data frame in which one column, corns_produced, holds the number of corns produced at each timestamp.
For example:

timestamp corns_produced another_column
1           5                  4
2           0                  1
3           0                  3
4           3                  4
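
The sample above can be rebuilt as a pandas DataFrame for trying out the rolling calls in this thread. This is a minimal sketch that assumes timestamp is an ordinary column (not the index) and uses only the four rows shown; the real frame has 100,000+ rows.

```python
import pandas as pd

# The four sample rows from the question; timestamp is kept as a plain column.
my_df = pd.DataFrame(
    {
        "timestamp": [1, 2, 3, 4],
        "corns_produced": [5, 0, 0, 3],
        "another_column": [4, 1, 3, 4],
    }
)
print(my_df)
```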

The data frame is big: 100,000+ rows.

I want to calculate the moving average and standard deviation of corns_produced over a window of 1000 timestamps.
Luckily, this is easy with rolling:

  • my_df.rolling(1000).mean()
  • my_df.rolling(1000).std()
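
To see why the plain rolling calls are not enough, here is a sketch on the sample's corns_produced values. It assumes a window of 3 instead of 1000 (a hypothetical size chosen so that four rows produce some output). The zeros count as observations, so the window ending at timestamp 3 averages 5, 0, 0 to about 1.67 rather than 5.

```python
import pandas as pd

# corns_produced from the sample, indexed by timestamp.
corns = pd.Series([5, 0, 0, 3], index=[1, 2, 3, 4], name="corns_produced")

# window=3 stands in for 1000; the first two rows are NaN (incomplete window).
plain_mean = corns.rolling(3).mean()
plain_std = corns.rolling(3).std()
print(pd.DataFrame({"mean": plain_mean, "std": plain_std}))
```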

The problem is that I want to ignore the zeros: if only 5 of the last 1000 timestamps produced any corn, I want the mean and std computed over those 5 values only.
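
A slow but explicit reference for the desired output can help check faster versions. The sketch below uses a hypothetical helper, nonzero_rolling_stats, that loops over each trailing window, keeps only the nonzero values, and returns their mean and sample standard deviation (ddof=1, the pandas default). It assumes a window of 3 and appends one hypothetical extra value (4 at a fifth timestamp) to the sample so that one window holds two nonzero values.

```python
import numpy as np
import pandas as pd

def nonzero_rolling_stats(values, window):
    """Hypothetical brute-force reference, O(n * window): mean and sample std
    of the nonzero values in each trailing window."""
    values = np.asarray(values, dtype=float)
    means, stds = [], []
    for end in range(len(values)):
        if end + 1 < window:
            # Not a full window yet, mirroring rolling(window)'s default.
            means.append(np.nan)
            stds.append(np.nan)
            continue
        block = values[end + 1 - window:end + 1]
        nz = block[block != 0]
        means.append(nz.mean() if nz.size else np.nan)
        # ddof=1 matches pandas' sample std; it needs at least two values.
        stds.append(nz.std(ddof=1) if nz.size > 1 else np.nan)
    return pd.DataFrame({"mean": means, "std": stds})

result = nonzero_rolling_stats([5, 0, 0, 3, 4], window=3)
print(result)
```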

How do I ignore the zeros?

Just to clarify, I don’t want to do x = my_df[my_df['corns_produced'] != 0] and then call rolling on x, because that ignores the timestamps: a window of 1000 rows would then span more than 1000 timestamps, which is not the result I need.
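
The problem with filtering first shows up on the sample data. In this sketch (a window of 2 rows, a hypothetical size standing in for 1000), dropping the zeros puts timestamps 1 and 4 next to each other, so the last window averages 5 and 3 to 4.0, although timestamps 3 and 4 alone contain only the value 3.

```python
import pandas as pd

my_df = pd.DataFrame({"timestamp": [1, 2, 3, 4], "corns_produced": [5, 0, 0, 3]})

# Filtering removes rows, so rolling windows now count nonzero rows,
# not timestamps: the 2-row window at timestamp 4 reaches back to timestamp 1.
x = my_df[my_df["corns_produced"] != 0]
rolled = x["corns_produced"].rolling(2).mean()
print(x.assign(rolled=rolled))
```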

Asked By: OopsUser


Answers:

You can use Rolling.apply:

print(my_df['corns_produced'].rolling(1000).apply(lambda x: x[x != 0].mean()))
print(my_df['corns_produced'].rolling(1000).apply(lambda x: x[x != 0].std()))
Answered By: jezrael
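
The Rolling.apply approach can be sketched on sample data as follows. This variant is not part of the answer above: it assumes a window of 3 and the extended sample [5, 0, 0, 3, 4], passes raw=True so each window arrives as a NumPy array (usually faster than building a Series per window), and uses hypothetical helpers nonzero_mean and nonzero_std that return NaN when a window has too few nonzero values. Because NumPy's std defaults to ddof=0, the helper passes ddof=1 to match pandas' rolling().std().

```python
import numpy as np
import pandas as pd

corns = pd.Series([5, 0, 0, 3, 4], name="corns_produced")
window = 3  # stands in for 1000

def nonzero_mean(x):
    """Mean of the nonzero values in one window (x is a NumPy array)."""
    nz = x[x != 0]
    return nz.mean() if nz.size else np.nan

def nonzero_std(x):
    """Sample std (ddof=1) of the nonzero values; needs at least two."""
    nz = x[x != 0]
    return nz.std(ddof=1) if nz.size > 1 else np.nan

nz_mean = corns.rolling(window).apply(nonzero_mean, raw=True)
nz_std = corns.rolling(window).apply(nonzero_std, raw=True)
print(pd.DataFrame({"mean": nz_mean, "std": nz_std}))
```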

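A faster solution: first replace all zeros with np.nan, then take the rolling mean and std, since rolling aggregations skip NaN values. Pass min_periods=1 (for example my_df['corns_produced'].replace(0, np.nan).rolling(1000, min_periods=1).mean()), because min_periods defaults to the window size and any window containing a NaN would otherwise return NaN. If you are dealing with large data, this is much faster than Rolling.apply, which calls a Python function for every window.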

Answered By: Mr. AlmostRight
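
The NaN approach can be sketched on the same extended sample [5, 0, 0, 3, 4] with a window of 3 (both assumptions for illustration). Note that min_periods=1 also fills the first window-1 rows from partial windows, where the Rolling.apply version leaves them NaN; a window that is entirely zeros still yields NaN, because it contains no observations.

```python
import numpy as np
import pandas as pd

corns = pd.Series([5, 0, 0, 3, 4], name="corns_produced")
window = 3  # stands in for 1000

# Zeros become NaN, which rolling aggregations skip.
as_nan = corns.replace(0, np.nan)

# min_periods defaults to the window size and NaNs do not count as
# observations, so without min_periods=1 most windows would be NaN.
nan_mean = as_nan.rolling(window, min_periods=1).mean()
# std still needs two observations (ddof=1), so single values give NaN.
nan_std = as_nan.rolling(window, min_periods=1).std()
print(pd.DataFrame({"mean": nan_mean, "std": nan_std}))
```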