How to implement rolling mean ignoring null values
Question:
I am trying to calculate the RSI indicator. For that I need the rolling mean of gains and losses.
I would like to calculate a rolling mean that ignores null values, so that the mean is the sum of the existing values divided by the count of the existing values.
Example:
import polars as pl

window_size = 5
df = pl.DataFrame({'price_change': [1, 2, 3, -2, 4]})
# Keep only positive price changes; everything else becomes null.
df_gain = df.select(
    pl.when(pl.col('price_change') > 0.0)
    .then(pl.col('price_change'))
    .otherwise(None)
    .alias('gain')
)
# UNKNOWN HOW TO GET THE WANTED RESULT (ignore_null is a made-up parameter):
rol_mean_gain = df_gain.select(
    pl.col('gain').rolling_mean(window_size=window_size, ignore_null=True)
)
The last value of rol_mean_gain should then be calculated as (1 + 2 + 3 + 4) / 4: the null is skipped, so the divisor is 4, not 5.
I know Pandas has .mean(skipna=True) or .apply(pandas.np.nanmean), but as far as I am aware polars does not provide such an API.
Answers:
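I think skipping nulls is implied; see the min_periods description in the docs for this method. As I read it, min_periods is the number of non-null values a window needs before a result is computed, and it defaults to the window size, so lowering it lets windows that contain nulls produce a value.
df_gain.select(pl.col('gain').rolling_mean(window_size=window_size, min_periods=1))
This gives me a column of 1.0, 1.5, 2.0, 2.0, 2.5. Note how the last two values skip the null correctly.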
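To make the arithmetic behind those five values explicit, here is a minimal plain-Python sketch of a rolling mean that skips nulls. It is not how polars implements rolling_mean; the function name rolling_mean_skip_nulls is hypothetical, and it assumes min_periods counts only the non-null values in each window, as the docs describe.

```python
from typing import List, Optional, Sequence


def rolling_mean_skip_nulls(
    values: Sequence[Optional[float]],
    window_size: int,
    min_periods: int = 1,
) -> List[Optional[float]]:
    """Rolling mean over a trailing window, dividing by the non-null count."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        # Trailing window ending at position i (shorter at the start).
        window = values[max(0, i - window_size + 1): i + 1]
        present = [v for v in window if v is not None]
        # Emit a value only if enough non-null entries are available.
        if present and len(present) >= min_periods:
            out.append(sum(present) / len(present))
        else:
            out.append(None)
    return out


gain = [1, 2, 3, None, 4]
result = rolling_mean_skip_nulls(gain, window_size=5, min_periods=1)
print(result)  # [1.0, 1.5, 2.0, 2.0, 2.5]
```

With min_periods=5 the same input yields only nulls in this sketch, because no window ever holds five non-null values. That is why lowering min_periods matters here.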