How to implement rolling mean ignoring null values

Question:

I am trying to calculate the RSI indicator. For that I need the rolling mean of gains and losses.

I would like to calculate a rolling mean that ignores null values, so the mean would be computed from the sum and count of the existing (non-null) values only.

Example:

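import polars as pl

window_size = 5

df = pl.DataFrame({'price_change': [1, 2, 3, -2, 4]})

# Keep positive price changes as gains; everything else becomes null.
df_gain = df.select(
    pl.when(pl.col('price_change') > 0.0)
    .then(pl.col('price_change'))
    .otherwise(None)
    .alias('gain')
)

# UNKNOWN HOW TO GET THE WANTED RESULT
# (ignore_null is a made-up parameter, shown only to illustrate the intent):
rol_mean_gain = df_gain.select(
    pl.col('gain').rolling_mean(window_size=window_size, ignore_null=True)
)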

The goal is for rol_mean_gain over the full window to be calculated as (1 + 2 + 3 + 4) / 4, skipping the null and dividing by 4 rather than 5.

I know pandas has .mean(skipna=True) or .apply(pandas.np.nanmean),
but as far as I am aware Polars does not provide such an API.

Asked By: Joris Medeišis


Answers:

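I think the skipping of nulls is implied; see the min_periods description in the docs for this method. With min_periods=1, a result is produced as soon as the window contains at least one non-null value.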

df_gain.select(pl.col('gain').rolling_mean(window_size=window_size, min_periods=1))

This gives me a column of 1.0, 1.5, 2.0, 2.0, 2.5. Note how the last two values skip the null correctly: the fourth is (1 + 2 + 3) / 3 and the fifth is (1 + 2 + 3 + 4) / 4.
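To make the sum-and-count behaviour explicit, here is a minimal pure-Python sketch of what that call appears to compute. The helper name rolling_mean_skip_null is hypothetical and not part of Polars. It assumes a trailing window and uses None to stand in for null.

```python
def rolling_mean_skip_null(values, window_size, min_periods=1):
    """Trailing rolling mean that divides by the count of non-null values.

    Hypothetical helper for illustration only; not a Polars function.
    """
    result = []
    for i in range(len(values)):
        # Trailing window ending at position i (shorter near the start).
        window = values[max(0, i - window_size + 1): i + 1]
        # Drop nulls so they affect neither the sum nor the count.
        present = [v for v in window if v is not None]
        if present and len(present) >= min_periods:
            result.append(sum(present) / len(present))
        else:
            # Not enough non-null values in the window.
            result.append(None)
    return result


gain = [1, 2, 3, None, 4]
print(rolling_mean_skip_null(gain, window_size=5))
# [1.0, 1.5, 2.0, 2.0, 2.5]
```

The last value is 10 / 4 = 2.5 rather than 10 / 5 = 2.0, which is the behaviour asked for in the question.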

Answered By: Wayoshi