# Improving Weighted Moving Average Performance

## Question:

I have been playing around with a pandas data frame with 414,000 rows.

Built into pandas is an exponential moving average computed by:

```
series.ewm(span=period).mean()
```

The above executes in < 0.3 seconds. However, I want to use a weighted moving average (which applies a linear weighting to each element). I came across the following function:

```
def WMA(self, s, period):
    # The weights must be parenthesized: (np.arange(period)+1) gives 1..period
    return s.rolling(period).apply(lambda x: ((np.arange(period)+1)*x).sum()/(np.arange(period)+1).sum(), raw=True)
```
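Be careful with operator precedence when writing the weights inline: an expression like `np.arange(period)+1*x` parses as `np.arange(period) + (1*x)`, because `*` binds tighter than `+`, so no weighting happens at all. A small illustration with made-up values:

```python
import numpy as np

period = 3
x = np.array([10.0, 20.0, 30.0])          # one window of data (illustrative values)
weights_sum = (np.arange(period) + 1).sum()  # 1 + 2 + 3 = 6

# `*` binds tighter than `+`: this is arange(period) + x, i.e. [10., 21., 32.]
unweighted = np.arange(period) + 1 * x

# Parenthesized weights multiply the elements by 1, 2, 3: [10., 40., 90.]
weighted = (np.arange(period) + 1) * x

print(unweighted.sum() / weights_sum)  # 63 / 6 = 10.5 (not a WMA)
print(weighted.sum() / weights_sum)    # 140 / 6, the intended WMA
```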

The **above function took 27 seconds** to execute. I noticed that the weights could be computed once, outside the lambda, instead of on every window, and produced the following:

```
def WMA(self, s, period):
    weights = np.arange(period)+1
    weights_sum = weights.sum()
    return s.rolling(period).apply(lambda x: (weights*x).sum()/weights_sum, raw=True)
```

The above function took **11 seconds**, which is a noticeable improvement.
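For reference, both functions are meant to compute the standard linearly weighted moving average over a window of $n$ = `period` values (the notation here is introduced for illustration). The most recent value $x_t$ gets weight $n$ and the oldest value in the window gets weight 1, which matches `weights = np.arange(period)+1` applied to a window ordered oldest to newest; the denominator `weights_sum` has the closed form $n(n+1)/2$:

$$
\mathrm{WMA}_t = \frac{\sum_{i=1}^{n} i \, x_{t-n+i}}{\sum_{i=1}^{n} i} = \frac{2}{n(n+1)} \sum_{i=1}^{n} i \, x_{t-n+i}
$$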

What I’m trying to figure out is whether there is some way to optimize this further (ideally by replacing the `apply` call), but I'm genuinely not sure how to go about it.

Any ideas would be appreciated!

## Answers:

You can use NumPy's sliding window function (`np.lib.stride_tricks.sliding_window_view`, see its docs); then it looks like this:

```
import numpy as np
import pandas as pd
d1 = pd.DataFrame(np.random.randint(0, 10, size=(500_000))) # x=500_000
p = 50
w = np.arange(p)+1
w_s = w.sum()
########## for comparison purpose ##########
# 1.47 s ± 12.5 ms per loop (mean ± std. dev. of 7 runs, 2 loops each)
r = d1.rolling(p).apply(lambda x: (w*x).sum()/w_s, raw=True)
# 62.1 ms ± 4.57 ms per loop (mean ± std. dev. of 7 runs, 2 loops each)
swv = np.lib.stride_tricks.sliding_window_view(d1.values.flatten(), window_shape=p)
sw = (swv*w).sum(axis=1) / w_s
########## for comparison purpose ##########
np.array_equal(r.iloc[p - 1:].values.flatten(), sw) # True
```
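To see what `sliding_window_view` produces and why the broadcasted multiply followed by `sum(axis=1)` yields one WMA value per window, here is a tiny worked example (values chosen for illustration; requires NumPy 1.20 or later):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
p = 3
w = np.arange(p) + 1  # weights [1, 2, 3]; the newest value in each window gets 3

swv = sliding_window_view(x, window_shape=p)
# swv is a read-only strided view with one window per row:
# [[1. 2. 3.]
#  [2. 3. 4.]
#  [3. 4. 5.]]

# Broadcasting multiplies every row by w; summing along axis=1 gives one value per window
wma = (swv * w).sum(axis=1) / w.sum()
print(wma)  # 14/6, 20/6, 26/6
```

No data is copied when building `swv`; the only large temporary is the product `swv * w`.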

So that is an overall speedup of about `23.67x` (1.47 s / 62.1 ms). However, you need to adjust the result to your desired shape afterwards, since `sw` starts at index `0` and has length `x - p + 1`, whereas `r` has length `x` and its first `p - 1` values are `nan` (the first full window ends at index `p - 1`).
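The length mismatch is easy to check on a small series. This sketch (sizes `x = 10`, `p = 4` chosen for illustration) compares the two outputs and pads the NumPy result with `p - 1` leading NaNs so it lines up with the `rolling` result:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

x = 10  # series length (small, for illustration)
p = 4   # window length
s = pd.Series(np.arange(x, dtype=float))
w = np.arange(p) + 1

r = s.rolling(p).apply(lambda v: (w * v).sum() / w.sum(), raw=True)
sw = (sliding_window_view(s.to_numpy(), p) * w).sum(axis=1) / w.sum()

print(len(r), r.isna().sum())  # 10 3 -> length x, the first p - 1 values are NaN
print(len(sw))                 # 7    -> length x - p + 1

# Prepend p - 1 NaNs so the NumPy result aligns with the rolling result
padded = np.concatenate((np.full(p - 1, np.nan), sw))
print(np.allclose(r.to_numpy(), padded, equal_nan=True))  # True
```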

Skeletor's answer above was right on the money, and I adapted it slightly to pad the result with `nan` so that it lines up with the original series:

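```
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Uses lower-level NumPy instead of rolling().apply() to greatly speed it up
def WMA(self, s, period):
    w = np.arange(period)+1
    w_s = w.sum()
    swv = sliding_window_view(s.values.flatten(), window_shape=period)
    sw = (swv * w).sum(axis=1) / w_s
    # Pad the first period - 1 positions with NaN (as rolling() does)
    # and return a normal Series aligned with the input's index
    sw = np.concatenate((np.full(period - 1, np.nan), sw))
    return pd.Series(sw, index=s.index)
```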

**This dropped it from 11 seconds down to 1.5 seconds, which is much better!**
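A different `apply`-free formulation, not taken from the answers here, treats the WMA as a convolution of the data with the weights via `np.convolve`. The helper name `wma_convolve` is hypothetical; this is a sketch that reproduces the same NaN padding and index alignment, and its speed relative to the sliding-window version would need to be measured on the real data:

```python
import numpy as np
import pandas as pd

def wma_convolve(s: pd.Series, period: int) -> pd.Series:
    """Linearly weighted moving average via np.convolve (hypothetical helper)."""
    w = np.arange(period) + 1  # weights 1..period; the newest value gets `period`
    # np.convolve flips its second argument, so pass w reversed to get sum(w * window)
    valid = np.convolve(s.to_numpy(dtype=float), w[::-1], mode="valid") / w.sum()
    # Pad the first period - 1 positions with NaN, matching s.rolling(period)
    out = np.concatenate((np.full(period - 1, np.nan), valid))
    return pd.Series(out, index=s.index)

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=list("abcde"))
res = wma_convolve(s, 3)
print(res)  # NaN, NaN, then 14/6, 20/6, 26/6
```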

Take a look at the parallel-pandas library. With its help, you can parallelize the `apply` method of a sliding window.

It takes just two extra lines of code, if you count the library import:

```
import pandas as pd
import numpy as np
from time import monotonic
from parallel_pandas import ParallelPandas

def WMA(s, period):
    weights = np.arange(period) + 1
    weights_sum = weights.sum()
    return s.rolling(period).apply(lambda x: (weights * x).sum() / weights_sum, raw=True)

def parallel_wma(s, period):
    weights = np.arange(period) + 1
    weights_sum = weights.sum()
    # p_apply is the parallel apply method
    return s.rolling(period).p_apply(lambda x: (weights * x).sum() / weights_sum, raw=True)

if __name__ == '__main__':
    # initialize parallel-pandas
    ParallelPandas.initialize(n_cpu=16, disable_pr_bar=True)
    # create a series of length 500 000
    s = pd.Series(np.random.randint(0, 5, size=500_000))
    period = 50
    start = monotonic()
    res = WMA(s, period)
    print(f'synchronous wma time took: {monotonic() - start:.2f} s.')
    start = monotonic()
    res2 = parallel_wma(s, period)
    print(f'parallel wma time took: {monotonic() - start:.2f} s.')
```

```
Output:
synchronous wma time took: 1.16 s.
parallel wma time took: 0.22 s.
```

Total speedup: `1.16/0.22 ≈ 5.3x`, which narrows the gap to the NumPy-array approach that Skeletor demonstrated.