Improving Weighted Moving Average Performance
Question:
I have been playing around with a pandas data frame with 414,000 rows.
Built into pandas is an exponential moving average computed by:
series.ewm(span=period).mean()
The above executes in < 0.3 seconds. However, I want to use a weighted moving average (one with a linear weighting of each element). I came across the following function:
def WMA(self, s, period):
    # Parentheses around (np.arange(period)+1) so the weights multiply x
    return s.rolling(period).apply(lambda x: ((np.arange(period)+1)*x).sum()/(np.arange(period)+1).sum(), raw=True)
The above function took 27 seconds to execute. I noticed the arange function could be cached and produced the following:
def WMA(self, s, period):
    weights = np.arange(period)+1
    weights_sum = weights.sum()
    return s.rolling(period).apply(lambda x: (weights*x).sum()/weights_sum, raw=True)
The above function took 11 seconds, which is a noticeable improvement.
What I'm trying to figure out is whether I can optimize this further (ideally by replacing the apply call), but I'm genuinely not sure how to go about it.
Any ideas would be appreciated!
Answers:
You can use NumPy's sliding_window_view function (see its docs); it then looks like this:
import numpy as np
import pandas as pd
d1 = pd.DataFrame(np.random.randint(0, 10, size=(500_000))) # x=500_000
p = 50
w = np.arange(p)+1
w_s = w.sum()
########## for comparison purpose ##########
# 1.47 s ± 12.5 ms per loop (mean ± std. dev. of 7 runs, 2 loops each)
r = d1.rolling(p).apply(lambda x: (w*x).sum()/w_s, raw=True)
# 62.1 ms ± 4.57 ms per loop (mean ± std. dev. of 7 runs, 2 loops each)
swv = np.lib.stride_tricks.sliding_window_view(d1.values.flatten(), window_shape=p)
sw = (swv*w).sum(axis=1) / w_s
########## for comparison purpose ##########
np.array_equal(r.iloc[p - 1:].values.flatten(), sw) # True
So, an overall speedup of ~23.67x. However, you need to adjust the result to your desired shape afterwards: sw starts at index 0 with a shape of x-p+1, whereas r starts at index p-1 with a shape of x, and its first p-1 values are NaN.
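The shape difference between the two results can be checked on a toy series. This is only a minimal sketch: the sizes x=6 and p=3 and the name aligned are made up for illustration and are not part of the benchmark above.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

x, p = 6, 3                          # toy sizes, chosen only for illustration
s = pd.Series(np.arange(x, dtype=float))
w = np.arange(p) + 1                 # linear weights 1..p
w_s = w.sum()

# Reference result: same length as s, NaN until the first full window
r = s.rolling(p).apply(lambda v: (w * v).sum() / w_s, raw=True)

# Sliding-window result: one value per full window, no NaN padding
sw = (sliding_window_view(s.to_numpy(), window_shape=p) * w).sum(axis=1) / w_s

print(r.shape, sw.shape)             # (6,) (4,)
print(int(r.isna().sum()))           # 2

# Prepend p - 1 NaN values so sw lines up with r position by position
aligned = np.concatenate((np.full(p - 1, np.nan), sw))
```

After padding, aligned has the same length as the input and the same leading NaN values as the rolling result.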
Skeletor's answer above was right on the money; I adapted it slightly to handle the NaN padding:
# THIS USES LOWER LEVEL NUMPY TO GREATLY SPEED IT UP!
from numpy.lib.stride_tricks import sliding_window_view

def WMA(self, s, period):
    w = np.arange(period)+1
    w_s = w.sum()
    swv = sliding_window_view(s.values.flatten(), window_shape=period)
    sw = (swv * w).sum(axis=1) / w_s
    # Pad the first period - 1 positions with NaN so the length matches s
    sw = np.concatenate((np.full(period - 1, np.nan), sw))
    return pd.Series(sw)
This dropped it from 11 seconds down to 1.5 seconds, which is much better!
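One possible further tweak, which is not from the answers in this thread: the expression (swv * w) builds a temporary array with one row per window before summing. A matrix-vector product gives the same weighted sums without that temporary. The sketch below assumes a one-dimensional numeric Series with at least period elements; the name wma_dot is hypothetical, and no timing has been measured for it here.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def wma_dot(s, period):
    """Linearly weighted moving average via a matrix-vector product (sketch)."""
    w = np.arange(period) + 1                       # linear weights 1..period
    # Read-only view of shape (len(s) - period + 1, period); no data is copied
    windows = sliding_window_view(s.to_numpy(dtype=float), window_shape=period)
    out = (windows @ w) / w.sum()                   # one weighted mean per window
    # Prepend period - 1 NaN values so the result has the same length as s
    out = np.concatenate((np.full(period - 1, np.nan), out))
    return pd.Series(out, index=s.index)            # keep the original index

s = pd.Series(np.random.randint(0, 10, size=1_000))
res = wma_dot(s, 50)
```

Unlike the version above, this one reuses the index of s, which matters if the series is not indexed 0..n-1.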
Take a look at the parallel-pandas library. With its help, you can parallelize the apply method of a sliding window. It takes just two extra lines of code (if you count library imports):
import pandas as pd
import numpy as np
from time import monotonic
from parallel_pandas import ParallelPandas

def WMA(s, period):
    weights = np.arange(period) + 1
    weights_sum = weights.sum()
    return s.rolling(period).apply(lambda x: (weights * x).sum() / weights_sum, raw=True)

def parallel_wma(s, period):
    weights = np.arange(period) + 1
    weights_sum = weights.sum()
    # p_apply is parallel apply method
    return s.rolling(period).p_apply(lambda x: (weights * x).sum() / weights_sum, raw=True)

if __name__ == '__main__':
    # initialize parallel-pandas
    ParallelPandas.initialize(n_cpu=16, disable_pr_bar=True)
    # create series of length 500 000
    s = pd.Series(np.random.randint(0, 5, size=500_000))
    period = 50
    start = monotonic()
    res = WMA(s, period)
    print(f'synchronous wma time took: {monotonic() - start:.2f} s.')
    start = monotonic()
    res2 = parallel_wma(s, period)
    print(f'parallel wma time took: {monotonic() - start:.2f} s.')
Output:
synchronous wma time took: 1.16 s.
parallel wma time took: 0.22 s.
Total speedup: 1.16/0.22 ≈ 5.3, which is close to the performance of the NumPy array approach that Skeletor demonstrated.