Rolling median in Python

Question:

I have some stock data based on daily close values. I need to be able to insert these values into a Python list and get the median of the last 30 closes. Is there a Python library that does this?

Asked By: yueerhu

Answers:

In pure Python, with your data in a list a, you could do

median = sum(sorted(a[-30:])[14:16]) / 2.0

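(This assumes a has at least 30 items: indices 14 and 15 are then the two middle values of the sorted window, and the median is their mean.)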
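If the list may hold fewer than 30 closes, the standard library's statistics.median handles both odd and even counts. A minimal sketch, assuming a hypothetical helper name last_n_median and made-up sample prices:

```python
from statistics import median

def last_n_median(values, n=30):
    """Median of the last n values (hypothetical helper, not a library function)."""
    window = values[-n:]  # the whole list if it is shorter than n
    if not window:
        raise ValueError("need at least one value")
    return median(window)

closes = [101.0, 99.5, 102.25, 100.0]  # made-up closing prices
print(last_n_median(closes))       # fewer than 30 values: uses all four
print(last_n_median(closes, n=3))  # last three values only
```

If closes arrive one per day, a collections.deque(maxlen=30) can hold the window so that old values drop out automatically; statistics.median accepts it directly.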

Using the NumPy package, you could use

import numpy
median = numpy.median(a[-30:])
Answered By: Sven Marnach

Isn’t the median just the middle value in a sorted range?

So, assuming your list is stock_data:

last_thirty = stock_data[-30:]
median = sorted(last_thirty)[15]

Now you just need to find and fix the off-by-one errors (with an even count such as 30, index 15 is the upper of the two middle values, not their mean) and handle the case where stock_data has fewer than 30 elements…

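Let's try that as a function:

def rolling_median(data, window):
    # Use the whole list when it is shorter than the window.
    if len(data) < window:
        subject = data[:]
    else:
        subject = data[-window:]
    # Integer division: for an even count this picks the upper middle value.
    return sorted(subject)[len(subject) // 2]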
Answered By: Daren Thomas

Have you considered pandas? It is built on NumPy, can index your data by timestamp, and most of its computations skip missing values as long as you mark them with numpy.nan. It also offers fairly powerful plotting via matplotlib.

Basically, it was designed for financial analysis in Python.
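As a rough sketch of what that could look like for the rolling median from the question (the sample prices and variable names are made up, and a window of 3 is used only to keep the output short; the question would use window=30):

```python
import pandas as pd

closes = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0])  # made-up closing prices

# Median over the last 3 closes; the first two entries are NaN because
# a full window is not available yet.
strict = closes.rolling(window=3).median()

# min_periods=1 yields a value as soon as one close is available.
relaxed = closes.rolling(window=3, min_periods=1).median()

print(strict.tolist())
print(relaxed.tolist())
```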

Answered By: Mike Pennington

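# Found this helpful. It computes the median of everything seen so far
# (an expanding window), not of a fixed-size window:

import numpy as np
values = [10, 20, 30, 40, 50]

med = []
for j in range(len(values)):
    sub_set = values[:j + 1]  # all values up to and including position j
    med.append(np.median(sub_set))
print(med)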
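The loop above takes the median of every value seen so far. For a fixed window, such as the last 30 closes in the question, the slice needs a moving start index. A minimal sketch, assuming a hypothetical helper name trailing_medians and a window of 3 to keep the example small:

```python
import numpy as np

def trailing_medians(values, window):
    """Median of the last `window` values at each position (hypothetical helper)."""
    result = []
    for j in range(len(values)):
        start = max(0, j - window + 1)  # shorter window at the beginning
        result.append(float(np.median(values[start:j + 1])))
    return result

print(trailing_medians([10, 20, 30, 40, 50], 3))
# [10.0, 15.0, 20.0, 30.0, 40.0]
```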
Answered By: emeka ochiabuto

Here is a much faster method, at the cost of O(w*|x|) space.

import numpy as np

def moving_median(x, w):
    # Column idx holds x shifted down by idx rows; unfilled cells stay NaN.
    shifted = np.full((len(x)+w-1, w), np.nan)
    for idx in range(w):
        shifted[idx:idx+len(x), idx] = x
    medians = np.median(shifted, axis=1)
    # Edge rows contain NaNs: recompute their medians from the filled cells only.
    for idx in range(w-1):
        medians[idx] = np.median(shifted[idx, :idx+1])
        medians[-idx-1] = np.median(shifted[-idx-1, -idx-1:])
    # Trim w-1 edge rows so the output has the same length as x (needs w >= 2).
    return medians[(w-1)//2:-(w-1)//2]

moving_median(np.arange(10), 4)
# Output
array([0.5, 1. , 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8. ])

The output has the same length as the input vector.
Edge rows in which fewer than half of the entries are filled are dropped. With an even window width there are two rows that are exactly half NaN, and only the first of them is kept. Here is the shifted matrix for the example above with the respective median values:

[[ 0. nan nan nan] -> -
 [ 1.  0. nan nan] -> 0.5
 [ 2.  1.  0. nan] -> 1.0
 [ 3.  2.  1.  0.] -> 1.5
 [ 4.  3.  2.  1.] -> 2.5
 [ 5.  4.  3.  2.] -> 3.5
 [ 6.  5.  4.  3.] -> 4.5
 [ 7.  6.  5.  4.] -> 5.5
 [ 8.  7.  6.  5.] -> 6.5
 [ 9.  8.  7.  6.] -> 7.5
 [nan  9.  8.  7.] -> 8.0
 [nan nan  9.  8.] -> -
 [nan nan nan  9.]]-> -

The behaviour can be changed by adapting the final slice medians[(w-1)//2:-(w-1)//2].

Benchmark:

%%timeit
moving_median(np.arange(1000), 4)
# 267 µs ± 759 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Alternative approach (the results are shifted, because each window starts at position j, and the last w-1 windows are shorter than w):

def moving_median_list(x, w):
    medians = np.zeros(len(x))
    for j in range(len(x)):
        medians[j] = np.median(x[j:j+w])
    return medians

%%timeit
moving_median_list(np.arange(1000), 4)
# 15.7 ms ± 115 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

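Both algorithms are linear in the length of x for a fixed window width.
moving_median is nevertheless the faster option, because it computes nearly all medians in a single vectorized np.median call instead of calling np.median once per element in a Python loop.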
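A related vectorized variant, not part of the answer above, builds the window matrix with numpy.lib.stride_tricks.sliding_window_view (available in NumPy 1.20 and later, which is assumed here). The sketch below uses a hypothetical helper name and returns only the medians of complete windows, so the output is shorter than the input by w-1:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def full_window_medians(x, w):
    """Medians of every complete length-w window (hypothetical helper)."""
    windows = sliding_window_view(np.asarray(x), w)  # shape (len(x) - w + 1, w)
    return np.median(windows, axis=1)

print(full_window_medians(np.arange(10), 4))
# [1.5 2.5 3.5 4.5 5.5 6.5 7.5]
```

The window matrix here is a view on the input rather than a filled copy, and no edge handling is needed because partial windows are simply not produced.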

Answered By: Christoph Schranz