Python: Finding a trend in a set of numbers

Question:

I have a list of numbers in Python, like this:

x = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]

What’s the best way to find the trend in these numbers? I’m not interested in predicting what the next number will be; I just want to output the trend for many sets of numbers so that I can compare the trends.

Edit: By trend, I mean that I’d like a numerical representation of whether the numbers are increasing or decreasing and at what rate. I’m not massively mathematical, so there’s probably a proper name for this!

Edit 2: It looks like what I really want is the coefficient (slope) of the linear best fit. What’s the best way to get this in Python?

Asked By: Sam Starling

Answers:

You could do a least squares fit of the data.

Using the formula from this page:

y = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
N = len(y)
x = range(N)
# B is the slope and A the intercept of the least-squares line y = A + B*x
B = (sum(x[i] * y[i] for i in range(N)) - 1./N*sum(x)*sum(y)) / (sum(x[i]**2 for i in range(N)) - 1./N*sum(x)**2)
A = 1.*sum(y)/N - B * 1.*sum(x)/N
print("%f + %f * x" % (A, B))

This prints the intercept (starting value) and the slope (change per step) of the best-fit line.
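
Since the goal is to compare trends across many sets of numbers, the same formula can be wrapped in a function and applied to each set. A minimal sketch, assuming the points are evenly spaced so that x can simply be 0, 1, 2, ...; the function name slope and every series except "original" are illustrative, not from the question:

```python
def slope(values):
    """Least-squares slope of values against their indices 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    xs = range(n)
    sum_x = sum(xs)
    sum_y = sum(values)
    sum_xy = sum(x * y for x, y in zip(xs, values))
    sum_xx = sum(x * x for x in xs)
    return (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)


# Hypothetical example series; only "original" comes from the question.
series = {
    "original": [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81],
    "falling": [90, 85, 87, 70, 62, 65, 50],
    "flat": [5, 5, 5, 5],
}

# Rank the series from steepest increase to steepest decrease.
slopes = {name: slope(values) for name, values in series.items()}
for name, value in sorted(slopes.items(), key=lambda item: item[1], reverse=True):
    print("%-10s %8.3f" % (name, value))
```

A positive slope means the series is rising on average, a negative one that it is falling, and the magnitude is the average change per step.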

Answered By: Keith Randall

Here is one way to get an increasing/decreasing trend:

>>> x = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
>>> trend = [b - a for a, b in zip(x, x[1:])]
>>> trend
[22, -5, 9, -4, 17, -22, 5, 13, -13, 21, 39, -26, 13]

In the resulting list trend, trend[0] is the increase from x[0] to x[1], trend[1] is the increase from x[1] to x[2], and so on. A negative value in trend means that the value in x decreased from one index to the next.
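
To reduce the list of differences to a single number for comparison, one option is to average it. A small sketch (the names steps, average_step, endpoint_slope and fraction_rising are illustrative). Note that the average step always equals (last - first) / (n - 1), so it depends only on the two endpoints and is more sensitive to noise there than a least-squares slope:

```python
x = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]

# Consecutive differences between neighbouring values.
steps = [b - a for a, b in zip(x, x[1:])]

# The intermediate values cancel out when the steps are summed,
# so the average step is fixed by the first and last values alone.
average_step = sum(steps) / len(steps)
endpoint_slope = (x[-1] - x[0]) / (len(x) - 1)

# Share of steps that go up, as a rough direction indicator.
fraction_rising = sum(1 for s in steps if s > 0) / len(steps)

print(average_step, endpoint_slope, fraction_rising)
```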

Answered By: Andrew Clark

I agree with Keith: you’re probably looking for a linear least squares fit (if all you want to know is whether the numbers are generally increasing or decreasing, and at what rate). The slope of the fit tells you the rate at which they’re increasing. If you want a visual representation of a linear least squares fit, try Wolfram Alpha:

http://www.wolframalpha.com/input/?i=linear+fit+%5B12%2C+34%2C+29%2C+38%2C+34%2C+51%2C+29%2C+34%2C+47%2C+34%2C+55%2C+94%2C+68%2C+81%5D

Update: If you want to implement a linear regression in Python, I recommend starting with the explanation at Mathworld:

http://mathworld.wolfram.com/LeastSquaresFitting.html

It’s a very straightforward explanation of the algorithm, and the code practically writes itself. In particular, you want to pay close attention to equations 16-21, 27, and 28.

Try writing the algorithm yourself, and if you have problems, you should open another question.
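
If it helps to have something to check a hand-written version against, here is a minimal sketch of the sums-of-squares form of the fit. It assumes the referenced equations define the centered sums ss_xx, ss_yy and ss_xy; the function name fit_line and the r_squared return value are illustrative additions, not taken from the linked page:

```python
def fit_line(xs, ys):
    """Return (intercept, slope, r_squared) of the least-squares line."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centered sums of squares and cross-products.
    ss_xx = sum((x - x_mean) ** 2 for x in xs)
    ss_yy = sum((y - y_mean) ** 2 for y in ys)
    ss_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if ss_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = ss_xy / ss_xx
    intercept = y_mean - slope * x_mean
    # Share of the variance in ys explained by the line. Reported as 1.0
    # when ys are constant, which is a convention chosen for this sketch.
    r_squared = ss_xy ** 2 / (ss_xx * ss_yy) if ss_yy else 1.0
    return intercept, slope, r_squared


data = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
intercept, slope, r_squared = fit_line(list(range(len(data))), data)
print("intercept=%.4f slope=%.4f r^2=%.4f" % (intercept, slope, r_squared))
```

The r_squared value is useful when comparing many series: two series can share a slope while one follows the line closely and the other scatters widely around it.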

Answered By: Ethan Brown

Possibly you mean you want to plot these numbers on a graph and find a straight line through them where the overall distance between the line and the numbers is minimized? This is called a linear regression:

def linreg(X, Y):
    """
    return a,b in solution to y = ax + b such that root mean square distance between trend line and original points is minimized
    """
    N = len(X)
    Sx = Sy = Sxx = Syy = Sxy = 0.0
    for x, y in zip(X, Y):
        Sx = Sx + x
        Sy = Sy + y
        Sxx = Sxx + x*x
        Syy = Syy + y*y
        Sxy = Sxy + x*y
    det = Sxx * N - Sx * Sx
    return (Sxy * N - Sy * Sx)/det, (Sxx * Sy - Sx * Sxy)/det


x = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
a, b = linreg(range(len(x)), x)  # your x,y are switched from standard notation

The trend line is unlikely to pass through your original points, but it will be as close as possible to the original points that a straight line can get. Using the gradient and intercept values of this trend line (a,b) you will be able to extrapolate the line past the end of the array:

extrapolatedtrendline = [a*index + b for index in range(20)]  # replace 20 with desired trend length

Answered By: Riaz Rizvi

The link provided by Keith or the answer from Riaz will help you implement the fit yourself, but it is generally better to use a library when one is available. For this problem, numpy provides a polynomial fit function called polyfit, which can fit the data with a polynomial of any degree.

Here is an example using numpy to fit the data to a linear equation of the form y = ax + b:

>>> import numpy as np
>>> data = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
>>> x = np.arange(0, len(data))
>>> y = np.array(data)
>>> z = np.polyfit(x, y, 1)
>>> print("{0:.12g}x + {1:.12g}".format(*z))
4.32527472527x + 17.6
>>> 

Similarly, a quadratic fit would be:

>>> z = np.polyfit(x, y, 2)
>>> print("{0:.12g}x^2 + {1:.12g}x + {2:.12g}".format(*z))
0.311126373626x^2 + 0.280631868132x + 25.6892857143
>>> 
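
polyfit also accepts a 2-D y with one data set per column, which suits comparing many sets of numbers that share the same x values. A sketch, assuming all series have the same length; the second and third series are made-up examples:

```python
import numpy as np

# One series per row; only the first row comes from the question.
series = np.array([
    [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81],
    [80, 75, 77, 70, 66, 61, 63, 55, 50, 52, 44, 40, 37, 30],
    [20, 21, 19, 20, 22, 20, 19, 21, 20, 20, 21, 19, 20, 20],
])

x = np.arange(series.shape[1])

# polyfit expects each data set in its own column, hence the transpose.
# The result has shape (2, number_of_series): row 0 holds the slopes,
# row 1 the intercepts.
slopes, intercepts = np.polyfit(x, series.T, 1)

for i, (m, c) in enumerate(zip(slopes, intercepts)):
    print("series %d: slope %.3f, intercept %.3f" % (i, m, c))
```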

Answered By: Abhijit

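Compute the beta coefficient: the covariance of x and y divided by the variance of x, which is the slope of the least-squares line.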

y = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
x = range(1,len(y)+1)

def var(X):
    S = 0.0
    SS = 0.0
    for x in X:
        S += x
        SS += x*x
    xbar = S/float(len(X))
    return (SS - len(X) * xbar * xbar) / (len(X) -1.0)

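def cov(X,Y):
    n = float(len(X))  # float, so the means and the final division are not truncated
    xbar = sum(X) / n
    ybar = sum(Y) / n
    return sum([(x-xbar)*(y-ybar) for x,y in zip(X,Y)])/(n-1)


def beta(x,y):
    return cov(x,y)/var(x)

print(beta(x, y))  # approximately 4.3253, the same slope as the least-squares fits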
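
Because the slope is a ratio of covariance to variance, mathematically it does not change when x is shifted by a constant; only the intercept does. Numbering the points from 1 rather than from 0 should therefore give the same slope. A small check of that claim, using a hypothetical helper slope_and_intercept:

```python
def slope_and_intercept(xs, ys):
    """Least-squares slope and intercept via covariance / variance."""
    n = float(len(xs))
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # The (n - 1) factors of the sample covariance and sample variance
    # cancel in the ratio, so plain centered sums are enough here.
    cov_sum = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var_sum = sum((x - x_mean) ** 2 for x in xs)
    slope = cov_sum / var_sum
    return slope, y_mean - slope * x_mean


y = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]

slope0, intercept0 = slope_and_intercept(range(len(y)), y)         # x = 0..13
slope1, intercept1 = slope_and_intercept(range(1, len(y) + 1), y)  # x = 1..14

print(slope0, slope1)          # the slopes agree for both numberings
print(intercept0, intercept1)  # the intercepts differ by one slope
```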

Answered By: luke14free

You can find the OLS coefficients using numpy:

import numpy as np

y = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]

x = []
x.append(list(range(len(y))))  # time variable
x.append([1] * len(y))         # column of ones, which adds the intercept

y = np.matrix(y).T
x = np.matrix(x).T

betas = ((x.T*x).I*x.T*y)

Results:

>>> betas
matrix([[  4.32527473],  #coefficient on the time variable
        [ 17.6       ]]) #coefficient on the intercept

Since the coefficient on the trend variable is positive, observations in your variable are increasing over time.
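
The same coefficients can be obtained without forming the matrix inverse explicitly. A sketch using np.linalg.lstsq on plain arrays, which is generally more numerically robust than inverting x.T*x; the variable names t and design are illustrative:

```python
import numpy as np

y = np.array([12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81], dtype=float)
t = np.arange(len(y), dtype=float)

# Design matrix: one column for the time variable, one column of ones
# for the intercept.
design = np.column_stack([t, np.ones_like(t)])

# lstsq returns (solution, residuals, rank, singular_values);
# rcond=None selects the current default cutoff and avoids a warning
# on older numpy versions.
coefficients, residuals, rank, singular_values = np.linalg.lstsq(design, y, rcond=None)
slope, intercept = coefficients

print("slope %.4f, intercept %.4f" % (slope, intercept))
```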

Answered By: Akavall