Linear Regression with np.arrays and scipy.stats

Question:

I’m trying to do a linear regression on two 2×3 arrays, one of x values and one of y values, where each row is a separate data set. However, when I try to compute it on the whole arrays at once:


import numpy as np
from scipy.stats import linregress


sigma = np.array([[10., 20., 40.],
                  [15., 30., 50.]])
tau = np.array([[7., 14., 28.],
                [15.5, 31.1, 51.8]])

slope = linregress(sigma, tau)[0]

I get the error:

ValueError: too many values to unpack (expected 4)

However, if I select the rows manually it works fine:

slope = linregress(sigma[0,:], tau[0,:])[0]

I know I could probably get what I want with a for loop, but I feel like there should be a more straightforward way that I’m just missing. Also, I know I could do this really simply without arrays, but I’d like to get this right so I can use it for much larger data sets in the future. Thanks!

Asked By: Netforce23

Answers:

I believe the straightforward solution is, in fact, a for loop. You are using linregress from scipy.stats, and from my reading of the documentation I don’t think it offers the vectorized, row-by-row behaviour you are looking for.
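For reference, the explicit loop could look roughly like the sketch below. It assumes the question's sigma and tau arrays, pairs row i of sigma with row i of tau, and reads the slope from the result's slope attribute rather than index 0.

```python
import numpy as np
from scipy.stats import linregress

# Data from the question: each row is a separate data set.
sigma = np.array([[10.0, 20.0, 40.0],
                  [15.0, 30.0, 50.0]])
tau = np.array([[7.0, 14.0, 28.0],
                [15.5, 31.1, 51.8]])

# Fit each row on its own and collect one slope per row.
slopes = np.empty(sigma.shape[0])
for i in range(sigma.shape[0]):
    slopes[i] = linregress(sigma[i], tau[i]).slope
```

This makes one linregress call per row, so the cost grows linearly with the number of rows.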

If you don’t like the for loop, perhaps because it would be too slow for the real data sets you will have to handle in the future, you can try a more compact solution using a list comprehension. List comprehensions are often somewhat faster than plain for loops, though this depends on the problem at hand, and I don’t have any details about your real application.

So, you could try something like this:

import numpy as np
from scipy.stats import linregress

sigmas = [[10, 20, 40], [15, 30, 50]]
taus = [[7,  14,  28], [15.5, 31.1, 51.8]]

slopes = [linregress(sigma, tau)[0] for sigma, tau in zip(sigmas, taus)]
slopes
# [0.7, 1.037027027027027]

I think this is an elegant solution: it is a short, clear one-liner (fewer than 88 characters), and it could potentially be faster than an explicit for loop.

Or, as @MadPhysicist suggested, you could implement your own linear regression function, for instance with NumPy, so that it can benefit from vectorization.
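A minimal sketch of that idea, assuming only the slopes are needed and that every row has the same number of points: the ordinary least-squares slope is the sum of the products of the centered x and y values divided by the sum of the squared centered x values, and both sums can be taken along axis 1. The name rowwise_slopes is made up for this illustration; it is not part of NumPy or SciPy.

```python
import numpy as np

def rowwise_slopes(x, y):
    """Least-squares slope of y against x for each row (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center every row on its own mean.
    dx = x - x.mean(axis=1, keepdims=True)
    dy = y - y.mean(axis=1, keepdims=True)
    # slope = sum(dx * dy) / sum(dx ** 2), evaluated row by row.
    return (dx * dy).sum(axis=1) / (dx ** 2).sum(axis=1)

# Data from the question: each row is a separate data set.
sigma = np.array([[10.0, 20.0, 40.0],
                  [15.0, 30.0, 50.0]])
tau = np.array([[7.0, 14.0, 28.0],
                [15.5, 31.1, 51.8]])

slopes = rowwise_slopes(sigma, tau)  # one slope per row
```

For this data the result should agree with the list comprehension above to within floating-point rounding. Unlike linregress, this sketch does not return the intercept, r-value, p-value, or standard error, and a row whose x values are all identical leads to a division by zero.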

Answered By: blunova