Linear Regression with part of pandas dataframe for 300 columns
Question:
I have a pandas dataframe with the heat production of 300 devices mapped to the outside temperature, which looks like this:
I now want to do a linear regression (y = ß0 + ß1*x1) on all 300 heating devices for the temperature range 2 to 3.5, so that x is the outside temperature and y is the heating device output.
At the end I would like to have a regression coefficient ß1 for every heating device.
What's the best way to do so?
Answers:
You should provide some workable code, but here is an example based on the numpy documentation:
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])
z = np.polyfit(x, y, 1)  # 1 is the fitting order, i.e. the degree of the polynomial
print(z)  # coefficients, highest degree first
For the linear case you get two parameters, slope and intercept, in that order.
According to the docs, y can be a 2D numpy array, in which case every column of y is fitted separately against the same x. In your case y would be the heating values and x would be the temperature.
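A minimal sketch of that idea, applied to the temperature range from the question. The original table is not shown, so the column names `outside temperature` and `heating_device1`, `heating_device2`, ... as well as the synthetic data are assumptions. Swap in your own dataframe.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (assumed column names, 3 devices instead of 300).
temps = np.linspace(0.0, 5.0, 51)
df = pd.DataFrame({"outside temperature": temps})
for i in range(1, 4):
    df[f"heating_device{i}"] = 10.0 - i * temps  # device i has slope -i

# Keep only the rows inside the temperature range of interest (bounds inclusive).
subset = df[df["outside temperature"].between(2.0, 3.5)]

x = subset["outside temperature"].to_numpy()
Y = subset.drop(columns="outside temperature").to_numpy()  # shape (n_rows, n_devices)

# With a 2D y, np.polyfit fits every column against the same x.
# The result has shape (2, n_devices): row 0 holds slopes, row 1 intercepts.
slope, intercept = np.polyfit(x, Y, 1)

result = pd.DataFrame(
    {"slope": slope, "intercept": intercept},
    index=subset.columns.drop("outside temperature"),
)
print(result)
```

The `slope` column holds ß1 for each device. One caveat: `np.polyfit` does not skip missing values, so rows containing NaN would need to be dropped or handled per column first.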
Simply compute the coefficient for every column using LinearRegression from sklearn.linear_model.
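from sklearn.linear_model import LinearRegression

# Keep only the rows in the requested temperature range (bounds inclusive)
sub = df[df['outside temperature'].between(2, 3.5)]
for i in range(300):
    t = LinearRegression().fit(sub[['outside temperature']], sub['heating_device' + str(i + 1)])
    print(i + 1, t.coef_[0], t.intercept_)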
This prints the slope ß1 and the intercept ß0 for every heating device column.
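If you would rather keep the coefficients than print them, one option is to fit all device columns in a single call, since `LinearRegression` accepts a 2D target. This is only a sketch: the column names and the small synthetic dataframe are assumptions standing in for the real data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the real data (assumed column names, 3 devices instead of 300).
temps = np.linspace(0.0, 5.0, 51)
df = pd.DataFrame({"outside temperature": temps})
for i in range(1, 4):
    df[f"heating_device{i}"] = 10.0 - i * temps  # device i has slope -i

# Restrict to the temperature range from the question (bounds inclusive).
subset = df[df["outside temperature"].between(2.0, 3.5)]
X = subset[["outside temperature"]]              # 2D feature matrix
Y = subset.drop(columns="outside temperature")   # one target column per device

# A single fit handles all targets; coef_ has shape (n_targets, n_features).
model = LinearRegression().fit(X, Y)

beta1 = pd.Series(model.coef_[:, 0], index=Y.columns, name="beta1")
beta0 = pd.Series(model.intercept_, index=Y.columns, name="beta0")
print(pd.concat([beta0, beta1], axis=1))
```

`beta1` is then a Series indexed by device column name, so `beta1["heating_device2"]` gives the slope for that device.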