Estimate future values with sklearn linear regression on accumulated data over time

Question:

I have 10 days' worth of data on the number of burpees completed, and from it I want to extrapolate an estimate of the total number of burpees completed after 20 days.

import pandas as pd

data = {'Day': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'burpees': [12, 20, 28, 32, 52, 59, 71, 85, 94, 112]}
df = pd.DataFrame(data)

I have fitted sklearn's LinearRegression to the data and extracted the coefficient:

from sklearn.linear_model import LinearRegression
reg = LinearRegression()
mdl = reg.fit(df[['Day']], df[['burpees']])
mdl.coef_

How do I get an estimate of the number of burpees on day 20?

Asked By: MeganCole


Answers:

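According to the documentation, the X passed to .fit() must be array-like with shape (n_samples, n_features), and y must have shape (n_samples,) or (n_samples, n_targets). Once the model is fitted, call .predict() on new input with the same 2-D shape to get the estimate.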
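A quick check of what those shapes look like for this data (a small illustrative sketch; the variable names are only for the example): `reshape(-1, 1)` turns the ten day values into a 10 x 1 matrix, `df[["Day"]]` with double brackets already has that shape, while `df["Day"]` with single brackets is 1-D.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Day": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

X_numpy = df["Day"].values.reshape(-1, 1)  # 1-D array of 10 values -> 10 rows x 1 column
X_frame = df[["Day"]]                      # double brackets keep a 2-D DataFrame
X_series = df["Day"]                       # single brackets give a 1-D Series

print(X_numpy.shape)   # (10, 1)
print(X_frame.shape)   # (10, 1)
print(X_series.shape)  # (10,)

# A single new day must also be 2-D: one sample, one feature
x_new = np.array(20).reshape(-1, 1)
print(x_new.shape)     # (1, 1)
```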

The following should work:

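import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# your data
data = {
    "Day": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "burpees": [12, 20, 28, 32, 52, 59, 71, 85, 94, 112],
}
df = pd.DataFrame(data)

reg = LinearRegression()

# reshape each column into a 2-D array: 10 rows (samples) x 1 column
X = df["Day"].values.reshape(-1, 1)
y = df["burpees"].values.reshape(-1, 1)

mdl = reg.fit(X, y)
print("Intercept, coef:", mdl.intercept_, mdl.coef_)

# the new input needs the same 2-D shape: 1 sample x 1 feature (day 20)
prediction_data = np.array(20).reshape(-1, 1)

# predict() returns a (1, 1) array here because y was 2-D, so take [0][0]
print("Prediction:", mdl.predict(prediction_data)[0][0])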

Output:

Intercept, coef: [-4.4] [[11.07272727]]
Prediction: 217.0545454545455
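As a hand check of these numbers: with a single feature, LinearRegression fits an ordinary least-squares line $\hat{y} = b_0 + b_1 x$, and for this data $\bar{x} = 5.5$, $\bar{y} = 56.5$, so

$$
b_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{913.5}{82.5} \approx 11.0727,
\qquad
b_0 = \bar{y} - b_1 \bar{x} = 56.5 - 60.9 = -4.4
$$

$$
\hat{y}(20) = b_0 + 20\, b_1 = -4.4 + 221.4545\ldots \approx 217.05
$$

This matches the printed intercept, coefficient and prediction.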
Answered By: the_pr0blem
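For comparison, a minimal sketch that keeps the question's DataFrame-based fit instead of converting to NumPy. It assumes a 1-D target (`df["burpees"]` rather than `df[["burpees"]]`) so that `coef_` is 1-D and `intercept_` is a scalar; the `future` and `burpees_est` names are only illustrative. It predicts days 11 to 20 and confirms that `intercept_ + coef_ * day` gives the same day-20 value. Keep in mind the estimate assumes the straight-line trend continues past day 10.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "Day": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "burpees": [12, 20, 28, 32, 52, 59, 71, 85, 94, 112],
})

# Fit as in the question: 2-D DataFrame X, but a 1-D Series y
mdl = LinearRegression().fit(df[["Day"]], df["burpees"])

# New inputs use the same column name the model was fitted with
future = pd.DataFrame({"Day": range(11, 21)})
future["burpees_est"] = mdl.predict(future[["Day"]])

day20 = future.loc[future["Day"] == 20, "burpees_est"].item()
# The same value straight from the fitted line: intercept + slope * day
manual = mdl.intercept_ + mdl.coef_[0] * 20

print(future)
print(round(day20, 2), round(manual, 2))  # 217.05 217.05
```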