Save predicted values from a grouped linear regression to a DataFrame

Question:

I want to fit a linear regression to subsets of my original data, grouped by V1, V2, V3, V4, V5, and V6, and predict values for each subset. Then I want to store the results in a DataFrame with the columns V1, V2, V3, V4, V5, V6, time, and Predicted value. How can I do this efficiently? What I have now gives me an object that is hard to work with further.

import numpy as np
from sklearn.linear_model import LinearRegression

def model(df):
    # Fit speed ~ time on one group, then predict at time = 1, 2, ..., 59
    X = df['time'].to_numpy().reshape((-1, 1))
    Y = df['speed'].to_numpy()
    X_new = np.arange(1, 60, 1).reshape((-1, 1))
    return np.squeeze(LinearRegression().fit(X, Y).predict(X_new))

def group_predictions(df):
    return df.groupby(['V1', 'V2', 'V3', 'V4', 'V5', 'V6']).apply(model)
Asked By: kittycat


Answers:

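The result of groupby().apply(model) is a Series whose values are NumPy arrays (one array per group), so explode() should do the trick: it expands each array into one row per element.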

However, time cannot be a column in this output because the lengths won't match. model() returns 59 predicted values per group (one for each value of X_new = 1, ..., 59), not one per row of the group, so time could only be one of the output columns if every sub-DataFrame had exactly 59 rows.

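def group_predictions(df):
    # explode() gives each predicted value its own row; astype(float) undoes the object dtype explode() leaves behind
    return df.groupby(['V1', 'V2', 'V3', 'V4', 'V5', 'V6']).apply(model).explode().astype(float).reset_index(name='Predicted value')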
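A minimal, self-contained sketch of the explode() approach, run on hypothetical toy data (two made-up groups, one with speed = 2 * time and one with speed = time + 10). It assumes scikit-learn is installed; the toy data and the astype(float) step are illustrative additions, not part of the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def model(df):
    # Fit speed ~ time on one group and predict at time = 1..59
    X = df['time'].to_numpy().reshape((-1, 1))
    Y = df['speed'].to_numpy()
    X_new = np.arange(1, 60, 1).reshape((-1, 1))
    return LinearRegression().fit(X, Y).predict(X_new)

def group_predictions(df):
    keys = ['V1', 'V2', 'V3', 'V4', 'V5', 'V6']
    return (
        df.groupby(keys)
        .apply(model)          # Series of 59-element arrays, indexed by V1..V6
        .explode()             # one row per predicted value
        .astype(float)         # explode() leaves object dtype
        .reset_index(name='Predicted value')
    )

# Hypothetical toy data: group 'a' has speed = 2 * time, group 'b' has speed = time + 10
t = np.arange(1, 6)
toy = pd.DataFrame({
    'V1': ['a'] * 5 + ['b'] * 5,
    'V2': 1, 'V3': 1, 'V4': 1, 'V5': 1, 'V6': 1,
    'time': np.concatenate([t, t]),
    'speed': np.concatenate([2.0 * t, t + 10.0]),
})
out = group_predictions(toy)
print(out.shape)  # (118, 7)
```

With two groups the result has 2 × 59 = 118 rows and no time column, which is the length mismatch described above. Recent pandas releases (2.2 and later) may emit a DeprecationWarning because apply() also passes the grouping columns to model(); model() only reads time and speed, so the result is unaffected.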

If X_new must also be returned, it is more readable to build a DataFrame inside model() itself. group_predictions() then has to be adjusted, because model() now returns a DataFrame rather than an array.

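import pandas as pd

def model(df):
    X = df['time'].to_numpy().reshape((-1, 1))
    Y = df['speed'].to_numpy()
    X_new = np.arange(1, 60, 1).reshape((-1, 1))
    # Return the prediction grid next to the predictions so both become columns
    return pd.DataFrame({'time': X_new.ravel(),
                         'Predicted value': LinearRegression().fit(X, Y).predict(X_new)})

def group_predictions(df):
    # droplevel(-1) drops the per-group row index; reset_index() turns V1..V6 into columns
    return df.groupby(['V1', 'V2', 'V3', 'V4', 'V5', 'V6']).apply(model).droplevel(-1).reset_index()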
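A minimal, self-contained sketch of the DataFrame-returning variant, run on hypothetical toy data (two made-up groups with speed = 2 * time and speed = time + 10). Labelling the prediction grid 'time' is an assumption made here to match the column layout requested in the question; scikit-learn is assumed to be installed.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

KEYS = ['V1', 'V2', 'V3', 'V4', 'V5', 'V6']

def model(df):
    # Fit speed ~ time on one group and return a 59-row frame of (time, prediction)
    X = df['time'].to_numpy().reshape((-1, 1))
    Y = df['speed'].to_numpy()
    X_new = np.arange(1, 60, 1).reshape((-1, 1))
    return pd.DataFrame({
        'time': X_new.ravel(),
        'Predicted value': LinearRegression().fit(X, Y).predict(X_new),
    })

def group_predictions(df):
    # apply() stacks the per-group frames under a (V1..V6, row) MultiIndex;
    # drop the inner row level, then move the group keys back into columns
    return df.groupby(KEYS).apply(model).droplevel(-1).reset_index()

# Hypothetical toy data: group 'a' has speed = 2 * time, group 'b' has speed = time + 10
t = np.arange(1, 6)
toy = pd.DataFrame({
    'V1': ['a'] * 5 + ['b'] * 5,
    'V2': 1, 'V3': 1, 'V4': 1, 'V5': 1, 'V6': 1,
    'time': np.concatenate([t, t]),
    'speed': np.concatenate([2.0 * t, t + 10.0]),
})
out = group_predictions(toy)
print(list(out.columns))
# ['V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'time', 'Predicted value']
```

Each group contributes 59 rows, so the result has 118 rows, one per (group, time) pair, with the group keys repeated on every row.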
Answered By: not a robot