Find Future Value for Unique Column Entries in Dataframe using Linear Regression

Question:

In the "Example Dataframe" below there are three unique labels in the "Label" column: "A", "B" and "C".

I would like to predict the future value of "Value1" for "A", "B" and "C" when "Value2" is 65000000 using linear regression.
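Concretely, the code in this question regresses Value2 on Value1 and then inverts the fitted line. For one label, write x for Value1 and y for Value2. The ordinary least-squares estimates and the inverted prediction are then:

```latex
\hat{\beta} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},
\qquad
\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x},
\qquad
x^{*} = \frac{65\,000\,000 - \hat{\alpha}}{\hat{\beta}}
```

Handling A, B and C separately just means evaluating these three expressions on each label's rows only.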

Example Dataframe

import pandas as pd
data = {'Label': ['A','A','A','A','A','A','B','B','B','B','B','B','C','C','C','C','C','C'],
        'Value1': ['1672964520','1672966620','1672967460','1672969380','1672971840',
                   '1672972200','1672963800','1672966140', '1672967760','1672969020',
                   '1672970520', '1672971360','1672963200','1672964700','1672966260',
                   '1672967820', '1672969980', '1672971180'],
        'Value2': ['54727520', '54729380', '54740070', '54744720', '54775410', '54779130',
                   '59598560','59603190','59605060','59611320','59628900','59630950',
                   '58047810','58049680','58051550','58058460','58068740','58088280']}
df=pd.DataFrame(data)
print(df)

I can predict the future value of "Value1" when "A" is the only label in the dataframe (see below).

However, I am getting stuck applying this approach to the example dataframe, which contains several labels. Is there an easy way to modify this code to predict "Value1" for every label in the dataframe?

Desired output (predicted Value1 per label): A = "X", B = "Y", C = "Z", etc.

import numpy as np
import pandas as pd

data = {'Label': ['A','A','A','A','A','A'],
        'Value1': ['1672964520','1672966620','1672967460','1672969380','1672971840', '1672972200'],
        'Value2': ['54727520', '54729380', '54740070', '54744720', '54775410', '54779130']}

# Create dataframe using data
df = pd.DataFrame(data)
# Convert Value1 and Value2 from strings (object dtype) to int64
df["Value1"] = df.Value1.astype("int64")
df["Value2"] = df.Value2.astype("int64")
# Calc means for x and y respectively
xmean = np.mean(df["Value1"])
ymean = np.mean(df["Value2"])
# Calc numerator and denominator of beta
df["xyCov"] = (df["Value1"] - xmean) * (df["Value2"] - ymean)
df["xVar"] = (df["Value1"] - xmean) ** 2
# Calc beta and alpha
beta = df["xyCov"].sum() / df["xVar"].sum()
alpha = ymean - (beta * xmean)
# Solve Value2 = alpha + beta * Value1 for the Value1 at which Value2 = 65,000,000
Predicted_Value1 = (65000000 - alpha) / beta
print("Future A value", Predicted_Value1)
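The sketch below is a sanity check on the hand-written formulas and is not part of the original question. It compares them with NumPy's np.polyfit, which fits the same straight line by least squares when called with deg=1. It uses the label A rows above, already converted to integers. The names df_a, slope and intercept are illustrative.

```python
import numpy as np
import pandas as pd

# Label "A" rows from the question, already converted to integers
df_a = pd.DataFrame({
    "Value1": [1672964520, 1672966620, 1672967460, 1672969380, 1672971840, 1672972200],
    "Value2": [54727520, 54729380, 54740070, 54744720, 54775410, 54779130],
})
x = df_a["Value1"].to_numpy(dtype=float)
y = df_a["Value2"].to_numpy(dtype=float)

# Closed-form OLS, the same formulas as the snippet above
beta = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
alpha = y.mean() - beta * x.mean()

# np.polyfit with deg=1 returns the coefficients highest power first: [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

# Invert the fitted line to find Value1 where Value2 reaches 65,000,000
predicted_value1 = (65_000_000 - alpha) / beta

print("manual :", beta, alpha)
print("polyfit:", slope, intercept)
print("Future A value", predicted_value1)
```

If the two lines agree, any bug in a multi-label version lies in how the groups are handled, not in the regression formulas themselves.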
Asked By: user17236057


Answers:

Here is one way to do it on your example dataframe, using pandas groupby to run the same calculation once per label and an f-string to print each result:

import numpy as np

for label, df_ in df.groupby("Label"):
    # Convert Value1 and Value2 from strings to int64; astype returns a new
    # DataFrame, so the columns added below are set on an independent copy
    df_ = df_.astype({"Value1": "int64", "Value2": "int64"})

    # Calc means for x and y respectively
    xmean = np.mean(df_["Value1"])
    ymean = np.mean(df_["Value2"])

    # Calc numerator and denominator of beta
    df_["xyCov"] = (df_["Value1"] - xmean) * (df_["Value2"] - ymean)
    df_["xVar"] = (df_["Value1"] - xmean) ** 2

    # Calc beta and alpha
    beta = df_["xyCov"].sum() / df_["xVar"].sum()
    alpha = ymean - (beta * xmean)

    # Solve for the Value1 at which Value2 reaches 65,000,000
    Predicted_Value1 = (65000000 - alpha) / beta

    print(f"Future {label} value", Predicted_Value1)

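With Value2 instead of Value1 in the xVar line, the loop prints the values below. At around 5e17, they are far outside the range of the Value1 timestamps (about 1.67e9), and that is the symptom of the mix-up. With (df_["Value1"] - xmean) ** 2 as the denominator term, the predictions come out at roughly 1.674e9 instead: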

Future A value 4.922122808656915e+17
Future B value 4.68780950852079e+17
Future C value 4.970684516509964e+17
Answered By: Laurent
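A more compact variant is sketched below, assuming the same straight-line model is wanted. It lets np.polyfit do the least-squares fit for each label and collects the results in a dict keyed by label, printed in the "A = X, B = Y" form from the question. The helper predict_value1 and the constant TARGET_VALUE2 are illustrative names. The final datetime conversion assumes Value1 holds Unix timestamps in seconds, which the original comments suggest but do not state.

```python
import numpy as np
import pandas as pd

# Example data from the question (numbers stored as strings, as posted)
data = {'Label': ['A','A','A','A','A','A','B','B','B','B','B','B','C','C','C','C','C','C'],
        'Value1': ['1672964520','1672966620','1672967460','1672969380','1672971840',
                   '1672972200','1672963800','1672966140','1672967760','1672969020',
                   '1672970520','1672971360','1672963200','1672964700','1672966260',
                   '1672967820','1672969980','1672971180'],
        'Value2': ['54727520','54729380','54740070','54744720','54775410','54779130',
                   '59598560','59603190','59605060','59611320','59628900','59630950',
                   '58047810','58049680','58051550','58058460','58068740','58088280']}
df = pd.DataFrame(data).astype({"Value1": "int64", "Value2": "int64"})

TARGET_VALUE2 = 65_000_000  # hypothetical name: Value2 level at which to predict Value1


def predict_value1(group: pd.DataFrame, target: float = TARGET_VALUE2) -> float:
    """Fit Value2 = alpha + beta * Value1 by least squares, then solve for Value1."""
    beta, alpha = np.polyfit(group["Value1"].to_numpy(dtype=float),
                             group["Value2"].to_numpy(dtype=float), 1)
    return (target - alpha) / beta


# One fit per label; groupby yields (label, sub-DataFrame) pairs
predictions = {label: predict_value1(g) for label, g in df.groupby("Label")}
print(", ".join(f"{label} = {value:.0f}" for label, value in predictions.items()))

# Assumes Value1 is Unix time in seconds
print(pd.to_datetime(pd.Series(predictions), unit="s"))
```

Adding a new label to the dataframe requires no code changes. It simply becomes another key in predictions.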