Calculating carry-over effect
Question:
I want to calculate the carry-over effect of TV advertising GRP data.
My input data looks like below:
Variable Date Causal Half_Life
0 TV Model 2016-01-10 0 4
1 TV Model 2016-01-17 0 4
2 TV Model 2016-01-24 0 4
3 TV Model 2016-01-31 100 4
4 TV Model 2016-02-07 110 4
5 TV Model 2016-02-14 89 4
6 TV Model 2016-02-21 57 4
7 TV Model 2016-02-28 90 4
8 TV General 2016-01-10 0 4
9 TV General 2016-01-17 0 4
10 TV General 2016-01-24 0 4
11 TV General 2016-01-31 30 4
12 TV General 2016-02-07 32 4
13 TV General 2016-02-14 42 4
14 TV General 2016-02-21 39 4
15 TV General 2016-02-28 55 4
I want to calculate a new column df['Adstock'] based on the conditions below:
If a row is the first row of its group in column df.Variable, then df.Adstock = df.Causal.
Otherwise, df.Adstock = df.Causal + 0.5**(1/df.Half_Life) * the previous row's df.Adstock.
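To make the recurrence concrete with the numbers from the table above, the carry-over factor for Half_Life = 4 is 0.5**(1/4), and the row after the first non-zero TV Model row works out as:

```python
half_life = 4
rate = 0.5 ** (1 / half_life)      # carry-over factor per period, ~0.840896
adstock_row3 = 100                 # first non-zero TV Model row: Adstock = Causal
adstock_row4 = 110 + rate * adstock_row3
print(round(adstock_row4, 6))      # 194.089642
```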
I am using the below code:
import pandas as pd
import numpy as np
import numpy.random as random
import statsmodels.api as sm
import statsmodels.tsa as tsa
import statsmodels.formula.api as smf
import datetime
df = pd.read_excel('RC Data.xlsx')
df['Adstock'] = 0
df['Adstock'] = np.where(df['Variable'] == df['Variable'].shift(1), df['Adstock'].shift(1)*(0.5**(1/df['Half_Life'])) + df['Causal'], df['Causal'])
The output I get is as below:
Variable Date Causal Half_Life Adstock
0 TV Model 2016-01-10 0 4 0.0
1 TV Model 2016-01-17 0 4 0.0
2 TV Model 2016-01-24 0 4 0.0
3 TV Model 2016-01-31 100 4 100.0
4 TV Model 2016-02-07 110 4 110.0
5 TV Model 2016-02-14 89 4 89.0
6 TV Model 2016-02-21 57 4 57.0
7 TV Model 2016-02-28 90 4 90.0
8 TV General 2016-01-10 0 4 0.0
9 TV General 2016-01-17 0 4 0.0
10 TV General 2016-01-24 0 4 0.0
11 TV General 2016-01-31 30 4 30.0
12 TV General 2016-02-07 32 4 32.0
13 TV General 2016-02-14 42 4 42.0
14 TV General 2016-02-21 39 4 39.0
15 TV General 2016-02-28 55 4 55.0
But the required output should look like this:
Variable Date Causal Half_Life Adstock
0 TV Model 2016-01-10 0 4 0.000000
1 TV Model 2016-01-17 0 4 0.000000
2 TV Model 2016-01-24 0 4 0.000000
3 TV Model 2016-01-31 100 4 100.000000
4 TV Model 2016-02-07 110 4 194.089642
5 TV Model 2016-02-14 89 4 252.209284
6 TV Model 2016-02-21 57 4 269.081883
7 TV Model 2016-02-28 90 4 316.269991
8 TV General 2016-01-10 0 4 0.000000
9 TV General 2016-01-17 0 4 0.000000
10 TV General 2016-01-24 0 4 0.000000
11 TV General 2016-01-31 30 4 30.000000
12 TV General 2016-02-07 32 4 57.226892
13 TV General 2016-02-14 42 4 90.121889
14 TV General 2016-02-21 39 4 114.783173
15 TV General 2016-02-28 55 4 151.520759
Please help.
Answers:
Here is my solution; I think it is hard to vectorize this:
l = []
for x, y in df.groupby('Variable', sort=False):
    l1 = []
    for s, t in y.iterrows():
        if len(l1) == 0:
            l1.append(t['Causal'])
        else:
            l1.append(t['Causal'] + 0.5 ** (1 / t['Half_Life']) * l1[-1])
    l.extend(l1)
df['New'] = l
df
Out[982]:
Variable Date Causal Half_Life New
0 TVModel 2016-01-10 0 4 0.000000
1 TVModel 2016-01-17 0 4 0.000000
2 TVModel 2016-01-24 0 4 0.000000
3 TVModel 2016-01-31 100 4 100.000000
4 TVModel 2016-02-07 110 4 194.089642
5 TVModel 2016-02-14 89 4 252.209284
6 TVModel 2016-02-21 57 4 269.081883
7 TVModel 2016-02-28 90 4 316.269991
8 TVGeneral 2016-01-10 0 4 0.000000
9 TVGeneral 2016-01-17 0 4 0.000000
10 TVGeneral 2016-01-24 0 4 0.000000
11 TVGeneral 2016-01-31 30 4 30.000000
12 TVGeneral 2016-02-07 32 4 57.226892
13 TVGeneral 2016-02-14 42 4 90.121889
14 TVGeneral 2016-02-21 39 4 114.783173
15 TVGeneral 2016-02-28 55 4 151.520759
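It can be vectorized within each group, though. The np.where attempt in the question fails because shift(1) reads the Adstock column as it was before the assignment (all zeros), so the recurrence never accumulates. Since the decay rate r is constant within a group, adstock[i] = sum over j of causal[j] * r**(i - j), which a cumulative sum can compute in one pass. A minimal sketch on inline sample data (same numbers as the question, truncated):

```python
import numpy as np
import pandas as pd

# Inline sample data mirroring the question (truncated to the interesting rows)
df = pd.DataFrame({
    'Variable': ['TV Model'] * 5 + ['TV General'] * 5,
    'Causal':   [0, 100, 110, 89, 57, 0, 30, 32, 42, 39],
    'Half_Life': 4,
})

df['Adstock'] = 0.0
for _, g in df.groupby('Variable', sort=False):
    r = 0.5 ** (1.0 / g['Half_Life'].iloc[0])   # per-period retention rate
    w = r ** np.arange(len(g))                  # decay weights r**0, r**1, ...
    # adstock[i] = sum_j causal[j] * r**(i - j) = w[i] * cumsum(causal / w)[i]
    df.loc[g.index, 'Adstock'] = w * np.cumsum(g['Causal'].to_numpy(dtype=float) / w)

print(df)
```

One caveat: w shrinks geometrically, so the division can underflow for very long groups; if SciPy is available, scipy.signal.lfilter([1.0], [1.0, -r], causal) evaluates the same first-order recurrence stably.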
def decay(df, row_id):
    causal_value = df.at[row_id, 'Causal']
    half_life = df.at[row_id, 'Half_Life']
    ad_stock_value = df.at[row_id - 1, 'adstock_value']
    return causal_value + 0.5 ** (1 / half_life) * ad_stock_value

def adstock(df):
    # add the new column "adstock_value"
    df.loc[:, 'adstock_value'] = np.nan
    visited = set()
    for i in range(len(df)):
        var = df.at[i, 'Variable']
        if var in visited:
            df.loc[i, 'adstock_value'] = decay(df, i)
        else:
            # first row of this group: Adstock is just Causal
            visited.add(var)
            df.loc[i, 'adstock_value'] = df.at[i, 'Causal']

adstock(df)
Out[983]:
Variable Date Causal Half_Life adstock_value
0 TVModel 2016-01-10 0 4 0.000000
1 TVModel 2016-01-17 0 4 0.000000
2 TVModel 2016-01-24 0 4 0.000000
3 TVModel 2016-01-31 100 4 100.000000
4 TVModel 2016-02-07 110 4 194.089642
5 TVModel 2016-02-14 89 4 252.209284
6 TVModel 2016-02-21 57 4 269.081883
7 TVModel 2016-02-28 90 4 316.269991
8 TVGeneral 2016-01-10 0 4 0.000000
9 TVGeneral 2016-01-17 0 4 0.000000
10 TVGeneral 2016-01-24 0 4 0.000000
11 TVGeneral 2016-01-31 30 4 30.000000
12 TVGeneral 2016-02-07 32 4 57.226892
13 TVGeneral 2016-02-14 42 4 90.121889
14 TVGeneral 2016-02-21 39 4 114.783173
15 TVGeneral 2016-02-28 55 4 151.520759