Python: assign a row-wise covariance calculation to a column

Question:

I am trying to assign a covariance value to a new column, computed row by row from the dataframe I have. The df is ~400k records x 30+ columns. The two data series that act as inputs for COV() (account returns and benchmark returns) both sit on the same record, and there are ~400k such records. I would like to define the column names as lists and then do the operations as arrays. I can do this for the associated means, but the covariance seems elusive.

As a workaround, I can compute the covariance in a clunkier manual way by writing out all of the steps, but it is not dynamic.

Example of the dataframe (first 5 records, 4 months of acct return and benchmark return figures; in the actual df, there are 12 months of acct returns and 12 months of benchmark returns). I have tried various iterations of COV(), but because both datasets (acct returns/benchmark returns) are on the same record, I have not found a good way of writing the function.

import numpy as np
import pandas as pd

df = pd.DataFrame({'ACCT_ID':['A_12345','A_23456','A_34567','A_45678','A_56789'],
                  'Acct_m1_RoR':[-0.025, -0.035, -0.055, 0.0127, -0.065],
                  'Acct_m2_RoR':[0.025, 0.035, 0.055, 0.0127, 0.065],
                  'Acct_m3_RoR':[0.065, -0.075, -0.015, 0.0527, 0.015],
                  'Acct_m4_RoR':[-0.009, 0.015, -0.065, 0.0827, -0.025],
                  'BCHMK_m1_RoR':[-0.025, -0.035, -0.055, 0.0127, -0.065],
                  'BCHMK_m2_RoR':[-0.025, -0.035, -0.055, 0.0127, -0.065],
                  'BCHMK_m3_RoR':[-0.025, -0.035, -0.055, 0.0127, -0.065],
                  'BCHMK_m4_RoR':[-0.025, -0.035, -0.055, 0.0127, -0.065]})

List of column headers (all 12 months, as in the actual df):

a1=['Acct_m1_RoR','Acct_m2_RoR','Acct_m3_RoR','Acct_m4_RoR','Acct_m5_RoR','Acct_m6_RoR','Acct_m7_RoR','Acct_m8_RoR','Acct_m9_RoR','Acct_m10_RoR','Acct_m11_RoR','Acct_m12_RoR']
b1=['BCHMK_m1_RoR','BCHMK_m2_RoR','BCHMK_m3_RoR','BCHMK_m4_RoR','BCHMK_m5_RoR','BCHMK_m6_RoR','BCHMK_m7_RoR','BCHMK_m8_RoR','BCHMK_m9_RoR','BCHMK_m10_RoR','BCHMK_m11_RoR','BCHMK_m12_RoR']

df['acct_mean'] = np.mean(df[a1],axis = 1)
df['bchmk_mean'] = np.mean(df[b1], axis = 1)

Semi-manual workaround:

df['cov'] = (((df['Acct_m1_RoR'] - df['acct_mean']) * (df['BCHMK_m1_RoR'] - df['bchmk_mean'])) 
+ ((df['Acct_m2_RoR'] - df['acct_mean']) * (df['BCHMK_m2_RoR'] - df['bchmk_mean'])) 
+ ((df['Acct_m3_RoR'] - df['acct_mean']) * (df['BCHMK_m3_RoR'] - df['bchmk_mean'])) 
+ ((df['Acct_m4_RoR'] - df['acct_mean']) * (df['BCHMK_m4_RoR'] - df['bchmk_mean'])) 
+ ((df['Acct_m5_RoR'] - df['acct_mean']) * (df['BCHMK_m5_RoR'] - df['bchmk_mean'])) 
+ ((df['Acct_m6_RoR'] - df['acct_mean']) * (df['BCHMK_m6_RoR'] - df['bchmk_mean'])) 
+ ((df['Acct_m7_RoR'] - df['acct_mean']) * (df['BCHMK_m7_RoR'] - df['bchmk_mean'])) 
+ ((df['Acct_m8_RoR'] - df['acct_mean']) * (df['BCHMK_m8_RoR'] - df['bchmk_mean'])) 
+ ((df['Acct_m9_RoR'] - df['acct_mean']) * (df['BCHMK_m9_RoR'] - df['bchmk_mean'])) 
+ ((df['Acct_m10_RoR'] - df['acct_mean']) * (df['BCHMK_m10_RoR'] - df['bchmk_mean'])) 
+ ((df['Acct_m11_RoR'] - df['acct_mean']) * (df['BCHMK_m11_RoR'] - df['bchmk_mean'])) 
+ ((df['Acct_m12_RoR'] - df['acct_mean']) * (df['BCHMK_m12_RoR'] - df['bchmk_mean']))) / 12
Asked By: John


Answers:

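A shorter version of your workaround uses NumPy broadcasting. The double brackets in df[['acct_mean']] and df[['bchmk_mean']] keep each mean as an (n, 1) column, so it is subtracted across every month of the matching row: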

df['cov'] = ((df[a1].to_numpy() - df[['acct_mean']].to_numpy()) * (df[b1].to_numpy() - df[['bchmk_mean']].to_numpy())).sum(axis=1) / 12
Answered By: constantstranger
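
The same broadcasting idea can be wrapped in a small helper so that the divisor follows the number of columns instead of a hard-coded 12. This is only a minimal sketch, not part of the original answer: the names rowwise_cov, n_months and ddof are hypothetical, and the column lists are trimmed to the 4 months present in the sample frame (the real data would use 12). Dividing by N gives the population covariance, as in the workaround; np.cov divides by N - 1 by default, so pass ddof=1 if that is what you need.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (only 4 months of returns).
df = pd.DataFrame({
    'ACCT_ID': ['A_12345', 'A_23456', 'A_34567', 'A_45678', 'A_56789'],
    'Acct_m1_RoR': [-0.025, -0.035, -0.055, 0.0127, -0.065],
    'Acct_m2_RoR': [0.025, 0.035, 0.055, 0.0127, 0.065],
    'Acct_m3_RoR': [0.065, -0.075, -0.015, 0.0527, 0.015],
    'Acct_m4_RoR': [-0.009, 0.015, -0.065, 0.0827, -0.025],
    'BCHMK_m1_RoR': [-0.025, -0.035, -0.055, 0.0127, -0.065],
    'BCHMK_m2_RoR': [-0.025, -0.035, -0.055, 0.0127, -0.065],
    'BCHMK_m3_RoR': [-0.025, -0.035, -0.055, 0.0127, -0.065],
    'BCHMK_m4_RoR': [-0.025, -0.035, -0.055, 0.0127, -0.065],
})

# Assumption: 4 months here to match the sample; 12 in the actual data.
n_months = 4
a1 = [f'Acct_m{i}_RoR' for i in range(1, n_months + 1)]
b1 = [f'BCHMK_m{i}_RoR' for i in range(1, n_months + 1)]


def rowwise_cov(frame, x_cols, y_cols, ddof=0):
    """Covariance between two groups of columns, computed per row.

    Hypothetical helper: ddof=0 divides by N (as in the workaround),
    ddof=1 divides by N - 1 (the np.cov default).
    """
    x = frame[x_cols].to_numpy(dtype=float)
    y = frame[y_cols].to_numpy(dtype=float)
    # keepdims=True leaves the means as (n_rows, 1) so they broadcast per row.
    x_dev = x - x.mean(axis=1, keepdims=True)
    y_dev = y - y.mean(axis=1, keepdims=True)
    return (x_dev * y_dev).sum(axis=1) / (x.shape[1] - ddof)


df['cov'] = rowwise_cov(df, a1, b1)
```

Because everything stays in NumPy arrays, there is no Python-level loop over the ~400k rows. Note that in the sample frame each benchmark row is constant across months, so the covariance there comes out as (approximately) zero for every account.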