Correlation matrix with group-by and sort

Question:

I am trying to calculate a correlation matrix with groupby and sort. I have 100 companies from 11 industries. I would like to group by industry, sort the companies by their total assets (atq), and then calculate the correlation of data.pr_multi in that order. However, when I sort and group, the result reverts to alphabetical order by ticker.

My data looks like this:

index  datafqtr  tic  pr_multi             atq      industry
0      2018Q1    A    NaN                  8698.0   4
1      2018Q2    A    -0.0856845728151735  8784.0   4
2      2018Q3    A    0.0035103320774146   8349.0   4
3      2018Q4    A    -0.0157732687260246  8541.0   4
4      2018Q1    AAL  NaN                  53280.0  5
5      2018Q2    AAL  -0.2694380292532717  52622.0  5

The code I use:

data1=data18.sort_values(['atq'],ascending=False).groupby('industry').head()
df = data1.pivot_table('pr_multi', ['datafqtr'], 'tic')
# calculate correlation matrix using inbuilt pandas function
correlation_matrix = df.corr()
correlation_matrix.head()
Asked By: hilo

||

Answers:

import pandas as pd
import numpy as np

df = pd.read_csv('data.csv')

df.groupby('name')[['col1','col2']].corr() # you can put as many desired columns here

Output:

                y         x
name
a    y   1.000000  0.974467
a    x   0.974467  1.000000
b    y   1.000000  0.975120
b    x   0.975120  1.000000

The data is like this:

   name  col1  col2
0     a  13.7  7.8
1     a -14.7 -9.7
2     a  -3.4 -0.6
3     a   7.4  3.3
4     a  -5.3 -1.9
5     a  -8.3 -2.3
6     a   8.9  3.7
7     a  10.0  7.9
8     a   1.8 -0.4
9     a   6.7  3.1
10    a  17.4  9.9
11    a   8.9  7.7
12    a  -3.1 -1.5
13    a -12.2 -7.9
14    a   7.6  4.9
15    a   4.2  2.3
16    a -15.3 -5.6
17    a   9.9  6.7
18    a  11.0  5.2
19    a   5.7  5.1
20    a  -0.3 -0.6
21    a -15.0 -8.7
22    a -10.6 -5.7
23    a -16.0 -9.1
24    b  16.7  8.5
25    b   9.2  8.2
26    b   4.7  3.4
27    b -16.7 -8.7
28    b  -4.8 -1.5
29    b  -2.6 -2.2
30    b  16.3  9.5
31    b  15.8  9.8
32    b -10.8 -7.3
33    b  -5.4 -3.4
34    b  -6.0 -1.8
35    b   1.9 -0.6
36    b   6.3  6.1
37    b -14.7 -8.0
38    b -16.1 -9.7
39    b -10.5 -8.0
40    b   4.9  1.0
41    b  11.1  4.5
42    b -14.8 -8.5
43    b  -0.2 -2.8
44    b   6.3  1.7
45    b -14.1 -8.7
46    b  13.8  8.9
47    b  -6.2 -3.0
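
For a quick check without data.csv, here is a minimal sketch of the same groupby/corr call on a small made-up frame. The values are hypothetical and chosen so that col1 and col2 are perfectly positively correlated in group a and perfectly negatively correlated in group b.

```python
import pandas as pd

# Hypothetical toy data standing in for data.csv
df = pd.DataFrame({
    "name": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "col1": [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0],
    "col2": [2.0, 4.0, 6.0, 8.0, 8.0, 6.0, 4.0, 2.0],
})

# One correlation matrix per group, stacked under a (name, column) MultiIndex
per_group_corr = df.groupby("name")[["col1", "col2"]].corr()

# Pull out the 2x2 matrix of a single group
corr_a = per_group_corr.loc["a"]

print(per_group_corr)
```

Selecting with per_group_corr.loc["a"] gives the matrix for one group, which is usually easier to read than the stacked result.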
Answered By: Quoc

IIUC, you want to calculate the correlation between the row order produced by sorting atq within each industry and the pr_multi column. Use:

import numpy as np
data1 = data18.groupby('industry')['atq'].apply(lambda x: x.sort_values(ascending=False))
# 'level_1' holds the original row index, now in the sorted-within-industry order
np.corrcoef(data1.reset_index()['level_1'], data18['pr_multi'].astype(float).fillna(0))

Output:

array([[ 1.        , -0.44754795],
       [-0.44754795,  1.        ]])
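
If the goal is instead a ticker-by-ticker correlation matrix whose rows and columns follow the industry and atq order, note that pivot_table sorts its column labels, which is why the order comes back alphabetical. A minimal sketch of one way around this is to build the desired ticker order first and then reindex the correlation matrix. The toy data below is made up, and ranking each company by its maximum atq is an assumption, since the question does not say which quarter's atq to use.

```python
import pandas as pd

# Hypothetical toy data using the question's column names
data18 = pd.DataFrame({
    "datafqtr": ["2018Q2", "2018Q3", "2018Q4"] * 3,
    "tic": ["A"] * 3 + ["AAL"] * 3 + ["B"] * 3,
    "pr_multi": [0.1, -0.2, 0.3, 0.2, 0.1, -0.1, -0.3, 0.2, 0.1],
    "atq": [8784.0, 8349.0, 8541.0, 52622.0, 52000.0, 51000.0,
            9000.0, 9500.0, 9900.0],
    "industry": [4] * 3 + [5] * 3 + [4] * 3,
})

# Desired ticker order: by industry, then by largest atq first
# (using max atq per ticker is an assumption, not from the question)
order = (
    data18.groupby(["industry", "tic"], as_index=False)["atq"].max()
    .sort_values(["industry", "atq"], ascending=[True, False])["tic"]
    .tolist()
)

# pivot_table sorts the tic columns alphabetically
wide = data18.pivot_table(values="pr_multi", index="datafqtr", columns="tic")
corr = wide.corr()

# Restore the industry/atq order on both axes
corr_ordered = corr.reindex(index=order, columns=order)
print(corr_ordered)
```

The correlation values themselves do not depend on the order; reindexing only rearranges the rows and columns for display.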
Answered By: keramat