# How to get the p-value and Pearson's r for a list of columns in pandas?

## Question:

I’m trying to make a multiindexed table (a matrix) of correlation coefficients and p-values. I’d prefer to use the `scipy.stats` tests.

``````
import pandas as pd
from scipy.stats import pearsonr

x = pd.DataFrame(
    list(
        zip(
            [1, 2, 3, 4, 5, 6], [5, 7, 8, 4, 2, 8], [13, 16, 12, 11, 9, 10]
        )
    ),
    columns=["a", "b", "c"],
)

# I've tried something like this
for i in range(len(x.columns)):
    r, p = pearsonr(x[x.columns[i]], x[x.columns[i + 1]])
    print(f'{r}, {p}')
``````

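The `for` loop above doesn't work: it only pairs each column with the next one, and `x.columns[i + 1]` raises an `IndexError` on the last iteration. What I want to end up with is: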

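|   |   | a | b | c |
|---|---|---|---|---|
| a | r | 1.0 | -.09 | -.8 |
|   | p | .00 | .87 | .06 |
| b | r | -.09 | 1 | .42 |
|   | p | .87 | .00 | .41 |
| c | r | -.8 | .42 | 1 |
|   | p | .06 | .41 | .00 |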

I had written code to solve this problem (with help from this community) years ago, but it only worked for an older version of `spearmanr`.

Any help would be very much appreciated.

## Answer:

Here is one way to do it using SciPy's `pearsonr` and the pandas `corr` method:

``````
import pandas as pd
from scipy.stats import pearsonr


def pearsonr_pval(x, y):
    # pearsonr returns (r, p-value); keep only the p-value
    return pearsonr(x, y)[1]


# x is the DataFrame defined in the question
df = (
    pd.concat(
        [
            x.corr(method="pearson").reset_index().assign(value="r"),
            x.corr(method=pearsonr_pval).reset_index().assign(value="p"),
        ]
    )
    .groupby(["index", "value"])
    .agg(lambda x: list(x)[0])
).sort_index(ascending=[True, False])

df.index.names = ["", ""]
``````

Then:

``````
print(df)
# Output
            a         b         c

a r  1.000000 -0.088273 -0.796421
  p  1.000000  0.867934  0.057948
b r -0.088273  1.000000  0.421184
  p  0.867934  1.000000  0.405583
c r -0.796421  0.421184  1.000000
  p  0.057948  0.405583  1.000000
``````
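Note that the diagonal p-values in this output are 1.000000 rather than the .00 asked for in the question. As far as I know, that is because `DataFrame.corr` always puts 1 on the diagonal when `method` is a callable, whatever the callable returns. If the diagonal matters, one alternative is to call `pearsonr` on every pair of columns yourself. The following is only a minimal sketch under that assumption: `corr_with_pvalues` is a hypothetical helper name, not part of pandas or SciPy, and it rebuilds the sample data from the question so that it runs on its own.

```python
import pandas as pd
from scipy.stats import pearsonr

# Same sample data as in the question.
x = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5, 6],
        "b": [5, 7, 8, 4, 2, 8],
        "c": [13, 16, 12, 11, 9, 10],
    }
)


def corr_with_pvalues(frame):
    """Build a table with an 'r' row and a 'p' row for every column.

    Hypothetical helper for illustration; calls pearsonr once per pair.
    """
    cols = list(frame.columns)
    data = []
    for row in cols:
        r_values, p_values = [], []
        for col in cols:
            # pearsonr returns the coefficient and the two-sided p-value
            r, p = pearsonr(frame[row].to_numpy(), frame[col].to_numpy())
            r_values.append(r)
            p_values.append(p)
        # Two rows per column, in the same order as the MultiIndex below
        data.extend([r_values, p_values])
    index = pd.MultiIndex.from_product([cols, ["r", "p"]])
    return pd.DataFrame(data, index=index, columns=cols)


table = corr_with_pvalues(x)
print(table.round(2))
```

The off-diagonal values should match the output above, while the diagonal p-values come out at (or extremely close to) zero. This version calls `pearsonr` for every ordered pair, which is fine for a handful of columns but does redundant work on wide frames.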