How can I calculate R-squared in a pandas DataFrame elegantly?

Question:

Assume I have a DataFrame and I want to calculate the R-squared between two columns.

P.S. I don't mean r2(df[0], df[1]); what I want is the R-squared of an OLS fit that uses df[0] to predict df[1].

For example:

In [21]: df = pd.DataFrame(np.random.rand(10, 2))

In [22]: df
Out[22]: 
          0         1
0  0.776080  0.966668
1  0.922351  0.024381
2  0.859104  0.397823
3  0.607491  0.425335
4  0.732265  0.667846
5  0.336950  0.544515
6  0.236403  0.610943
7  0.811736  0.306425
8  0.110440  0.059754
9  0.469844  0.957298

How can I calculate the R-squared for column 1 regressed on column 0?

Asked By: xyhuang


Answers:

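As already stated in the comments, sklearn has a function to calculate R-squared.

from sklearn.metrics import r2_score

# The signature is r2_score(y_true, y_pred): df[0] is treated as the observed
# values and df[1] as the predictions. No regression is fitted here.
r2_score(df[0], df[1])

# -1.8462387938183031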

But to answer your question and to calculate it ourselves in pandas, we can use vectorized methods:

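# Residual sum of squares, treating df[1] as the prediction of df[0]
res = df[0].sub(df[1]).pow(2).sum()
# Total sum of squares of df[0] around its mean
tot = df[0].sub(df[0].mean()).pow(2).sum()

r2 = 1 - res/tot

# -1.8462387938183031

Answered By: Erfan
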
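# Pearson correlation between the two columns
r = df[0].corr(df[1])
# For a simple linear regression with an intercept, R-squared equals r squared
r2 = r ** 2

Answered By: Bruno Assis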
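
The question asks for the R-squared of an OLS fit of column 1 on column 0. As a rough check that the squared correlation above gives that value, here is a minimal sketch that fits the line explicitly and compares the two numbers. It assumes a simple one-variable regression with an intercept. The use of np.polyfit, the seeded random data, and the variable names are illustrative choices, not part of either answer.

```python
import numpy as np
import pandas as pd

# Illustrative data only; the seed and shape are arbitrary choices.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 2)))

x, y = df[0], df[1]

# Least-squares line: y is approximated by slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

# R-squared of the fit: 1 - SS_res / SS_tot, with y as the observed values.
ss_res = ((y - y_hat) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
r2_ols = 1 - ss_res / ss_tot

# Squared Pearson correlation between the same two columns.
r2_corr = x.corr(y) ** 2

print(np.isclose(r2_ols, r2_corr))
```

The two values should agree up to floating-point error. Unlike r2_score(df[0], df[1]), which compares the raw columns directly and can be negative, the R-squared of a fitted line with an intercept always lies between 0 and 1.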