How can I calculate R² in a pandas DataFrame elegantly?
Question:
Assume I have a DataFrame and I want to calculate the R² between two columns.
P.S. I don't mean r2(df[0], df[1]). What I want is the R² of an OLS fit that uses df[0] to predict df[1].
For example:
In [21]: df = pd.DataFrame(np.random.rand(10, 2))
In [22]: df
Out[22]:
          0         1
0  0.776080  0.966668
1  0.922351  0.024381
2  0.859104  0.397823
3  0.607491  0.425335
4  0.732265  0.667846
5  0.336950  0.544515
6  0.236403  0.610943
7  0.811736  0.306425
8  0.110440  0.059754
9  0.469844  0.957298
How can I calculate the R² of column 1 regressed on column 0?
Answers:
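As already stated in the comments, scikit-learn has a function for R², r2_score. Note that it does not fit a regression: r2_score(y_true, y_pred) scores its second argument as predictions of the first. The call below is therefore the r2(df[0], df[1]) you said you did not want, and it can be negative.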
from sklearn.metrics import r2_score
r2_score(df[0], df[1])
# -1.8462387938183031
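Because r2_score(y_true, y_pred) treats its second argument as predictions of the first, the result depends on argument order and no line is fitted. The sketch below mimics that formula in plain pandas, ignoring r2_score's sample weights and edge cases. The helper name r2_of_predictions is hypothetical (it is not part of scikit-learn), and the data are the sample values printed in the question so the run is reproducible:

```python
import pandas as pd

# Sample values copied from the question's DataFrame, so the run is deterministic
df = pd.DataFrame({
    0: [0.776080, 0.922351, 0.859104, 0.607491, 0.732265,
        0.336950, 0.236403, 0.811736, 0.110440, 0.469844],
    1: [0.966668, 0.024381, 0.397823, 0.425335, 0.667846,
        0.544515, 0.610943, 0.306425, 0.059754, 0.957298],
})


def r2_of_predictions(y_true: pd.Series, y_pred: pd.Series) -> float:
    """Hypothetical helper: 1 - SS_res / SS_tot with y_pred taken as predictions of y_true."""
    ss_res = (y_true - y_pred).pow(2).sum()         # squared errors of the "predictions"
    ss_tot = (y_true - y_true.mean()).pow(2).sum()  # spread of y_true around its own mean
    return 1 - ss_res / ss_tot


forward = r2_of_predictions(df[0], df[1])   # df[1] scored as predictions of df[0]
backward = r2_of_predictions(df[1], df[0])  # df[0] scored as predictions of df[1]
print(forward, backward)  # two different values; both negative for this data
```

Swapping the arguments changes the denominator, so neither value is the R² of a regression of one column on the other.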
We can also compute the same quantity ourselves in pandas with vectorized methods:
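res = df[0].sub(df[1]).pow(2).sum()          # residual sum of squares, df[1] as predictions of df[0]
tot = df[0].sub(df[0].mean()).pow(2).sum()   # total sum of squares of df[0]
r2 = 1 - res/tot
# -1.8462387938183031 (same as r2_score above)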
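The computation above still compares the raw columns. A minimal sketch of the R² the question describes, assuming an ordinary-least-squares line with an intercept that predicts column 1 from column 0: fit the line with np.polyfit, then apply the same residual/total sum-of-squares formula with df[1] as the observed values and the fitted line as the predictions. The seeded generator is only a reproducible stand-in for the question's unseeded np.random.rand data:

```python
import numpy as np
import pandas as pd

# Reproducible stand-in for the question's random data (the seed is arbitrary)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 2)))

x, y = df[0], df[1]

# Least-squares line y ~ slope * x + intercept (polyfit returns the highest degree first)
slope, intercept = np.polyfit(x, y, deg=1)
fitted = slope * x + intercept

ss_res = (y - fitted).pow(2).sum()     # residual sum of squares of the fitted line
ss_tot = (y - y.mean()).pow(2).sum()   # total sum of squares of y around its mean
r2_ols = 1 - ss_res / ss_tot

# For a single predictor with an intercept this matches the squared Pearson correlation
r2_corr = x.corr(y) ** 2
print(r2_ols, r2_corr)
```

Unlike the raw-column comparison, this value always lies between 0 and 1, because the fitted line can never do worse than predicting the mean of y.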
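For the R² of an OLS fit of df[1] on df[0] with an intercept, which is what the question asks for, square the Pearson correlation between the two columns:
r = df[0].corr(df[1])   # Pearson correlation (the default method)
r2 = r ** 2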
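Why squaring the correlation gives the OLS R²: with one predictor and an intercept, the fitted values are a linear function of x and the decomposition SS_tot = SS_reg + SS_res holds, so R² reduces to r². A short derivation, writing x for df[0] and y for df[1]:

```latex
\begin{aligned}
S_{xx} &= \sum_i (x_i-\bar x)^2,\quad S_{yy} = \sum_i (y_i-\bar y)^2,\quad S_{xy} = \sum_i (x_i-\bar x)(y_i-\bar y),\\
\hat y_i &= a + b\,x_i, \qquad b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar y - b\,\bar x,\\
\hat y_i - \bar y &= b\,(x_i - \bar x) \;\Rightarrow\; SS_{\text{reg}} = \sum_i (\hat y_i - \bar y)^2 = b^2 S_{xx} = \frac{S_{xy}^2}{S_{xx}},\\
R^2 &= 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}} = \frac{SS_{\text{reg}}}{S_{yy}} = \frac{S_{xy}^2}{S_{xx}\,S_{yy}} = r^2.
\end{aligned}
```

The identity relies on the intercept and the single predictor; it does not carry over to fits through the origin or with several predictors.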