Python pandas returns empty correlation matrix

Question:

I am running Python 2.7.6, pandas 0.13.1. I am unable to compute a correlation matrix from a DataFrame, and I’m not sure why. Here is my example DataFrame (foo):

                       A             B            C
2011-10-12   0.006204908 -0.0009503677  0.003480105
2011-10-13    0.00234903 -0.0005122284 -0.001738786
2011-10-14    0.01045599   0.000346268  0.002378351
2011-10-17   0.003239088   0.001246239 -0.002651856
2011-10-18   0.001717674 -0.0001738079  0.002013923
2011-10-19  0.0001919342  6.399505e-05 -0.001311259
2011-10-20  0.0007430615   0.001186141  0.001919222
2011-10-21   -0.01075129    -0.0015123  0.000807017
2011-10-24   -0.00819597 -0.0005124197  0.003037654
2011-10-25   -0.01604287   0.001157013 -0.001227516

Now I’ll try to compute the correlation:

In [27]: foo.corr()
Out[27]:
Empty DataFrame
Columns: []
Index: []
[0 rows x 0 columns]

On the other hand, I can compute the correlation between any two individual columns. For example:

foo['A'].corr(foo['B'])
# 0.048578514633405255

Any idea what might be causing this issue?

Asked By: Max

Answers:

As Jeff mentioned in the comments, the problem was that my columns had the object dtype. For future reference: even if the data looks numeric, check the dtypes (foo.dtypes) and, if they are not numeric, convert them (e.g. foo = foo.astype(float), since astype returns a new DataFrame) before computing the correlation matrix.
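A minimal sketch of that check and conversion, assuming the values were stored as numeric-looking strings. The DataFrame below is a hypothetical stand-in built from a few rows of the question's data, not the original foo, and the name foo_numeric is introduced here for illustration only:

```python
import pandas as pd

# Hypothetical stand-in for `foo`: numeric-looking strings, forced to the
# object dtype so the example matches the situation described above.
foo = pd.DataFrame(
    {
        "A": ["0.006204908", "0.00234903", "0.01045599", "0.003239088"],
        "B": ["-0.0009503677", "-0.0005122284", "0.000346268", "0.001246239"],
        "C": ["0.003480105", "-0.001738786", "0.002378351", "-0.002651856"],
    },
    index=pd.to_datetime(["2011-10-12", "2011-10-13", "2011-10-14", "2011-10-17"]),
    dtype=object,
)

# Step 1: inspect the dtypes; every column reports `object` here.
print(foo.dtypes)

# Step 2: convert. astype returns a new DataFrame, so keep the result.
foo_numeric = foo.astype(float)
print(foo_numeric.dtypes)

# Step 3: the correlation matrix is now computed on float64 columns.
corr = foo_numeric.corr()
print(corr)
```

If any value cannot be parsed as a float, astype(float) raises a ValueError, which is itself a useful signal that the column holds something other than numbers.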

Answered By: Max

Since pandas 1.5.0, corr() has a numeric_only parameter. If the values in the DataFrame can be safely converted to floats, i.e. if df.astype(float) doesn’t raise an error, then passing numeric_only=False makes corr() work on object, string, or Decimal data. (From pandas 2.0 onward, numeric_only defaults to False.)

df.corr(numeric_only=False)

Example:

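from decimal import Decimal
import pandas as pd

# Columns A and C hold numeric-looking strings, B holds Decimal objects;
# all three end up with the object dtype.
df = pd.DataFrame({
    'A': ['0.006204908', '0.00234903', '0.01045599', '0.001717674'],
    'B': [Decimal('-0.07'), Decimal('-0.04'), Decimal('0.08'), Decimal('-0.07')],
    'C': ['0.003480105', '-0.001738786', '0.002378351', '-0.002651856']})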


# pandas < 1.5.0
df.corr()

Empty DataFrame
Columns: []
Index: []


# pandas >= 1.5.0
df.corr(numeric_only=False)

          A         B         C
A  1.000000  0.816457  0.827324
B  0.816457  1.000000  0.369191
C  0.827324  0.369191  1.000000
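
When the values cannot all be safely converted, so that df.astype(float) would raise, one possible workaround is to coerce unparseable entries to NaN with pd.to_numeric before calling corr(). This is a different technique from numeric_only=False, shown only as a sketch; the data and the name numeric below are hypothetical:

```python
import pandas as pd

# Hypothetical data: one entry in column A is not a number.
df = pd.DataFrame(
    {
        "A": ["0.006204908", "n/a", "0.01045599", "0.001717674", "0.003239088"],
        "B": ["-0.07", "-0.04", "0.08", "-0.07", "0.02"],
    },
    dtype=object,
)

# errors="coerce" replaces entries that cannot be parsed with NaN
# instead of raising an exception.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Count how many entries per column failed to parse.
print(numeric.isna().sum())

# corr() excludes NaN values pairwise when computing each coefficient.
print(numeric.corr())
```

Checking the NaN counts before trusting the result is worthwhile, because silently coerced entries reduce the number of observations behind each coefficient.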

Answered By: cottontail