seaborn clustermap FloatingPointError: NaN dissimilarity value

Question:

I am trying to run this code:

import pandas as pd
import seaborn as sns

df = pd.DataFrame(clusters, columns=cols)

sns.clustermap(df, cmap="vlag", vmin=0, vmax=1, metric="correlation", 
    z_score=None, standard_scale=None, yticklabels=True, 
    figsize=(size, size))

The value of clusters is:

clusters = [[0.89463602, 0.,         0.,         0.85185185, 0.9023569,  0.,
  0.,         0.83333333, 0.,         0.,         0.,        ],

 [0.75,       0.66666667, 0.,         0.,         0.69444444, 0.,
  0.89272031, 0.,         0.69444444, 0.,         0.69444444,],

 [0.85185185, 0.88910175, 0.,         0.,         0.9043771,  0.,
  0.,         0.,         0.89092141, 0.77777778, 0.69444444,],

 [0.75,       0.89825458, 0.,         0.,         0.77777778, 0.,
  0.8908046,  0.,         0.75,       0.91550069, 0.8,       ],]

and I get the following error:

in linkage
    linkage_wrap(N, X, Z, mthidx[method])
FloatingPointError: NaN dissimilarity value.

Any ideas what causes it?

Asked By: obar

Answers:

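Two of your columns (col2 and col5) contain only zeros. A column with zero variance has an undefined correlation with every other column, so the correlation metric produces NaN when clustermap clusters the columns: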

cols = ["col"+str(i) for i in range(11)]
df = pd.DataFrame(clusters, columns=cols)
df.corr()

            col0       col1  col2       col3       col4  col5       col6       col7       col8       col9      col10
col0    1.000000  -0.652805   NaN   0.755353   0.914034   NaN  -0.971167   0.755353  -0.607892  -0.232318  -0.792705
col1   -0.652805   1.000000   NaN  -0.967396  -0.353987   NaN   0.461102  -0.967396   0.982783   0.761192   0.976659
col2         NaN        NaN   NaN        NaN        NaN   NaN        NaN        NaN        NaN        NaN        NaN
col3    0.755353  -0.967396   NaN   1.000000   0.537949   NaN  -0.577350   1.000000  -0.978166  -0.573568  -0.990826
col4    0.914034  -0.353987   NaN   0.537949   1.000000   NaN  -0.943651   0.537949  -0.352431   0.181392  -0.546475
col5         NaN        NaN   NaN        NaN        NaN   NaN        NaN        NaN        NaN        NaN        NaN
col6   -0.971167   0.461102   NaN  -0.577350  -0.943651   NaN   1.000000  -0.577350   0.401476   0.079648   0.627048
col7    0.755353  -0.967396   NaN   1.000000   0.537949   NaN  -0.577350   1.000000  -0.978166  -0.573568  -0.990826
col8   -0.607892   0.982783   NaN  -0.978166  -0.352431   NaN   0.401476  -0.978166   1.000000   0.665620   0.962359
col9   -0.232318   0.761192   NaN  -0.573568   0.181392   NaN   0.079648  -0.573568   0.665620   1.000000   0.636492
col10  -0.792705   0.976659   NaN  -0.990826  -0.546475   NaN   0.627048  -0.990826   0.962359   0.636492   1.000000

df[['col2','col5']]

   col2 col5
0   0.0 0.0
1   0.0 0.0
2   0.0 0.0
3   0.0 0.0

You can either drop those columns before plotting, or switch to a metric that stays defined for constant columns, such as euclidean or canberra.
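A minimal sketch of the first option, assuming the data from the question: it finds the columns that hold a single repeated value and drops them before any correlation is computed. The names constant_cols and filtered are illustrative and not part of the original code.

```python
import pandas as pd

# Data from the question: 4 rows, 11 columns.
clusters = [
    [0.89463602, 0., 0., 0.85185185, 0.9023569, 0.,
     0., 0.83333333, 0., 0., 0.],
    [0.75, 0.66666667, 0., 0., 0.69444444, 0.,
     0.89272031, 0., 0.69444444, 0., 0.69444444],
    [0.85185185, 0.88910175, 0., 0., 0.9043771, 0.,
     0., 0., 0.89092141, 0.77777778, 0.69444444],
    [0.75, 0.89825458, 0., 0., 0.77777778, 0.,
     0.8908046, 0., 0.75, 0.91550069, 0.8],
]
cols = ["col" + str(i) for i in range(11)]
df = pd.DataFrame(clusters, columns=cols)

# A column with only one distinct value has zero variance,
# so its correlation with any other column is undefined (NaN).
constant_cols = [c for c in df.columns if df[c].nunique() <= 1]

# Drop the constant columns; the remaining ones all vary.
filtered = df.drop(columns=constant_cols)

print(constant_cols)
# True if any NaN is left in the column correlation matrix.
print(filtered.corr().isna().any().any())
```

Passing filtered instead of df to sns.clustermap with metric="correlation" should then avoid the NaN error for this data. Note that the dropped columns no longer appear in the heatmap, so if they need to stay visible, changing the metric is the other route.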

Answered By: StupidWolf