Radial heatmap from similarity matrix in Python

Question:

Summary

I have a 2880×2880 similarity matrix (about 8.3 million points). My attempt with HoloViews resulted in a 500 MB HTML file that never finishes "opening". So how do I make a round heatmap of the matrix?

Details

I had data from 10 different places, measured over 1 whole year. The hours of each month were turned into arrays, so each month had 24 arrays (one for all 00:00, one for all 01:00 … 22:00, 23:00).

These were about 28-31 cells long (one cell per day), and each cell held a measurement of the quantity I’m trying to analyze. So there are 24 arrays for each month of one whole year, i.e. 24×12 = 288 arrays per place. With measurements from 10 places, that makes 2880 arrays in total. All of them were compared to each other, and the similarity coefficients were saved in a 2880×2880 matrix.
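To make the numbers concrete, the sketch below maps each (place, month, hour) array to a row/column of the matrix and back. The place → month → hour ordering and the 0-based numbering are assumptions for illustration; the question does not say how the rows are ordered.

```python
# Hypothetical layout: place-major, then month, then hour (the question does not specify)
N_PLACES, N_MONTHS, N_HOURS = 10, 12, 24
N_ARRAYS = N_PLACES * N_MONTHS * N_HOURS  # 24 x 12 x 10 = 2880


def array_index(place, month, hour):
    """Row/column of the (place, month, hour) array in the similarity matrix (all 0-based)."""
    return (place * N_MONTHS + month) * N_HOURS + hour


def array_label(index):
    """Inverse of array_index: recover (place, month, hour) from a row/column number."""
    place, rest = divmod(index, N_MONTHS * N_HOURS)
    month, hour = divmod(rest, N_HOURS)
    return place, month, hour


print(N_ARRAYS, N_ARRAYS ** 2)  # 2880 arrays, 8294400 matrix entries
print(array_index(1, 0, 8))     # 296: second place, January, 08:00
print(array_label(296))         # (1, 0, 8)
```

Under this ordering, the boundaries between places fall at multiples of 288, which is where any "divisions" in the plot would go.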

I’m trying to turn it into a radial heatmap like the one from HoloViews, but without the ticks and tags (a label format like Place01Jan0800 would be cumbersome to read for 2880 rows), just the shape, colors and divisions:
[Image: example radial heatmap from HoloViews]

I managed to create the HTML file itself, but it ended up being 500 MB, so it never shows up when I open it; the page just stays blank. Below is a minimal example of what I have, with the loading of the data file replaced by randomly generated data.

import sys
sys.setrecursionlimit(10000)

import numpy as np
import pandas as pd
import holoviews as hv
from holoviews import opts
from bokeh.plotting import show
import gc

# Function creating dummy data for this example
def transformer():
    dimension = 2880
    dummy_matrix = np.random.rand(dimension, dimension)  # Fake, similar data

    # Row and column position of every cell of the flattened matrix
    row_idx, col_idx = np.divmod(np.arange(dimension * dimension), dimension)
    # Zero-padded placeholder labels so they also sort correctly as strings
    row_vals = [f"{i:04d}" for i in row_idx]
    col_vals = [f"{j:04d}" for j in col_idx]
    val_vals = dummy_matrix.ravel().tolist()  # Turn matrix into a flat list
    idx_vals = list(range(dimension * dimension))  # Placeholder index

    return idx_vals, val_vals, row_vals, col_vals

idx_arr, val_arr, row_arr, col_arr = transformer()
df = pd.DataFrame({"values": val_arr, "x-label": row_arr, "y-label": col_arr}, index=idx_arr)

hv.extension('bokeh')
heatmap = hv.HeatMap(df, ["x-label", "y-label"])
heatmap.opts(opts.HeatMap(cmap="viridis", radial=True))

gc.collect()  # Attempt to save memory, because this thing is huge
show(hv.render(heatmap))

I had a look at Datashader to see if it would help, but I have no idea how to plug it into this radial heatmap (if that’s even possible), since the radial heatmap doesn’t seem to support datashading.
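Datashader’s core idea is to aggregate the data onto a fixed pixel grid, so the output size depends on the image resolution rather than on the number of cells. As a rough stand-in for that idea (a plain NumPy technique, not a Datashader or HoloViews API), the radial layout can be rasterized directly. The function name `polar_raster` and the rows-as-angle, columns-as-radius mapping are assumptions:

```python
import numpy as np


def polar_raster(matrix, size=1000):
    """Rasterize a matrix into a size x size radial-heatmap image.

    Rows map to angle (counter-clockwise from the right), columns to radius
    (column 0 at the center). Pixels outside the unit disk are NaN.
    """
    n_rows, n_cols = matrix.shape
    coords = (np.arange(size) + 0.5) / size * 2 - 1  # pixel centers in [-1, 1]
    x, y = np.meshgrid(coords, -coords)  # negate y so image row 0 is the top
    r = np.hypot(x, y)
    theta = np.mod(np.arctan2(y, x), 2 * np.pi)  # angle in [0, 2*pi)

    # Nearest-cell lookup: each pixel takes the value of the cell it falls in
    row = np.minimum((theta / (2 * np.pi) * n_rows).astype(int), n_rows - 1)
    col = np.minimum((r * n_cols).astype(int), n_cols - 1)
    image = matrix[row, col].astype(float)
    image[r > 1] = np.nan  # outside the disk
    return image


img = polar_raster(np.random.rand(2880, 2880), size=1000)
print(img.shape)  # (1000, 1000)
```

Each pixel samples the one cell it falls into rather than averaging, so fine detail can alias: an image 1000 pixels across has an outer circumference of only about 3140 pixels, barely more than the 2880 angular cells. The returned array can be displayed with Matplotlib’s `imshow`, which treats the NaN pixels outside the disk as missing.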

So I have no idea how to tackle this. I would be content with a broad overview: I don’t need details, a hover info box, zooming or any other fancy extra features, just the general picture for a presentation. I’m open to any solution really.
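Since only a broad overview is needed, one option (not something tried in the question) is to shrink the matrix before plotting it with any library, by averaging non-overlapping tiles. `block_average` is a hypothetical helper name, and the block size is a free choice:

```python
import numpy as np


def block_average(matrix, block):
    """Downsample a square matrix by averaging non-overlapping block x block tiles."""
    n = matrix.shape[0]
    if matrix.shape != (n, n) or n % block:
        raise ValueError("matrix must be square with a side divisible by block")
    k = n // block
    # Index [i, a, j, b] of the 4-D view is matrix[i * block + a, j * block + b];
    # averaging over a and b collapses each tile to a single value.
    return matrix.reshape(k, block, k, block).mean(axis=(1, 3))


similarity = np.random.rand(2880, 2880)  # stand-in for the real matrix
small = block_average(similarity, 10)     # 288 x 288 = 82,944 cells
print(small.shape)  # (288, 288)
```

If the rows are ordered place → month → hour (an assumption), `block=24` gives exactly one cell per (place, month) pair, a 120 × 120 matrix. With tens of thousands of cells instead of 8.3 million, an HTML export should also be far smaller.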

Asked By: user326964


Answers:

Plain Matplotlib seems to be able to handle it, based on the answers to the question "How do I create radial heatmap in matplotlib?":

import matplotlib.pyplot as plt
import numpy as np

n = 2880  # angular bins (matrix rows)
m = 2880  # radial bins (matrix columns)
rad = np.linspace(0, 10, m)
a = np.linspace(0, 2 * np.pi, n)
r, th = np.meshgrid(rad, a)  # both of shape (n, m)

dummy_matrix = np.random.rand(n, m)  # Fake, similar data

fig = plt.figure()
ax = fig.add_subplot(projection="polar")

# shading="auto" accepts coordinates with the same shape as the data
mesh = ax.pcolormesh(th, r, dummy_matrix, cmap="Blues", shading="auto")

ax.grid(False)  # hide the polar grid lines
fig.colorbar(mesh)
fig.savefig("custom_radial_heatmap.png")
plt.show()

It didn’t even take an eternity, only about 20 seconds at most.
You would think it would turn out monstrous, like this:

[Image: jagged-looking radial heatmap]

But the sheer number of points drowns out the jaggedness, WOOHOO!

[Image: resulting radial heatmap of the 2880×2880 matrix]

There are some things left to be desired, like tags and ticks, but I think I’ll figure that out.
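For those missing pieces, one possible approach is to drop the tick labels and reuse the polar grid lines to mark the divisions between places. The sketch assumes the rows and columns are grouped into 10 equal blocks (one per place); `radial_heatmap` and its parameters are illustrative names, not part of the answer above:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; only the PNG file is produced
import matplotlib.pyplot as plt
import numpy as np


def radial_heatmap(matrix, n_groups=10, path="radial_heatmap.png", dpi=200):
    """Draw a square matrix on polar axes (rows -> angle, columns -> radius).

    Tick labels are removed; grid lines are kept only at the boundaries
    between the n_groups equal-sized blocks (e.g. places).
    """
    n, m = matrix.shape
    theta_edges = np.linspace(0, 2 * np.pi, n + 1)  # n angular cells
    r_edges = np.linspace(0, 1, m + 1)              # m radial cells

    fig = plt.figure(figsize=(8, 8))
    ax = fig.add_subplot(projection="polar")
    # With 1-D edges, pcolormesh expects C of shape (len(r_edges) - 1, len(theta_edges) - 1)
    mesh = ax.pcolormesh(theta_edges, r_edges, matrix.T, cmap="viridis", shading="flat")
    ax.set_ylim(0, 1)

    # Grid lines at group boundaries only, with no labels
    ax.set_xticks(np.linspace(0, 2 * np.pi, n_groups, endpoint=False))
    ax.set_yticks(np.linspace(0, 1, n_groups + 1)[1:])
    ax.set_xticklabels([])
    ax.set_yticklabels([])
    ax.grid(True, color="white", linewidth=0.8)

    fig.colorbar(mesh, ax=ax, shrink=0.7)
    fig.savefig(path, dpi=dpi, bbox_inches="tight")
    plt.close(fig)
    return path


# Small stand-in matrix; the full 2880 x 2880 one works the same, just slower
radial_heatmap(np.random.rand(288, 288))
```

The output is a fixed-size PNG whatever the matrix size, which suits a presentation slide better than an interactive HTML file.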

Answered By: user326964

I recommend using a regular heatmap instead of a radial heatmap to show the similarity matrix. The reasons are:

  1. A radial heatmap is designed for a periodic variable. The time variable (288 time slots, i.e. 12 months × 24 hours) can be considered periodic; however, I think the 288 × 10 axis (288 time slots for each of 10 places) is no longer periodic because of the "place" dimension.
  2. Near the center of a radial heatmap, the cells are packed too densely for a human to make sense of them.
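The density problem in the second point can be quantified. With equally spaced radii, every ring is split into the same number of angular cells, but ring i (counting from 0 at the center) has an area proportional to (i + 1)² − i² = 2i + 1, so with m rings the outermost cells are 2m − 1 times larger than the innermost ones. A small check, assuming the same 2880 × 2880 layout and outer radius of 10 as in the Matplotlib answer:

```python
import numpy as np


def ring_cell_areas(n_angle, n_rings, outer_radius=10.0):
    """Area of one cell in each ring of a radial heatmap with equally spaced radii."""
    edges = np.linspace(0, outer_radius, n_rings + 1)
    # Annulus area split evenly among the n_angle wedges of that ring
    return np.pi * (edges[1:] ** 2 - edges[:-1] ** 2) / n_angle


areas = ring_cell_areas(2880, 2880)
print(round(areas[-1] / areas[0]))  # 5759, i.e. 2 * 2880 - 1
```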

The following is simple code to show a regular heatmap:

import matplotlib
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
import numpy as np

n = 2880
m = 2880
dummy_matrix = np.random.rand(m, n)

fig = plt.figure(figsize=(50, 50))  # change the figsize to control the resolution
ax = fig.add_subplot(111)
cmap = matplotlib.colormaps["Blues"]  # you may use another built-in colormap or define your own
# If your data is not in the range [0, 1], use a normalization. Here it is normalized by the min and max values.
norm = Normalize(vmin=np.amin(dummy_matrix), vmax=np.amax(dummy_matrix))
image = ax.imshow(dummy_matrix, cmap=cmap, norm=norm)
fig.colorbar(image)

plt.show()

Which gives:
[Image: resulting 2880×2880 heatmap]
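As a possible alternative to a 50 × 50 inch figure, Matplotlib’s `plt.imsave` can write the matrix with exactly one pixel per cell, so a 2880 × 2880 matrix becomes a 2880 × 2880 PNG with no axes or white margins. A minimal sketch; `save_matrix_png` is a hypothetical helper name:

```python
import matplotlib
matplotlib.use("Agg")  # no window needed, only the file
import matplotlib.pyplot as plt
import numpy as np


def save_matrix_png(matrix, path, cmap="Blues"):
    """Write the matrix as a PNG with exactly one pixel per matrix cell."""
    # vmin/vmax play the role of the min-max Normalize used in the answer's code
    plt.imsave(path, matrix, cmap=cmap,
               vmin=float(matrix.min()), vmax=float(matrix.max()))
    return path


similarity = np.random.rand(2880, 2880)  # stand-in for the real matrix
save_matrix_png(similarity, "similarity_matrix.png")
```

The trade-off is that there is no colorbar; it would have to be added separately if the presentation needs one.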

Another idea: perhaps computing the similarity matrix is unnecessary. You could plot the original 288 × 10 data directly, using a radial heatmap or just a regular heatmap, and judge the similarity of the data from the color distribution.
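A sketch of that idea under clearly stated assumptions: the raw data is held as 2880 arrays ordered place → month → hour, and each array is summarized by its mean to get one value per (place, time slot). Both the layout and the choice of the mean are hypothetical, and random numbers stand in for the measurements:

```python
import matplotlib
matplotlib.use("Agg")  # render to a file only
import matplotlib.pyplot as plt
import numpy as np

N_PLACES, N_MONTHS, N_HOURS = 10, 12, 24
rng = np.random.default_rng(0)

# Hypothetical raw data: one array of daily values per (place, month, hour),
# ordered place -> month -> hour; a non-leap year is assumed for the lengths.
days_in_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
arrays = [rng.random(days_in_month[month])
          for place in range(N_PLACES)
          for month in range(N_MONTHS)
          for hour in range(N_HOURS)]

# Summarize each array by its mean (an assumption; median or max may suit better)
means = np.array([a.mean() for a in arrays])
grid = means.reshape(N_PLACES, N_MONTHS * N_HOURS)  # one row per place, 288 columns

fig, ax = plt.subplots(figsize=(12, 3))
im = ax.imshow(grid, aspect="auto", cmap="viridis")
ax.set_xlabel("time slot (month x 24 + hour)")
ax.set_ylabel("place")
fig.colorbar(im, ax=ax)
fig.savefig("original_data_heatmap.png", dpi=150)
```

With only 2880 cells, this plot is small enough for any library, including a radial HoloViews heatmap.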

Answered By: hellohawaii