How to resolve MemoryError with multi-colored heatmap

Question:

I am trying to plot a heat map with multiple colors from data read from a file. I can generate an ordinary 2D heat map, but not one like the attached image. With random numbers the plot works, but when I read the data from the file I get the error shown below.

[Image: example heat map, generated with random data]

Input (col[1] and col[2] are the x and y coordinates):

00022d9064bc 819251 440006 1073260801 1073260803 2.0 
00022dba8f51 819251 440006 1073260801 1073260803 2.0 
00022de1c6c1 819251 440006 1073260801 1073260803 2.0 
003065f30f37 819251 440006 1073260801 1073260803 2.0 
00904b48a3b6 819251 440006 1073260801 1073260803 2.0 
00904b83a0ea 819213 439954 1073260803 1073260810 7.0 
00904b85d3cf 817526 439458 1073260803 1073261920 1117.0 
00904b14b494 817558 439525 1073260804 1073265410 4606.0 
00904b99499c 817558 439525 1073260804 1073262625 1821.0 
00904bb96e83 817558 439525 1073260804 1073265163 4359.0 
00904bf91b75 817558 439525 1073260804 1073263786 2982.0 
00022d36a6df 820428 438735 1073260807 1073260809 2.0 

Code:

from matplotlib import pyplot as plt 
from matplotlib import cm as CM
from matplotlib import mlab as ml
import numpy as np 

data = np.loadtxt('inputfile', unpack=True, dtype='str, int, int, int, int, float')

x  = data[1]
y  = data[2]

X, Y = np.meshgrid(x,y)

x = X.ravel()
y = Y.ravel()

gridsize = 30 
plt.subplot(111)

cb = plt.colorbar()
cb.set_label('density')
plt.show() 

Error:

Traceback (most recent call last):
  File "heat3.py", line 11, in <module>
    X, Y = np.meshgrid(x,y)
  File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 3106, in meshgrid
    mult_fact = np.ones(shape, dtype=int)
  File "/usr/lib/python2.7/dist-packages/numpy/core/numeric.py", line 178, in ones
    a = empty(shape, dtype, order)
MemoryError

Any useful suggestions would be appreciated.

Asked By: user3964336


Answers:

The error you are seeing comes from meshgrid trying to build very large matrices: if your data contains N lines, each of the two matrices it returns will be N×N. Depending on how many points you have and how densely they are packed, you will want one of two things from the heatmap. Either

  1. you want to interpolate between points which are far apart to form a smooth surface, or
  2. you want to aggregate densely packed points by counting how many fall in a particular region (2D histogram).
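
As a rough illustration of why the N×N matrices run out of memory, here is a minimal sketch. The helper name meshgrid_bytes and the point counts 1000 and 100000 are made up for illustration; the four coordinates are taken from the sample input.

```python
import numpy as np

def meshgrid_bytes(n_points, dtype=np.int64):
    """Bytes needed for the two n x n arrays that np.meshgrid(x, y) returns."""
    itemsize = np.dtype(dtype).itemsize
    return 2 * n_points * n_points * itemsize

# Four points from the sample input already produce two 4 x 4 arrays
x = np.array([819251, 819213, 817526, 817558])
y = np.array([440006, 439954, 439458, 439525])
X, Y = np.meshgrid(x, y)
print(X.shape, Y.shape)

# Hypothetical file sizes, chosen only to show the quadratic growth
for n in (1000, 100000):
    print(n, "points ->", meshgrid_bytes(n) / 1e9, "GB")
```

The memory grows with the square of the number of lines, so a file that loads without trouble can still be far too large to pass through meshgrid.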

I’ve modified your code below to work for the second case (hexbin does this automatically) as you don’t appear to be referencing a third value in your data to interpolate on.

from matplotlib import pyplot as plt
import numpy as np

# Read only the numeric columns that are needed:
# col[1] -> x, col[2] -> y, col[5] -> z
x, y, z = np.loadtxt('inputfile', usecols=(1, 2, 5), unpack=True)

# These lines are completely unnecessary and perhaps come
# from a different solution which was interpolating between points
#X, Y = np.meshgrid(x,y)
#x = X.ravel()
#y = Y.ravel()

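gridsize = 30
#plt.subplot(111)  # <- You don't need this as it is one plot anyway
# With C=z each hexagon is colored by the mean of z in that bin;
# leave C out to color by the number of points instead
plt.hexbin(x, y, C=z, gridsize=gridsize)   # <- You need to do the hexbin plot
cb = plt.colorbar()
cb.set_label('mean of z per bin')
plt.show()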
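
If you want plain counts per region (the 2D histogram of the second case), something along these lines may help. It is only a sketch: it uses np.histogram2d on a rectangular grid rather than hexagons, the coordinates are the twelve sample rows from the question, and plot_counts is a hypothetical helper name.

```python
import numpy as np

# Coordinates (col[1] and col[2]) of the twelve sample rows in the question
x = np.array([819251] * 5 + [819213, 817526] + [817558] * 4 + [820428])
y = np.array([440006] * 5 + [439954, 439458] + [439525] * 4 + [438735])

# Count how many points fall into each cell of a 30 x 30 grid
counts, xedges, yedges = np.histogram2d(x, y, bins=30)

print(int(counts.sum()))   # total number of points that were binned
print(int(counts.max()))   # number of points in the most crowded cell

def plot_counts(counts, xedges, yedges):
    """Draw the counts as a heat map (matplotlib is imported only when plotting)."""
    from matplotlib import pyplot as plt
    # histogram2d puts x along the first axis; pcolormesh expects it along the second
    plt.pcolormesh(xedges, yedges, counts.T)
    plt.colorbar().set_label('number of points')
    plt.show()
```

Calling plot_counts(counts, xedges, yedges) shows the result. The arrays here have the size of the grid (30 x 30), not of the data, so memory use stays small however many lines the file has.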

The vestigial meshgrid call that I’ve commented out above is perhaps from a piece of code you found somewhere that does the first option (interpolating between spaced-out points), perhaps by using griddata. If that is in fact what you want, have a look at this cookbook entry on gridding irregularly spaced data.
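
For reference, the interpolation case could look roughly like the sketch below. It assumes SciPy is installed and uses scipy.interpolate.griddata, which is not necessarily what the cookbook entry uses; the scattered samples are synthetic and the grid resolution (50 x 40) is arbitrary.

```python
import numpy as np
from scipy.interpolate import griddata

# Synthetic scattered samples, made up for illustration: z is known at (x, y)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
y = rng.uniform(0.0, 1.0, 200)
z = x + y

# meshgrid is applied to the regular grid axes, not to the raw data,
# so its size is set by the chosen resolution rather than by N
xi = np.linspace(0.1, 0.9, 50)
yi = np.linspace(0.1, 0.9, 40)
XI, YI = np.meshgrid(xi, yi)

# Linear interpolation onto the grid; cells outside the data's convex hull become NaN
ZI = griddata((x, y), z, (XI, YI), method='linear')
print(ZI.shape)
```

The gridded result can then be drawn with, for example, plt.pcolormesh(XI, YI, ZI) followed by plt.colorbar().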

Answered By: chthonicdaemon