How to make a histogram from a list of data and plot it with matplotlib

Question:

I have matplotlib installed and am trying to create a histogram plot from some data:

#!/usr/bin/python

l = []
with open("testdata") as f:
    line = f.next()
    f.next()  # skip headers
    nat = int(line.split()[0])
    print nat

    for line in f:
        if line.strip():
          if line.strip():
            l.append(map(float,line.split()[1:]))

    b = 0
    a = 1

for b in range(53):
    for a in range(b+1, 54):
        import operator
        import matplotlib.pyplot as plt
        import numpy as np

        vector1 = (l[b][0], l[b][1], l[b][2])
        vector2 = (l[a][0], l[a][1], l[a][2])

        x = vector1
        y = vector2
        vector3 = list(np.array(x) - np.array(y))
        dotProduct = reduce( operator.add, map( operator.mul, vector3, vector3))
    
        dp = dotProduct**.5
        print dp
    
        data = dp
        num_bins = 200  # <- number of bins for the histogram
        plt.hist(data, num_bins)
        plt.show()

I’m getting an error from the last part of the code:

/usr/lib64/python2.6/site-packages/matplotlib/backends/backend_gtk.py:621: DeprecationWarning: Use the new widget gtk.Tooltip
  self.tooltips = gtk.Tooltips()
Traceback (most recent call last):
  File "vector_final", line 42, in <module>
    plt.hist(data, num_bins)
  File "/usr/lib64/python2.6/site-packages/matplotlib/pyplot.py", line 2008, in hist
    ret = ax.hist(x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, **kwargs)
  File "/usr/lib64/python2.6/site-packages/matplotlib/axes.py", line 7098, in hist
    w = [None]*len(x)
TypeError: len() of unsized object

But anyway, do you have any idea how to make 200 evenly spaced out bins, and have your program store the data in the appropriate bins?

Asked By: Wana_B3_Nerd


Answers:

do you have any idea how to make 200 evenly spaced out bins, and have
your program store the data in the appropriate bins?

You can, for example, use NumPy’s arange for a fixed bin size (or Python’s standard range object), and NumPy’s linspace for a fixed number of evenly spaced bins. Here are two simple examples from my matplotlib gallery:

Fixed bin size

import numpy as np
from matplotlib import pyplot as plt

data = np.random.normal(0, 20, 1000)

# fixed bin size: one edge every 5 units (arange excludes the stop value,
# so go one step past 100 to keep 100 as the last edge)
bins = np.arange(-100, 100 + 5, 5)

plt.xlim([min(data)-5, max(data)+5])

plt.hist(data, bins=bins, alpha=0.5)
plt.title('Random Gaussian data (fixed bin size)')
plt.xlabel('variable X (bin size = 5)')
plt.ylabel('count')

plt.show()


Fixed number of bins

import numpy as np
import math
from matplotlib import pyplot as plt

data = np.random.normal(0, 20, 1000) 

n_bins = 20
# n_bins evenly spaced bins are delimited by n_bins + 1 edges
bins = np.linspace(math.ceil(min(data)),
                   math.floor(max(data)),
                   n_bins + 1)

plt.xlim([min(data)-5, max(data)+5])

plt.hist(data, bins=bins, alpha=0.5)
plt.title('Random Gaussian data (fixed number of bins)')
plt.xlabel('variable X (20 evenly spaced bins)')
plt.ylabel('count')

plt.show()
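
One detail worth checking when passing explicit edges is that n bins are delimited by n + 1 edges. The following is a minimal sketch using only NumPy; the seeded generator and the variable names are assumptions chosen for illustration, not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded so the run is repeatable
data = rng.normal(0, 20, 1000)

n_bins = 20
# n_bins bins need n_bins + 1 edges, here spanning the full data range
edges = np.linspace(data.min(), data.max(), n_bins + 1)
counts, returned_edges = np.histogram(data, bins=edges)

print(len(edges), len(counts))   # number of edges, number of bins
# True when every bin has the same width
print(np.allclose(np.diff(edges), edges[1] - edges[0]))
```

Because the edges run from the minimum to the maximum of the data, every sample falls into some bin (np.histogram treats the last bin as closed on the right).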


Answered By: user2489252

Automatic bins

how to make 200 evenly spaced out bins, and have your program store the data in the appropriate bins?

The accepted answer builds the bin edges manually with np.arange and np.linspace, but matplotlib already does this automatically when bins is an integer:

  1. plt.hist itself returns counts and bins

    counts, bins, _ = plt.hist(data, bins=200)
    

Or if you need the bins before plotting:

  1. np.histogram with plt.stairs

    counts, bins = np.histogram(data, bins=200)
    plt.stairs(counts, bins, fill=True)
    

    Note that stair plots require matplotlib 3.4.0+.

  2. pd.cut with plt.hist

    _, bins = pd.cut(data, bins=200, retbins=True)
    plt.hist(data, bins)
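
As a rough sketch of how this applies to the question: the TypeError in the traceback appears to come from passing a single number (dp) to plt.hist inside the loop, so one option is to collect all pairwise distances first and call plt.hist once. The random coords array below is a stand-in for the 54 rows read from testdata, and the Agg backend and output filename are assumptions made so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
coords = rng.normal(size=(54, 3))  # placeholder for the (x, y, z) rows from the file

# Index arrays for every unordered pair (b, a) with b < a
b_idx, a_idx = np.triu_indices(len(coords), k=1)
# Euclidean distance of each pair, collected in one 1-D array
distances = np.linalg.norm(coords[b_idx] - coords[a_idx], axis=1)

# A single call on the whole array: 200 equal-width bins over the data range
counts, bins, _ = plt.hist(distances, bins=200)
plt.xlabel("pairwise distance")
plt.ylabel("count")
plt.savefig("pairwise_distances.png")
```

Here counts holds the number of distances in each of the 200 bins and bins holds the 201 edges, which is the "store the data in the appropriate bins" part of the question.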
    


Answered By: tdy

There are a couple of ways to do this.

If you cannot guarantee that your items are all numeric and of the same type, use Counter from the standard library’s collections module:

import collections
hist = dict(collections.Counter(your_list))
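
For illustration, here is a small sketch with made-up mixed-type data (the contents of your_list are hypothetical). Counter tallies how often each distinct value occurs, so it gives a frequency table rather than numeric bins.

```python
from collections import Counter

your_list = ["a", 1, "a", 2.5, 1, "a", None]  # hypothetical mixed-type data
hist = Counter(your_list)

print(hist["a"])            # 3
print(hist.most_common(1))  # [('a', 3)]
```

Keeping the Counter instead of converting it to a dict preserves helpers such as most_common, and looking up a value that never occurred returns 0 instead of raising KeyError.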

Otherwise, if your data is guaranteed to be numeric and all of the same type, use NumPy:

import numpy as np
# for one dimensional data
(hist, bin_edges) = np.histogram(your_list)
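# for two dimensional data (pass the x and y coordinates as separate sequences)
(hist, xedges, yedges) = np.histogram2d(x_values, y_values)
# for N dimensional data (your_points has shape (n_samples, n_dims))
(hist, edges) = np.histogramdd(your_points)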

NumPy’s histogram functionality is really the Cadillac option: np.histogram can estimate how many bins you need, it supports weighting, and the algorithms it uses are well documented, with plenty of example code.
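
A short sketch of those two features, using random data as a stand-in for your_list; the constant weight of 0.5 is an arbitrary value chosen only to make the effect easy to see.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(0, 20, 1000)  # stand-in for your_list

# Let NumPy choose the number of bins; "auto" is one of the documented estimators
counts, edges = np.histogram(data, bins="auto")

# Weighted histogram: each sample contributes its weight instead of 1
weights = np.full(data.shape, 0.5)
weighted_counts, _ = np.histogram(data, bins=edges, weights=weights)

print(len(counts))                                   # number of bins NumPy picked
print(np.allclose(weighted_counts, 0.5 * counts))    # weights scale the counts
```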

Answered By: Trevor Boyd Smith