How to make a histogram from a list of data and plot it with matplotlib
Question:
I’ve got matplotlib installed and am trying to create a histogram plot from some data:
#!/usr/bin/python
l = []
with open("testdata") as f:
    line = f.next()
    f.next() # skip headers
    nat = int(line.split()[0])
    print nat
    for line in f:
        if line.strip():
            if line.strip():
                l.append(map(float,line.split()[1:]))
b = 0
a = 1
for b in range(53):
    for a in range(b+1, 54):
        import operator
        import matplotlib.pyplot as plt
        import numpy as np
        vector1 = (l[b][0], l[b][1], l[b][2])
        vector2 = (l[a][0], l[a][1], l[a][2])
        x = vector1
        y = vector2
        vector3 = list(np.array(x) - np.array(y))
        dotProduct = reduce( operator.add, map( operator.mul, vector3, vector3))
        dp = dotProduct**.5
        print dp
        data = dp
        num_bins = 200 # <- number of bins for the histogram
        plt.hist(data, num_bins)
        plt.show()
I’m getting an error from the last part of the code:
/usr/lib64/python2.6/site-packages/matplotlib/backends/backend_gtk.py:621: DeprecationWarning: Use the new widget gtk.Tooltip
  self.tooltips = gtk.Tooltips()
Traceback (most recent call last):
  File "vector_final", line 42, in <module>
    plt.hist(data, num_bins)
  File "/usr/lib64/python2.6/site-packages/matplotlib/pyplot.py", line 2008, in hist
    ret = ax.hist(x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, **kwargs)
  File "/usr/lib64/python2.6/site-packages/matplotlib/axes.py", line 7098, in hist
    w = [None]*len(x)
TypeError: len() of unsized object
But anyway, do you have any idea how to make 200 evenly spaced out bins, and have your program store the data in the appropriate bins?
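For what it's worth, the traceback is consistent with data holding a single float: dp is one distance, so plt.hist receives a scalar and len() fails on it. A minimal sketch of one way around this, assuming the goal is a histogram of all pairwise distances, is to collect the distances in a list first. The coords array below is a hypothetical stand-in for the rows parsed from "testdata", and the code is written for Python 3 with NumPy rather than the Python 2.6 setup shown in the traceback.

```python
import itertools
import numpy as np

# Hypothetical stand-in for the coordinates parsed from "testdata":
# one (x, y, z) row per atom, 54 atoms as in the loops above.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(54, 3))

# Collect every pairwise distance in a list instead of keeping only one scalar.
distances = []
for i, j in itertools.combinations(range(len(coords)), 2):
    distances.append(np.linalg.norm(coords[i] - coords[j]))

# Bin all distances in one call, outside the loop.
# plt.hist(distances, bins=200) accepts the same list for plotting.
counts, bin_edges = np.histogram(distances, bins=200)
```

The point of the sketch is only that the histogram call sees a sequence of values and runs once, after the loop has finished.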
Answers:
"do you have any idea how to make 200 evenly spaced out bins, and have your program store the data in the appropriate bins?"
You can, for example, use NumPy’s arange for a fixed bin size (or Python’s standard range object), and NumPy’s linspace for evenly spaced bins. Here are 2 simple examples from my matplotlib gallery:
Fixed bin size
import numpy as np
from matplotlib import pyplot as plt
data = np.random.normal(0, 20, 1000)
# fixed bin size: edges every 5 units from -100 to 100 inclusive
bins = np.arange(-100, 105, 5)
plt.xlim([min(data)-5, max(data)+5])
plt.hist(data, bins=bins, alpha=0.5)
plt.title('Random Gaussian data (fixed bin size)')
plt.xlabel('variable X (bin size = 5)')
plt.ylabel('count')
plt.show()
Fixed number of bins
import numpy as np
import math
from matplotlib import pyplot as plt
data = np.random.normal(0, 20, 1000)
bins = np.linspace(math.ceil(min(data)),
                   math.floor(max(data)),
                   21) # 21 edges give 20 evenly spaced bins
plt.xlim([min(data)-5, max(data)+5])
plt.hist(data, bins=bins, alpha=0.5)
plt.title('Random Gaussian data (fixed number of bins)')
plt.xlabel('variable X (20 evenly spaced bins)')
plt.ylabel('count')
plt.show()
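Applied to the 200 bins asked about in the question, the same idea might look like the sketch below. Note that the third argument of np.linspace counts points, and N bins are delimited by N + 1 edges. The seeded sample data is an assumption made for illustration, and np.histogram is used in place of plt.hist so that the counts can be inspected without opening a plot window.

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded so the run is repeatable
data = rng.normal(0, 20, 1000)        # hypothetical sample, as in the examples above

num_bins = 200
# N bins are delimited by N + 1 edges.
edges = np.linspace(data.min(), data.max(), num_bins + 1)

# counts[i] is the number of values between edges[i] and edges[i + 1].
counts, returned_edges = np.histogram(data, bins=edges)

# All bins have the same width, up to floating-point rounding.
widths = np.diff(edges)
```

Passing the same edges array to plt.hist(data, bins=edges) would draw the corresponding plot.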
Automatic bins
"how to make 200 evenly spaced out bins, and have your program store the data in the appropriate bins?"
The accepted answer builds the bin edges manually with np.arange and np.linspace, but matplotlib already does this automatically when bins is an integer:
- plt.hist itself returns the counts and the bin edges:
    counts, bins, _ = plt.hist(data, bins=200)
Or, if you need the bins before plotting:
- np.histogram with plt.stairs:
    counts, bins = np.histogram(data, bins=200)
    plt.stairs(counts, bins, fill=True)
  Note that stair plots require matplotlib 3.4.0+.
- pd.cut with plt.hist (pd is pandas):
    _, bins = pd.cut(data, bins=200, retbins=True)
    plt.hist(data, bins)
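The calls above return counts per bin. If "store the data in the appropriate bins" means knowing which bin each individual value lands in, np.digitize is one option. The following is a minimal sketch with assumed sample data, not the only way to do it.

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded so the run is repeatable
data = rng.normal(0, 20, 1000)        # hypothetical sample data

num_bins = 200
edges = np.linspace(data.min(), data.max(), num_bins + 1)

# np.digitize returns 1-based bin numbers; shift them to 0-based indices.
idx = np.digitize(data, edges) - 1
# The maximum value sits exactly on the last edge, which digitize places
# past the final bin. np.histogram counts it in the final bin, so clip.
idx = np.clip(idx, 0, num_bins - 1)

# Group the values themselves by bin index.
values_per_bin = [data[idx == i] for i in range(num_bins)]

# Cross-check against np.histogram on the same edges.
counts_from_idx = np.bincount(idx, minlength=num_bins)
counts, _ = np.histogram(data, bins=edges)
```

Here values_per_bin[i] holds the raw values that fall in bin i, and counts_from_idx should match what np.histogram reports for the same edges.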
There are a couple of ways to do this.
If you cannot guarantee that your items are all numeric and of the same type, use collections from the standard library:
import collections
hist = dict(collections.Counter(your_list))
Otherwise, if your data is guaranteed to be numeric and of one type, use NumPy:
import numpy as np
# for one dimensional data
(hist, bin_edges) = np.histogram(your_list)
# for two dimensional data (x and y coordinates are passed separately)
(hist, xedges, yedges) = np.histogram2d(x_values, y_values)
# for N dimensional data (your_list holds one row per sample)
(hist, edges) = np.histogramdd(your_list)
NumPy’s histogram functionality is really the Cadillac option: np.histogram can estimate how many bins you need, it supports weighting, and its algorithms are well documented with example code.
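As a rough illustration of the bin estimation and weighting mentioned above, the sketch below uses the bins="auto" option and the weights parameter of np.histogram. The sample values and the constant weight of 0.5 are assumptions made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)               # seeded so the run is repeatable
values = rng.normal(0, 20, 1000)             # hypothetical sample data
weights = np.full(values.shape, 0.5)         # hypothetical: each sample counts half

# bins="auto" lets NumPy choose the number of bins from the data.
auto_counts, auto_edges = np.histogram(values, bins="auto")

# weights= adds each sample's weight to its bin instead of adding 1.
weighted_counts, _ = np.histogram(values, bins=auto_edges, weights=weights)
```

With a constant weight of 0.5, every weighted count is simply half of the plain count, which makes the effect easy to check.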