How do I use matplotlib to create a bar chart of a very large dataset?

Question:

The data I am working with is a 27,000-element array: a histogram of a few million data points. In other words, I already have the histogram, and I need to plot it in my program, preferably with vertical bars.

I’ve tried using the ‘bar’ function in matplotlib, but it takes a minute or two to plot, whereas a regular plot (just points on the chart) is almost immediate but obviously does not give the effect I want (i.e. bars). I’m not sure why the bar function is so much slower, so I was wondering: is there a more efficient way to plot a histogram with vertical bars using matplotlib?

I’ve looked at the hist function in matplotlib, but to my understanding its purpose is to take raw data, compute a histogram, and then plot it. I already have a histogram, so I don’t believe it works for my case. I greatly appreciate any help!

Here’s a reference to the hist function documentation, maybe I missed something.
https://matplotlib.org/3.2.0/api/_as_gen/matplotlib.pyplot.hist.html
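Matplotlib’s hist documentation describes a way to reuse pre-binned data: treat each bin as a single point whose weight is that bin’s count. A minimal sketch, assuming the histogram is held in NumPy arrays called `counts` (27,000 values) and `edges` (27,001 bin edges); both names and the random data are hypothetical stand-ins. Using `histtype="stepfilled"` asks hist for one filled polygon rather than one rectangle per bin, which should avoid most of the per-bar cost, though that is worth timing on your own data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; remove it and call plt.show() for a window
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for the precomputed histogram.
rng = np.random.default_rng(0)
counts = rng.integers(0, 100, size=27_000)
edges = np.arange(counts.size + 1)  # one more edge than there are bins

fig, ax = plt.subplots()
# Each bin's left edge becomes one "sample" weighted by that bin's count,
# so hist re-bins it into exactly the same bins it came from.
n, bins, patches = ax.hist(edges[:-1], bins=edges, weights=counts,
                           histtype="stepfilled")
```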

Thanks in advance! Let me know if you would like an example of the code I am working with, but it is just the most generic my_axes.plot(my_data) or my_axes.bar(my_data), so I’m not sure how helpful it would be.

I’ve taken a look at this as well now: https://gist.github.com/pierdom/d639a1d3b8934ee31db8b2ab9997ae92.

This also works but has the same time issue as using bar, so I suppose this is just an issue with rendering a lot of vertical bars? (Though I still wonder why rendering 27,000 points happens so quickly.)
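One way to see where the difference likely comes from is to count the artists each call creates: plot returns a single Line2D that holds every point, while bar returns a BarContainer with one Rectangle patch per bar, and each patch is a separate object Matplotlib has to create and draw. A minimal sketch with hypothetical random heights (it only counts artists; it does not time anything):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in data with 27,000 bins.
heights = np.random.default_rng(0).integers(0, 100, size=27_000)
x = np.arange(heights.size)

fig, (ax_line, ax_bar) = plt.subplots(2)
lines = ax_line.plot(x, heights)           # list containing one Line2D
bars = ax_bar.bar(x, heights, width=1.0)   # BarContainer of Rectangle patches

print(len(lines), len(bars.patches))  # prints: 1 27000
```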

Asked By: Alec Petersen


Answers:

matplotlib’s bar should be pretty fast to execute, so I’m guessing you’re passing all the raw data points to it (although you mention you have "histogram data", so if you can provide more details on the format, that’d help).

bar takes the x positions of the bars and their heights, so if you want bar to produce a histogram, you need to bin and count the data first.

This will produce something similar to matplotlib’s hist:

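```python
import matplotlib.pyplot as plt

# x positions of the bar centres (align='center') and the count in each bin
bins = [0, 1, 2, 3]
heights = [1, 2, 3, 4]

ax = plt.gca()
# width=1 makes adjacent bars touch, like a histogram
ax.bar(bins, heights, align='center', width=1)
plt.show()
```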
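If you start from raw samples, numpy.histogram does the binning and counting, returning one more edge than counts. Those edges can be fed straight to bar by anchoring each bar at its left edge and making it one bin wide. A minimal sketch, using hypothetical normally distributed samples:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch; drop it and call plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical raw samples: bin and count them first.
samples = np.random.default_rng(1).normal(size=10_000)
counts, edges = np.histogram(samples, bins=50)  # 50 counts, 51 edges

fig, ax = plt.subplots()
# Left-align each bar on its bin's left edge; width is the bin width.
bars = ax.bar(edges[:-1], counts, width=np.diff(edges), align="edge")
```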
Answered By: Eduardo

Apparently, this is a known and discussed limitation of the bar function as it is currently implemented. See this issue and this discussion. Though some question whether plotting this many bars is useful, in my particular case I have a toolbar across the top that lets the user zoom in and pan around the data set (a very practical method for my use case).
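Another way around the one-artist-per-bar cost, not taken from the answers here, is Axes.vlines, which draws every bar as a segment inside a single LineCollection. The trade-off is that line widths are in points, not data units, so when zoomed in you see thin lines with gaps rather than touching bars. A minimal sketch with hypothetical random heights:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in data with 27,000 bins.
heights = np.random.default_rng(2).integers(0, 100, size=27_000)
x = np.arange(heights.size)

fig, ax = plt.subplots()
# One LineCollection holding 27,000 vertical segments from 0 up to each height.
coll = ax.vlines(x, 0, heights)
```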

However, a great alternative does exist in the form of stairs (Axes.stairs, added in Matplotlib 3.4). Pass fill=True and you get an effective bar graph that is much more performant.

```python
import matplotlib.pyplot as plt
import random

bins = range(27001)  # Note that bins (the edges) needs one more element than heights
heights = [random.randint(0, i) for i in range(27000)]

ax = plt.gca()
# fill=True draws the histogram as one filled step outline
ax.stairs(heights, bins, fill=True)

plt.show()
```
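On a Matplotlib release older than 3.4, where stairs is not available, a similar filled step outline can be drawn with fill_between and step="post". This is a swapped-in technique, not part of the answer above. With step="post", each y value is held from its x position to the next one, so the last height is repeated to close off the final bin. A minimal sketch with hypothetical random heights:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch; drop it and call plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in data: 27,000 heights and 27,001 edges, as with stairs.
heights = np.random.default_rng(3).integers(0, 100, size=27_000)
edges = np.arange(heights.size + 1)

# y[i] is held over [edges[i], edges[i + 1]); repeating the last height
# makes y the same length as edges so the final bin is filled too.
y = np.append(heights, heights[-1])

fig, ax = plt.subplots()
poly = ax.fill_between(edges, y, step="post")  # one PolyCollection
```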
Answered By: Alec Petersen