How to make a histogram from a list of strings
Question:
I have a list of strings:
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
I want to make a histogram for displaying the frequency distribution of the letters. I can make a list that contains the count of each letter using following codes:
from itertools import groupby
b = [len(list(group)) for key, group in groupby(a)]
How do I make the histogram? I may have a million such elements in list a
.
Answers:
Check out matplotlib.pyplot.bar
. There is also numpy.histogram
which is more flexible if you want wider bins.
Rather than use groupby()
(which requires your input to be sorted), use collections.Counter()
; this doesn’t have to create intermediary lists just to count inputs:
from collections import Counter
counts = Counter(a)
You haven’t really specified what you consider to be a ‘histogram’. Lets assume you wanted to do this on the terminal:
width = 120 # Adjust to desired width
longest_key = max(len(key) for key in counts)
graph_width = width - longest_key - 2
widest = counts.most_common(1)[0][1]
scale = graph_width / float(widest)
for key, size in sorted(counts.items()):
print('{}: {}'.format(key, int(size * scale) * '*'))
Demo:
>>> from collections import Counter
>>> a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
>>> counts = Counter(a)
>>> width = 120 # Adjust to desired width
>>> longest_key = max(len(key) for key in counts)
>>> graph_width = width - longest_key - 2
>>> widest = counts.most_common(1)[0][1]
>>> scale = graph_width / float(widest)
>>> for key, size in sorted(counts.items()):
... print('{}: {}'.format(key, int(size * scale) * '*'))
...
a: *********************************************************************************************
b: **********************************************
c: **********************************************************************
d: ***********************
e: *********************************************************************************************************************
More sophisticated tools are found in the numpy.histogram()
and matplotlib.pyplot.hist()
functions. These do the tallying for you, with matplotlib.pyplot.hist()
also providing you with graph output.
Very easy with Pandas
.
import pandas
from collections import Counter
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
letter_counts = Counter(a)
df = pandas.DataFrame.from_dict(letter_counts, orient='index')
df.plot(kind='bar')
Notice that Counter
is making a frequency count, so our plot type is 'bar'
not 'hist'
.
As @notconfusing pointed above this can be solved with Pandas and Counter. If for any reason you need to not use Pandas you can get by with only matplotlib
using the function in the following code:
from collections import Counter
import numpy as np
import matplotlib.pyplot as plt
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
letter_counts = Counter(a)
def plot_bar_from_counter(counter, ax=None):
""""
This function creates a bar plot from a counter.
:param counter: This is a counter object, a dictionary with the item as the key
and the frequency as the value
:param ax: an axis of matplotlib
:return: the axis wit the object in it
"""
if ax is None:
fig = plt.figure()
ax = fig.add_subplot(111)
frequencies = counter.values()
names = counter.keys()
x_coordinates = np.arange(len(counter))
ax.bar(x_coordinates, frequencies, align='center')
ax.xaxis.set_major_locator(plt.FixedLocator(x_coordinates))
ax.xaxis.set_major_formatter(plt.FixedFormatter(names))
return ax
plot_bar_from_counter(letter_counts)
plt.show()
Simple and effective way to make character histrogram in python
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter
a = []
count =0
d = dict()
filename = raw_input("Enter file name: ")
with open(filename,'r') as f:
for word in f:
for letter in word:
if letter not in d:
d[letter] = 1
else:
d[letter] +=1
num = Counter(d)
x = list(num.values())
y = list(num.keys())
x_coordinates = np.arange(len(num.keys()))
plt.bar(x_coordinates,x)
plt.xticks(x_coordinates,y)
plt.show()
print x,y
Using numpy
Using numpy 1.9 or greater:
import numpy as np
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
labels, counts = np.unique(a,return_counts=True)
This can be plotted using:
import matplotlib.pyplot as plt
ticks = range(len(counts))
plt.bar(ticks,counts, align='center')
plt.xticks(ticks, labels)
this was a while ago so i’m not sure if you still need help but other people might so i’m here. if you’re allowed to use matplotlib i think there’s a much simpler solution!
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
import matplotlib.pyplot as plt
plt.hist(a) #gives you a histogram of your array 'a'
plt.show() #finishes out the plot
this should get you a nice histogram! there are also more edits you can do to clean up the graph if you’d like
- Technically, this is a frequency count of discrete categorical values.
- A histogram is a frequency distribution of continuous numeric values.
- This can be accomplished by passing the list to either the
x=
or y=
parameter of seaborn.countplot
, seaborn.histplot
, or sns.displot
with kind='hist'
.
- seaborn is a high-level API for matplotlib.
- See How to add value labels on a bar chart to add value label annotations to the top of the bars.
import seaborn as sns
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
ax = sns.countplot(x=a)
ax = sns.countplot(y=a)
ax = sns.histplot(x=a)
g = sns.displot(kind='hist', x=a)
I have a list of strings:
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
I want to make a histogram for displaying the frequency distribution of the letters. I can make a list that contains the count of each letter using following codes:
from itertools import groupby
b = [len(list(group)) for key, group in groupby(a)]
How do I make the histogram? I may have a million such elements in list a
.
Check out matplotlib.pyplot.bar
. There is also numpy.histogram
which is more flexible if you want wider bins.
Rather than use groupby()
(which requires your input to be sorted), use collections.Counter()
; this doesn’t have to create intermediary lists just to count inputs:
from collections import Counter
counts = Counter(a)
You haven’t really specified what you consider to be a ‘histogram’. Lets assume you wanted to do this on the terminal:
width = 120 # Adjust to desired width
longest_key = max(len(key) for key in counts)
graph_width = width - longest_key - 2
widest = counts.most_common(1)[0][1]
scale = graph_width / float(widest)
for key, size in sorted(counts.items()):
print('{}: {}'.format(key, int(size * scale) * '*'))
Demo:
>>> from collections import Counter
>>> a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
>>> counts = Counter(a)
>>> width = 120 # Adjust to desired width
>>> longest_key = max(len(key) for key in counts)
>>> graph_width = width - longest_key - 2
>>> widest = counts.most_common(1)[0][1]
>>> scale = graph_width / float(widest)
>>> for key, size in sorted(counts.items()):
... print('{}: {}'.format(key, int(size * scale) * '*'))
...
a: *********************************************************************************************
b: **********************************************
c: **********************************************************************
d: ***********************
e: *********************************************************************************************************************
More sophisticated tools are found in the numpy.histogram()
and matplotlib.pyplot.hist()
functions. These do the tallying for you, with matplotlib.pyplot.hist()
also providing you with graph output.
Very easy with Pandas
.
import pandas
from collections import Counter
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
letter_counts = Counter(a)
df = pandas.DataFrame.from_dict(letter_counts, orient='index')
df.plot(kind='bar')
Notice that Counter
is making a frequency count, so our plot type is 'bar'
not 'hist'
.
As @notconfusing pointed above this can be solved with Pandas and Counter. If for any reason you need to not use Pandas you can get by with only matplotlib
using the function in the following code:
from collections import Counter
import numpy as np
import matplotlib.pyplot as plt
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
letter_counts = Counter(a)
def plot_bar_from_counter(counter, ax=None):
""""
This function creates a bar plot from a counter.
:param counter: This is a counter object, a dictionary with the item as the key
and the frequency as the value
:param ax: an axis of matplotlib
:return: the axis wit the object in it
"""
if ax is None:
fig = plt.figure()
ax = fig.add_subplot(111)
frequencies = counter.values()
names = counter.keys()
x_coordinates = np.arange(len(counter))
ax.bar(x_coordinates, frequencies, align='center')
ax.xaxis.set_major_locator(plt.FixedLocator(x_coordinates))
ax.xaxis.set_major_formatter(plt.FixedFormatter(names))
return ax
plot_bar_from_counter(letter_counts)
plt.show()
Simple and effective way to make character histrogram in python
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter
a = []
count =0
d = dict()
filename = raw_input("Enter file name: ")
with open(filename,'r') as f:
for word in f:
for letter in word:
if letter not in d:
d[letter] = 1
else:
d[letter] +=1
num = Counter(d)
x = list(num.values())
y = list(num.keys())
x_coordinates = np.arange(len(num.keys()))
plt.bar(x_coordinates,x)
plt.xticks(x_coordinates,y)
plt.show()
print x,y
Using numpy
Using numpy 1.9 or greater:
import numpy as np
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
labels, counts = np.unique(a,return_counts=True)
This can be plotted using:
import matplotlib.pyplot as plt
ticks = range(len(counts))
plt.bar(ticks,counts, align='center')
plt.xticks(ticks, labels)
this was a while ago so i’m not sure if you still need help but other people might so i’m here. if you’re allowed to use matplotlib i think there’s a much simpler solution!
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
import matplotlib.pyplot as plt
plt.hist(a) #gives you a histogram of your array 'a'
plt.show() #finishes out the plot
this should get you a nice histogram! there are also more edits you can do to clean up the graph if you’d like
- Technically, this is a frequency count of discrete categorical values.
- A histogram is a frequency distribution of continuous numeric values.
- This can be accomplished by passing the list to either the
x=
ory=
parameter ofseaborn.countplot
,seaborn.histplot
, orsns.displot
withkind='hist'
.- seaborn is a high-level API for matplotlib.
- See How to add value labels on a bar chart to add value label annotations to the top of the bars.
import seaborn as sns
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
ax = sns.countplot(x=a)
ax = sns.countplot(y=a)
ax = sns.histplot(x=a)
g = sns.displot(kind='hist', x=a)