Create stacked histogram from unequal length arrays
Question:
I’d like to create a stacked histogram. If I have a single 2-D array made of three equal-length data sets, this is simple. Code below:
import numpy as np
from matplotlib import pyplot as plt
# create 3 data sets with 1,000 samples
mu, sigma = 200, 25
x = mu + sigma*np.random.randn(1000,3)
#Stack the data
plt.figure()
n, bins, patches = plt.hist(x, 30, stacked=True, density = True)
plt.show()
However, if I try similar code with three data sets of different lengths, one histogram covers up another. Is there any way to create a stacked histogram from data sets of mixed lengths?
##Continued from above
###Now as three separate arrays
x1 = mu + sigma*np.random.randn(990,1)
x2 = mu + sigma*np.random.randn(980,1)
x3 = mu + sigma*np.random.randn(1000,1)
#Stack the data
plt.figure()
plt.hist(x1, bins, stacked=True, density = True)
plt.hist(x2, bins, stacked=True, density = True)
plt.hist(x3, bins, stacked=True, density = True)
plt.show()
Answers:
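Well, this is simple. I just need to put the three arrays in a list and pass that list to a single plt.hist call. Separate calls each draw their bars from the baseline, so they overlap instead of stacking.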
##Continued from above
###Now as three separate arrays
x1 = mu + sigma*np.random.randn(990,1)
x2 = mu + sigma*np.random.randn(980,1)
x3 = mu + sigma*np.random.randn(1000,1)
#Stack the data
plt.figure()
plt.hist([x1,x2,x3], bins, stacked=True, density=True)
plt.show()
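As a rough, self-contained sketch of the list approach above, the following checks what the single call returns. The 1-D arrays, the default_rng generator, the Agg backend, and the names total_area and second_layer_bottoms are assumptions made for this illustration; they are not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import numpy as np
from matplotlib import pyplot as plt

# three data sets of unequal length (1-D here for simplicity)
rng = np.random.default_rng(365)
mu, sigma = 200, 25
x1 = rng.normal(mu, sigma, 990)
x2 = rng.normal(mu, sigma, 980)
x3 = rng.normal(mu, sigma, 1000)

# one call with a list of arrays: the bars are stacked on shared bins
fig, ax = plt.subplots()
n, bins, patches = ax.hist([x1, x2, x3], bins=30, stacked=True,
                           density=True, label=["x1", "x2", "x3"])
ax.legend()

# with stacked=True the returned heights are cumulative, so the last row
# is the top of the stack; with density=True its area should be 1
total_area = float(np.sum(n[-1] * np.diff(bins)))

# the second layer's bars should start where the first layer's bars end
second_layer_bottoms = np.array([rect.get_y() for rect in patches[1]])

print(len(n), len(bins))
print(round(total_area, 6))
```

In an interactive session, drop the matplotlib.use line and finish with plt.show() to see the figure.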
- If pandas is an option, the arrays can be loaded into a dataframe and plotted.
- The benefit of using pandas is that the data is now in a useful format for additional analysis and other plots.
- The following code uses a list comprehension to create a list of DataFrames with pandas.DataFrame, one for each array, and then joins them column-wise with pandas.concat.
  - This is a correct way to create a dataframe of arrays that are not equal in length.
  - SO: Creating dataframe from a dictionary where entries have different lengths has more ways to create dataframes from arrays of unequal length.
  - For equal length arrays, use df = pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3}). Each array must be 1-D, so flatten (n, 1) arrays first, for example with x1.ravel().
- Use pandas.DataFrame.plot, which uses matplotlib as the default plot engine.
  - normed has been replaced with density in matplotlib.
  - See the density parameter in matplotlib.pyplot.hist for an explanation of the y-axis values.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# create the uneven arrays
mu, sigma = 200, 25
np.random.seed(365)
x1 = mu + sigma*np.random.randn(990, 1)
x2 = mu + sigma*np.random.randn(980, 1)
x3 = mu + sigma*np.random.randn(1000, 1)
# create the dataframe; enumerate is used to make column names
# shorter columns are padded with NaN by concat
df = pd.concat([pd.DataFrame(a, columns=[f'x{i}']) for i, a in enumerate([x1, x2, x3], 1)], axis=1)
# plot the data
df.plot.hist(stacked=True, bins=30, density=True, figsize=(10, 6), grid=True)
plt.show()
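To see what the concat step does with unequal lengths, here is a minimal sketch that builds the same kind of dataframe and inspects it without plotting. The default_rng generator and the names arrays and missing are assumptions for this illustration only.

```python
import numpy as np
import pandas as pd

# three (n, 1) arrays of unequal length, as in the answer above
rng = np.random.default_rng(365)
mu, sigma = 200, 25
arrays = [mu + sigma * rng.standard_normal((n, 1)) for n in (990, 980, 1000)]

# one single-column DataFrame per array, joined side by side;
# concat aligns on the row index, so shorter columns are padded with NaN
df = pd.concat(
    [pd.DataFrame(a, columns=[f"x{i}"]) for i, a in enumerate(arrays, 1)],
    axis=1,
)

missing = df.isna().sum()  # number of NaN padding rows per column

print(df.shape)    # (1000, 3)
print(df.count())  # non-missing values per column
print(missing)
```

The frame is as long as the longest array, and the missing values in the shorter columns are skipped when the histogram is drawn.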