plotting two DataFrame.value_counts() in a single histogram
Question:
I want to plot in a single histogram two different dataframes (only one column from each).
import pandas as pd
import matplotlib.pyplot as plt

d1 = {'Size': ['Big', 'Big', 'Normal','Big']}
df1 = pd.DataFrame(data=d1)
d2 = {'Size': ['Small','Normal','Normal','Normal', 'Small', 'Big', 'Big', 'Normal','Big']}
df2 = pd.DataFrame(data=d2)
#Plotting in one histogram
df1['Size'].value_counts().plot.bar(label = "df1")
df2['Size'].value_counts().plot.bar(label = "df2", alpha = 0.2,color='purple')
plt.legend(loc='upper right')
plt.show()
The issue is that the x-axis of the histogram is only correct for df2. For df1 there should be 3 values of ‘Big’ and 1 value of ‘Normal’.
I have tried multiple ways of generating the plot, and this is the closest I got to what I want: both dataframes in the same histogram, with different colors.
Ideally the bars would be side by side, but I didn’t manage to find how, and ‘stacked = False’ doesn’t work here.
Any help is welcome. Thanks!
Answers:
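You can reindex on explicit x-values. value_counts() sorts each Series by frequency, so df1 comes back in the order [Big, Normal] and df2 in the order [Normal, Big, Small]. plot.bar() places bars by position, so the second call relabels the shared x-axis and df1's bars end up under df2's labels. Reindexing both Series on the same list fixes the order: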
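x = ['Small', 'Normal', 'Big']  # one shared category order for both Series
# reindex puts each count at the same bar position; fill_value=0 covers categories missing from a frame
df1['Size'].value_counts().reindex(x, fill_value=0).plot.bar(label="df1")
df2['Size'].value_counts().reindex(x, fill_value=0).plot.bar(label="df2", alpha=0.2, color='purple')
plt.legend(loc='upper right')
plt.show()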
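To check the alignment without plotting anything, here is a small sketch that re-creates df1 and df2 from the question, prints the index order value_counts() returns for each, and then prints the counts after reindexing both on the same x list. The fill_value=0 argument turns df1's missing 'Small' category into 0 instead of NaN.

```python
import pandas as pd

# Same data as in the question
df1 = pd.DataFrame({'Size': ['Big', 'Big', 'Normal', 'Big']})
df2 = pd.DataFrame({'Size': ['Small', 'Normal', 'Normal', 'Normal', 'Small',
                             'Big', 'Big', 'Normal', 'Big']})

# value_counts() sorts by frequency, so each Series has its own index order
counts1 = df1['Size'].value_counts()
counts2 = df2['Size'].value_counts()
print(list(counts1.index))  # ['Big', 'Normal']
print(list(counts2.index))  # ['Normal', 'Big', 'Small']

# Reindexing on one shared list puts the same category at the same bar position
x = ['Small', 'Normal', 'Big']
aligned1 = counts1.reindex(x, fill_value=0)
aligned2 = counts2.reindex(x, fill_value=0)
print(aligned1.tolist())  # [0, 1, 3]
print(aligned2.tolist())  # [2, 4, 3]
```

Position 0 is now 'Small' for both Series, position 1 'Normal', and position 2 'Big', so the bars drawn by the two plot.bar() calls line up with the tick labels.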
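Another option is to concatenate both frames and count per source, which draws the bars side by side:
(pd.concat({'df1': df1, 'df2': df2})['Size']  # outer index level holds the keys 'df1'/'df2'
    .groupby(level=0).value_counts()          # counts per (source, size)
    .unstack(0, fill_value=0)                 # rows = sizes, columns = df1/df2
    .plot.bar()                               # one group of bars per size
)
plt.show()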
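A variant of the same idea without pd.concat (a different technique from the answer above, offered as a sketch): build one DataFrame from a dict of the two count Series. pandas aligns the Series on their index, so each row is a size and each column is a source frame. The row order ['Small', 'Normal', 'Big'] is an assumption; use whatever order you want on the x-axis.

```python
import pandas as pd

# Same data as in the question
df1 = pd.DataFrame({'Size': ['Big', 'Big', 'Normal', 'Big']})
df2 = pd.DataFrame({'Size': ['Small', 'Normal', 'Normal', 'Normal', 'Small',
                             'Big', 'Big', 'Normal', 'Big']})

# A dict of Series is aligned on the union of their indexes
counts = pd.DataFrame({
    'df1': df1['Size'].value_counts(),
    'df2': df2['Size'].value_counts(),
})

# Fix the row order and replace the missing df1/'Small' count (NaN) with 0
counts = counts.reindex(['Small', 'Normal', 'Big']).fillna(0).astype(int)
print(counts)
```

Calling counts.plot.bar() on this table draws one bar per column within each size group, i.e. the side-by-side layout from the question.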
You can also try Plotly, which produces interactive graphs: you can hover over the bars to see the exact values and other information.
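import plotly.graph_objects as go

classes = ['Small', 'Normal', 'Big']
# classes = df2['Size'].unique()  # alternative: take the categories from the data (df2 contains all three)

fig = go.Figure(data=[
    # Count the 'Size' column and align it with `classes`; missing categories become 0
    go.Bar(name='df1', x=classes, y=df1['Size'].value_counts().reindex(classes, fill_value=0)),
    go.Bar(name='df2', x=classes, y=df2['Size'].value_counts().reindex(classes, fill_value=0))
])
# Change the bar mode: 'group' draws the two traces side by side
fig.update_layout(barmode='group')
fig.show()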
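One detail worth checking with the Plotly version: DataFrame.value_counts() (called on the whole frame) is not the same as Series.value_counts(). It counts unique rows, orders them by frequency, and only contains categories that actually occur, so its values do not line up with a fixed x list. The sketch below, using df1 from the question, compares it with counting the single column and reindexing on classes.

```python
import pandas as pd

df1 = pd.DataFrame({'Size': ['Big', 'Big', 'Normal', 'Big']})
classes = ['Small', 'Normal', 'Big']

# Whole-frame counts: one entry per distinct row, sorted by frequency,
# and no entry for 'Small' because it never occurs in df1
row_counts = df1.value_counts()
print(len(row_counts), row_counts.tolist())

# Counting the column and reindexing gives exactly one y value per entry in `classes`
y1 = df1['Size'].value_counts().reindex(classes, fill_value=0)
print(y1.tolist())  # [0, 1, 3]
```

With only two values in frequency order, the whole-frame counts would be paired with the first two x categories, which is why the column count plus reindex is the safer input for y.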