How to plot a stacked seaborn histogram colored from a different column
Question:
My dataset looks like this:
Days Visitors
Tuesday 23
Monday 30
Sunday 120
Friday 2
Friday 30
Tuesday 13
Monday 20
Saturday 100
How can I plot a histogram for this dataset? Assume it is a large dataset (560030 rows), not just these values.
I want the days on the x-axis and Visitors on the y-axis.
Answers:
- Use seaborn, which is a high-level API for matplotlib.
- This will show the distribution of the number of visitors for each day of the week.
sns.histplot
import seaborn as sns
import pandas as pd
import numpy as np # for test data
import random # for test data
import calendar # for test data
# test dataframe
np.random.seed(365)
random.seed(365)
df = pd.DataFrame({'Days': random.choices(calendar.day_name, k=1000), 'Visitors': np.random.randint(1, 121, size=(1000))})
# display(df.head(6))
Days Visitors
0 Friday 83
1 Sunday 53
2 Saturday 34
3 Wednesday 92
4 Tuesday 45
5 Wednesday 6
# plot the histogram
sns.histplot(data=df, x='Visitors', hue='Days', multiple="stack")
- If the legend needs to be moved after the histogram is plotted, the workaround described in seaborn issue #2280, "Not clear how to reposition seaborn.histplot legend", may be necessary.
sns.displot
- This option most clearly conveys the daily distribution of visitor counts
sns.displot(data=df, col='Days', col_wrap=4, x='Visitors')
Barplot
seaborn.barplot
- This will show the sum of all visits for a given day
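import matplotlib.pyplot as plt  # needed for plt.xticks
# note: in seaborn >= 0.12, ci=None is deprecated in favor of errorbar=None
sns.barplot(data=df, x='Days', y='Visitors', estimator=sum, ci=None)
plt.xticks(rotation=90)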
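For a dataset as large as the one in the question, one option is to aggregate with pandas first and plot only the seven totals, which also makes it easy to put the days in calendar order. This is a minimal sketch under that assumption, not the original answer's method; the names day_order and totals are made up for illustration, and it draws the bars with plain matplotlib.

```python
import calendar
import random

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# synthetic test data, same construction as in the answer
np.random.seed(365)
random.seed(365)
df = pd.DataFrame({
    "Days": random.choices(calendar.day_name, k=1000),
    "Visitors": np.random.randint(1, 121, size=1000),
})

# assumption: calendar order (Monday ... Sunday) is the desired x-axis order
day_order = list(calendar.day_name)

# sum the visitors per day, then reorder the index from alphabetical to calendar order
totals = df.groupby("Days")["Visitors"].sum().reindex(day_order)

# one bar per day: days on the x-axis, total visitors on the y-axis
fig, ax = plt.subplots()
ax.bar(totals.index, totals.values)
ax.set_xlabel("Days")
ax.set_ylabel("Visitors")
ax.tick_params(axis="x", rotation=90)
```

Because only the aggregated totals reach the plotting call, the cost of drawing does not grow with the number of rows.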