TOP 10 Dataframe + Bar plot
Question:
I would like to:
- Store all the directors present in the director column of df in a director Series.
- Display the 10 most frequent directors in the catalogue in a horizontal bar graph.
Do I need to call value_counts() first, to get the top 10 before creating the plt.bar?
# split the director names on commas
df['director'].str.split(',', expand=True).stack().reset_index(drop=True)
Answers:
You can create a countplot and use the order= parameter to select the 10 highest counts:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
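# With real data, build the series from the split column instead of the random sample below:
# directors = df['director'].str.split(',', expand=True).stack().reset_index(drop=True)
np.random.seed(123456)  # fixed seed so the sample data is reproducible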
directors = pd.Series(np.random.choice(
['Allen', 'Almodóvar', 'Bergman', 'Buñuel', 'Chaplin', 'Eastwood', 'Fassbinder', 'Fellini', 'Hitchcock', 'Keaton',
'Kubrick', 'Polanski', 'Renoir', 'Scorsese', 'Spielberg', 'Welles', 'Wenders', 'Wilder'], 200), name='Director')
# y= gives horizontal bars; order= keeps only the 10 most frequent names, highest count first
ax = sns.countplot(y=directors, order=directors.value_counts().iloc[:10].index, palette='rocket')
ax.tick_params(axis='y', length=0)
plt.tight_layout()
plt.show()
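To answer the value_counts part of the question directly: yes, counting first and then taking the first 10 rows also works, without seaborn. Below is a minimal sketch under some assumptions: the small df is a made-up stand-in for the real catalogue, and the axis labels are only illustrative. It uses explode() rather than expand=True/stack() and strips the space left after each comma, so that 'Spielberg' and ' Spielberg' are counted as the same director.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the asker's df; only the 'director' column matters here.
df = pd.DataFrame({
    "director": [
        "Hitchcock",
        "Kubrick, Spielberg",
        "Hitchcock, Welles",
        None,
        "Spielberg",
        "Hitchcock",
    ]
})

# One row per director: split on commas, explode the lists,
# trim surrounding spaces, and drop rows with no director.
directors = (
    df["director"]
    .str.split(",")
    .explode()
    .str.strip()
    .dropna()
    .reset_index(drop=True)
)

# value_counts() sorts by descending count, so head(10) is the top 10.
top10 = directors.value_counts().head(10)

# barh draws the first row at the bottom, so reverse the order
# to put the most frequent director at the top of the chart.
ax = top10.iloc[::-1].plot.barh()
ax.set_xlabel("Number of titles")
ax.set_ylabel("Director")
plt.tight_layout()
plt.show()
```

The same top10 series can be passed to plt.barh(top10.index, top10.values) if you prefer calling matplotlib directly.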
c. Top 10 recovered countries (Bar plot)
import pandas as pd
import plotly.express as px
# data is assumed to be a DataFrame with 'Country' and 'Recovered' columns
top10_recovered = pd.DataFrame(data.groupby('Country')['Recovered'].sum().nlargest(10).sort_values(ascending=False))
fig3 = px.bar(top10_recovered, x=top10_recovered.index, y='Recovered', height=600, color='Recovered',
              title='Top 10 Recovered Cases Countries', color_continuous_scale=px.colors.sequential.Viridis)
fig3.show()