Hide certain categorical elements from the legend in Plotnine
Question:
In Plotnine, is it possible to hide certain legend elements?
mpg_select = mpg[mpg["manufacturer"].isin(pd.Series(["audi", "ford", "honda", "hyundai"]))]
I have selected only 4 manufacturers. But when I plot the data, I still see the manufacturers that are not in the data as elements for my legend.
(ggplot(mpg_select, aes(x="displ", y="cty"))
+ geom_jitter(aes(size="hwy", color="manufacturer"))
+ geom_smooth(aes(color="manufacturer"), method="lm", se=False)
+ labs(title="Bubble chart")
)
How do I show only the manufacturers that I selected (audi, ford, honda, and hyundai) in my legend?
Answers:
This happens because the manufacturer column is categorical, and it still carries all of its original categories after filtering. If you remove the unused categories from the column, the extra entries disappear from the legend.
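import pandas as pd
from plotnine import ggplot, aes, geom_jitter, geom_smooth, labs
from plotnine.data import mpg
desired_manufacturers = ['audi','ford','honda','hyundai']
# .copy() so the new column is added to an independent DataFrame, not a view of mpg
mpg_select = mpg.loc[mpg['manufacturer'].isin(desired_manufacturers)].copy()
# Rebuild the column as a categorical that only knows the selected manufacturers
mpg_select['manufacturer_subset'] = pd.Categorical(mpg_select['manufacturer'],
                                                   categories=desired_manufacturers)
(ggplot(mpg_select, aes(x="displ", y="cty"))
+ geom_jitter(aes(size="hwy", color="manufacturer_subset"))
+ geom_smooth(aes(color="manufacturer_subset"), method="lm", se=False)
+ labs(title="Bubble chart")
)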
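To see why the legend keeps the extra manufacturers, here is a small pandas-only sketch. It uses a made-up toy Series (the values are illustrative, not the real mpg data) to show that filtering rows leaves the declared categories untouched, while rebuilding the categorical with an explicit category list drops the rest.

```python
import pandas as pd

# Toy stand-in for the mpg "manufacturer" column (illustrative values only).
makers = pd.Series(
    ["audi", "ford", "honda", "toyota", "audi"], dtype="category"
)

# Filtering rows does not shrink the set of declared categories.
selected = makers[makers.isin(["audi", "ford"])]
print(list(selected.cat.categories))  # → ['audi', 'ford', 'honda', 'toyota']

# Rebuilding the categorical with an explicit category list drops the others.
rebuilt = pd.Categorical(selected, categories=["audi", "ford"])
print(list(rebuilt.categories))  # → ['audi', 'ford']
```

The legend is built from the declared categories rather than from the values that are actually present, which would explain why the filtered plot still lists every manufacturer.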
I had a similar issue and found that remove_unused_categories() does a cleaner job. You don't need to create a new variable; it simply drops the categories that no longer appear after filtering:
from plotnine import ggplot, aes, geom_jitter, geom_smooth, labs
from plotnine.data import mpg
desired_manufacturers = ['audi','ford','honda','hyundai']
# .copy() so the column assignment below modifies an independent DataFrame
mpg_select = mpg.loc[mpg['manufacturer'].isin(desired_manufacturers)].copy()
# Drop the categories that have no rows left after filtering
mpg_select["manufacturer"] = mpg_select["manufacturer"].cat.remove_unused_categories()
(ggplot(mpg_select, aes(x="displ", y="cty"))
+ geom_jitter(aes(size="hwy", color="manufacturer"))
+ geom_smooth(aes(color="manufacturer"), method="lm", se=False)
+ labs(title="Bubble chart")
)
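The same idea can be checked without plotting. The sketch below uses a hypothetical toy DataFrame (its column values are made up for illustration) and confirms that remove_unused_categories() affects only the filtered copy. Taking a .copy() of the filtered frame before assigning is an assumption about good practice here; it avoids the SettingWithCopyWarning that some pandas versions raise when you assign into a slice.

```python
import pandas as pd

# Hypothetical toy frame standing in for mpg (values are illustrative).
df = pd.DataFrame({
    "manufacturer": pd.Categorical(["audi", "ford", "honda", "toyota"]),
    "cty": [18, 15, 28, 21],
})

# Filter, then copy so the assignment below targets an independent frame.
subset = df[df["manufacturer"].isin(["audi", "honda"])].copy()
subset["manufacturer"] = subset["manufacturer"].cat.remove_unused_categories()

print(list(subset["manufacturer"].cat.categories))  # → ['audi', 'honda']
print(list(df["manufacturer"].cat.categories))      # original frame is unchanged
```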