Reuse colors in plot

Question:

I have a project in Jupyter notebooks where I am comparing two dataframes. Both are indexed by year, and both have the same columns representing the proportion of followers of a religion in the population. The two dataframes represent two different populations.

I want to be able to display both sets of data in the same line plot, with the same color used for each religion, but with the lines for one population solid, while the lines for the other population are dashed.

I thought I’d be able to do something like this:

ax1.plot(area1_df, color=[col1,col2,col3,col4])
ax1.plot(area2_df, color=[col1,col2,col3,col4], ls=':',alpha=0.5, linewidth=3.0)

But that doesn’t work.

At the moment I have this:

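import matplotlib.pyplot as plt

# Set the style before creating the figure; Axes created earlier keep their old background and color cycle
# (the 'seaborn' style is called 'seaborn-v0_8' since Matplotlib 3.6)
plt.style.use('seaborn-v0_8')
fig, ax1 = plt.subplots(1, 1, figsize=(15, 5))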

ax1.plot(area1_df);
ax1.plot(area2_df, ls=':',alpha=0.5, linewidth=3.0);

ax1.legend(area1_df.keys(), loc=2)
ax1.set_ylabel('% of population')
plt.tight_layout()

Maybe there’s another way of doing this entirely(?)

Bonus points for any ideas as to how best to create a unified legend, with entries for the columns from both dataframes.

Asked By: Andy Wilson

||

Answers:

To give each line a particular color, you could capture the output of ax1.plot (a list of Line2D objects) and iterate through that list. Each line can then be given its color, and also a label for the legend.

The following code first generates some toy data and then iterates through the lines of both plots. A legend with two columns is created using the assigned labels.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

years = np.arange(1990, 2021, 1)
N = years.size
area1_df = pd.DataFrame({f'group {i}': 10+i+np.random.uniform(-1, 1, N).cumsum() for i in range(1, 5)}, index=years)
area2_df = pd.DataFrame({f'group {i}': 10+i+np.random.uniform(-1, 1, N).cumsum() for i in range(1, 5)}, index=years)

fig, ax1 = plt.subplots(figsize=(15,5))
plot1 = ax1.plot(area1_df)
plot2 = ax1.plot(area2_df, ls=':',alpha=0.5, linewidth=3.0)
for l1, l2, label1, label2, color in zip(plot1, plot2, area1_df.columns, area2_df.columns,
                                         ['crimson', 'limegreen', 'dodgerblue', 'turquoise']):
    l1.set_color(color)
    l1.set_label(label1)
    l2.set_color(color)
    l2.set_label(label2)
ax1.legend(ncol=2, title='area1 / area2')
plt.show()

[Figure: example plot]
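A variant that avoids recoloring the lines afterwards is to loop over the columns yourself and pass a single color to each ax.plot call (the color argument of ax.plot takes one color for everything drawn in that call, which is why the list in the question fails). The sketch below does that and also builds a unified legend, the "bonus" from the question, out of proxy Line2D handles: one entry per column for the color and one entry per population for the line style. The toy data, the 'area 1' / 'area 2' labels and the use of the default color cycle are assumptions for illustration, and it assumes both frames have the same column names.

```python
import itertools

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.lines import Line2D

# Toy data shaped like the question's: one column per group, indexed by year
years = np.arange(1990, 2021)
rng = np.random.default_rng(0)
area1_df = pd.DataFrame({f'group {i}': 10 + i + rng.uniform(-1, 1, years.size).cumsum()
                         for i in range(1, 5)}, index=years)
area2_df = pd.DataFrame({f'group {i}': 10 + i + rng.uniform(-1, 1, years.size).cumsum()
                         for i in range(1, 5)}, index=years)

# One color per column name, taken from the current style's color cycle
# (itertools.cycle repeats the colors if there are more columns than colors)
cycle_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
col_colors = dict(zip(area1_df.columns, itertools.cycle(cycle_colors)))

fig, ax = plt.subplots(figsize=(15, 5))
for col, color in col_colors.items():
    # Same color for the same column in both populations; only the line style differs
    ax.plot(area1_df.index, area1_df[col], color=color)
    ax.plot(area2_df.index, area2_df[col], color=color, ls=':', alpha=0.5, linewidth=3.0)

# Unified legend built from proxy artists (not the plotted lines themselves):
# one entry per column for the color, one entry per population for the line style
color_handles = [Line2D([], [], color=color, label=col) for col, color in col_colors.items()]
style_handles = [Line2D([], [], color='gray', label='area 1'),
                 Line2D([], [], color='gray', ls=':', linewidth=3.0, label='area 2')]
ax.legend(handles=color_handles + style_handles, loc='upper left')
ax.set_ylabel('% of population')
fig.tight_layout()
```

Outside Jupyter, add plt.show() at the end. The proxy legend has 4 + 2 entries instead of 8; the trade-off is that the reader combines color and line style in their head.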

Alternatively, you could plot via pandas, which does allow assigning a color for each column:

# Reuses area1_df, area2_df and plt from the previous snippet
fig, ax1 = plt.subplots(figsize=(15, 5))
colors = plt.cm.Dark2.colors
area1_df.plot(color=colors, ax=ax1)
area2_df.plot(color=colors, ls=':', alpha=0.5, linewidth=3.0, ax=ax1)
ax1.legend(ncol=2, title='area1 / area2')
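If the two frames might not list their columns in the same order, a dict from column name to color is safer than a positional list: recent pandas versions (1.1 and later) accept color={column name: color} in DataFrame.plot and look each column's color up by its label. A minimal sketch with made-up data, where the second frame's columns are deliberately in reverse order to show that the mapping follows the names:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data: same column names in both frames, but in a different order
years = np.arange(1990, 2021)
rng = np.random.default_rng(1)
cols = [f'group {i}' for i in range(1, 5)]
area1_df = pd.DataFrame(rng.uniform(0, 25, (years.size, len(cols))), index=years, columns=cols)
area2_df = pd.DataFrame(rng.uniform(0, 25, (years.size, len(cols))), index=years, columns=cols[::-1])

# Map each column name to a color; pandas picks the color by column label, not by position
colors = dict(zip(cols, plt.cm.Dark2.colors))

fig, ax1 = plt.subplots(figsize=(15, 5))
area1_df.plot(color=colors, ax=ax1)
area2_df.plot(color=colors, ls=':', alpha=0.5, linewidth=3.0, ax=ax1)
ax1.legend(ncol=2, title='area1 / area2')
```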
Answered By: JohanC

Color assignment in pyplot is based on a cycler: each Axes steps through a list of colors and starts over from the first one after the last has been used. Hence colors can be reused by giving the cycler exactly as many colors as there are curves in one group.
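A quick way to see the wrap-around is to give an Axes a two-color cycle and plot four lines without specifying any color; the third and fourth lines pick up the first and second colors again. The colors and data here are arbitrary:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1, 10)
fig, ax = plt.subplots()
# Restrict this Axes' color cycle to two colors
ax.set_prop_cycle(color=['tab:red', 'tab:blue'])

# Four lines without an explicit color: after the second line the cycle wraps
# around, so the dotted lines 3 and 4 reuse the colors of lines 1 and 2
for k in range(4):
    ax.plot(x, x + k, ls='-' if k < 2 else ':')

line_colors = [line.get_color() for line in ax.get_lines()]
print(line_colors)
```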

In the code below, a cycler is created from the first colors of the default cycler. There are two lists of curves to plot. The number of colors is made equal to the number of curves in the first list, so the curves from the second list are plotted after the cycler has wrapped around and reuse the same colors in the same order.


from numpy import linspace, pi, cos, random
import matplotlib.pyplot as plt

# Time
t = linspace(-0.5*pi, 0.5*pi, 100)

# Curves
a_p = (1.2, 0), (1, -3*pi/2), (1.4, -pi/4)
series_1 = [a * cos(t+p) for a, p in a_p]
series_2 = [c + 0.5 * (random.rand(len(c)) - 0.5) for c in series_1]

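# Create a color cycler with as many colors as there are curves in series_1 (3),
# taken from the start of the default color cycle
colors = plt.rcParams['axes.prop_cycle'].by_key()['color'][:len(series_1)]
cycler_2 = plt.cycler(color=colors)

# Associate cycler to axis
fig, ax = plt.subplots()
ax.set_prop_cycle(cycler_2)

# Plot: series_2 starts after the cycler has wrapped around, so it reuses the same colors
for c in series_1:
    ax.plot(t, c, ls='--', lw=3)
for c in series_2:
    ax.plot(t, c, ls=':')
plt.show()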
Answered By: mins