matplotlib multicolored line from pandas DataFrame with colors from value in dataframe
Question:
I am trying to plot a DataFrame containing 3 columns: the first 2 are the coordinates of each point, and the third determines the color of the line at that point:
X | Y | C |
---|---|---|
1 | 2 | R |
2 | 1 | R |
3 | 4 | B |
4 | 3 | R |
5 | 1 | R |
6 | 5 | G |
7 | 6 | G |
8 | 8 | B |
I grouped the data into segments of the same color:
df.groupby((df['C']!=df['C'].shift()).cumsum())
And then tried to call .plot for each group, but the displayed plot had discontinuities and was also extremely slow, as the amount of data is quite large.
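For reference, here is a minimal sketch of what that grouping key evaluates to on the sample table. The DataFrame is typed in by hand from the table above, and the name run_id is only illustrative. It suggests where the discontinuities come from: each group holds only its own rows, so the point that connects one run to the next is in neither plot call.

```python
import pandas as pd

# Sample data copied by hand from the table in the question
df = pd.DataFrame({
    "X": [1, 2, 3, 4, 5, 6, 7, 8],
    "Y": [2, 1, 4, 3, 1, 5, 6, 8],
    "C": ["R", "R", "B", "R", "R", "G", "G", "B"],
})

# True wherever the color differs from the previous row; the cumulative
# sum then gives every run of equal colors its own integer id.
run_id = (df["C"] != df["C"].shift()).cumsum()
print(run_id.tolist())  # [1, 1, 2, 3, 3, 4, 4, 5]

# Each group contains only the rows of its own run, so plotting the groups
# one by one leaves a gap between the last point of a run and the first
# point of the next one.
for _, group in df.groupby(run_id):
    print(group["C"].iloc[0], group["X"].tolist())
```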
I found this example and I believe using LineCollection and ListedColormap could be the right solution, but being new to the ecosystem, I’m failing to understand how I could adapt it to work with the described DataFrame.
Answers:
Adapting the linked code to your example is quite straightforward.
Note that the last color won’t be used: N points give only N-1 segments, and each segment takes the color of its starting point.
Some remarks:
- The values in your color column aren’t valid matplotlib colors. The single-letter color codes need to be in lowercase.
- The code uses segments of two points. If you tried to combine segments with the same color into larger segments, the fast numpy array operations couldn’t be used anymore.
- autoscale_view() or explicitly setting the x and y limits (as in the tutorial) is needed because matplotlib doesn’t do this automatically when elements are added (instead of plotted).
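To make the two-point segments concrete, here is a small sketch of the array construction on its own. The x and y values are typed in by hand from the question's table rather than read from the page.

```python
import numpy as np

# Coordinates copied by hand from the question's table
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2, 1, 4, 3, 1, 5, 6, 8])

# One (x, y) row per point: shape (8, 2)
points = np.c_[x, y]

# Put each point next to its successor as [x0, y0, x1, y1], then reshape
# so that every segment is a pair of (x, y) points: shape (7, 2, 2)
segments = np.c_[points[:-1], points[1:]].reshape(-1, 2, 2)

print(points.shape, segments.shape)  # (8, 2) (7, 2, 2)
print(segments[0].tolist())          # [[1, 2], [2, 1]]
```

Eight points yield seven segments, which is why the color of the last row has no segment to apply to.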
Working directly with the colors
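from matplotlib import pyplot as plt
from matplotlib.collections import LineCollection
import pandas as pd
import numpy as np
# Read the sample table from the question page (needs network access)
df = pd.read_html('https://stackoverflow.com/questions/75695487')[0]
# One (x, y) row per point
points = np.c_[df['X'], df['Y']]
# Pair each point with its successor: shape (n_points - 1, 2, 2)
segments = np.c_[points[:-1], points[1:]].reshape(-1, 2, 2)
# Single-letter matplotlib color codes must be lowercase
lc = LineCollection(segments, colors=df['C'].str.lower())
fig, ax = plt.subplots()
ax.add_collection(lc)
# Fit the axes limits to the added collection (see the remarks above)
ax.autoscale_view()
plt.show()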
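If you want to try this without fetching the page, the following is a sketch of the same approach with the table typed in by hand. Two things here are my assumptions rather than part of the code above: the Agg backend (so it also runs without a display) and dropping the last row's color explicitly, so that there is exactly one color per segment.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for an interactive window
from matplotlib import pyplot as plt
from matplotlib.collections import LineCollection
import numpy as np
import pandas as pd

# Sample data copied by hand from the question's table
df = pd.DataFrame({
    "X": [1, 2, 3, 4, 5, 6, 7, 8],
    "Y": [2, 1, 4, 3, 1, 5, 6, 8],
    "C": ["R", "R", "B", "R", "R", "G", "G", "B"],
})

points = np.c_[df["X"], df["Y"]]
segments = np.c_[points[:-1], points[1:]].reshape(-1, 2, 2)

# One lowercase color per segment; the last row's color has no segment
segment_colors = df["C"].str.lower().iloc[:-1].tolist()

lc = LineCollection(segments, colors=segment_colors)
fig, ax = plt.subplots()
ax.add_collection(lc)
ax.autoscale_view()  # fit the axes limits to the collection
fig.canvas.draw()    # render once; use plt.show() with an interactive backend
```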
Creating a colormap from the dataframe column
If you have a really large dataframe, you could create a listed colormap with all the colors. pd.Categorical will create both the list of colors and their internal numeric representation.
from matplotlib import pyplot as plt
from matplotlib.colors import ListedColormap
from matplotlib.collections import LineCollection
import pandas as pd
import numpy as np
# Read the sample table from the question page (needs network access)
df = pd.read_html('https://stackoverflow.com/questions/75695487')[0]
points = np.c_[df['X'], df['Y']]
segments = np.c_[points[:-1], points[1:]].reshape(-1, 2, 2)
df['C'] = pd.Categorical(df['C'])  # explicitly make categorical
# The categories become the colormap entries; the integer codes select them
lc = LineCollection(segments,
                    cmap=ListedColormap(df['C'].cat.categories.str.lower()),
                    array=df['C'].cat.codes)
fig, ax = plt.subplots()
ax.add_collection(lc)
ax.autoscale_view()
plt.show()
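To see what pd.Categorical produces for the sample column, here is a short sketch with the values typed in by hand. The categories come out sorted, and each code is the position of the row's value in that list.

```python
import pandas as pd

# Color column copied by hand from the question's table
colors = pd.Categorical(["R", "R", "B", "R", "R", "G", "G", "B"])

# Unique values, sorted: these become the entries of the ListedColormap
print(list(colors.categories))  # ['B', 'G', 'R']

# One integer per row, indexing into the categories
print(colors.codes.tolist())    # [2, 2, 0, 2, 2, 1, 1, 0]
```

Passing the codes as array means matplotlib stores one small integer per segment instead of one color string, which is the point of this variant for large dataframes.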