How to plot multiple lines with different X indices
Question:
I have three DataFrames, shown below.
Range is the index and the second column is Number.
print(df1)
Range
1.27 2386.0
0.93 5598.0
1.27 3607.0
1.29 2262.0
0.94 12227.0
print(df2)
Range
1.26 6410.0
1.27 5688.0
1.25 7329.0
0.93 7757.0
1.26 2118.0
1.26 5772.0
print(df3)
Range
0.92 3368.0
1.26 4935.0
1.28 4749.0
0.94 13716.0
0.92 8478.0
1.27 7997.0
0.92 12459.0
0.92 5805.0
0.92 7842.0
1.26 8316.0
1.27 11069.0
1.27 10011.0
How can I plot a single graph with the x-axis using the range and the y-axis using the second column?
I tried:
df=pd.concat([df1,df2,df3], ignore_index=True, axis=1)
ax=sns.lineplot(data=df, markers=True, dashes=False)
ax.set_title("Range vs Number")
ax.set(xlabel="Range (m)", ylabel = "Number")
plt.legend(loc='upper right', labels=['df1','df2','df3'])
plt.show()
But I got the following error:
ValueError: cannot reindex from a duplicate axis
Answers:
Test DataFrames and Imports
- Tested in python 3.11, pandas 1.5.2, matplotlib 3.6.2, seaborn 0.12.1
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
d1 = {'Range': [0.93, 0.94, 1.27, 1.29],
      'Number': [5598.0, 12227.0, 3607.0, 2262.0]}
df1 = pd.DataFrame(d1).set_index('Range')
d2 = {'Range': [1.26, 1.27, 1.25, 0.93, 1.26, 1.26],
      'Number': [6410.0, 5688.0, 7329.0, 7757.0, 2118.0, 5772.0]}
df2 = pd.DataFrame(d2).set_index('Range')
d3 = {'Range': [0.92, 1.26, 1.28, 0.94, 0.92, 1.27, 0.92, 0.92, 0.92, 1.26, 1.27, 1.27],
      'Number': [3368.0, 4935.0, 4749.0, 13716.0, 8478.0, 7997.0, 12459.0, 5805.0, 7842.0, 8316.0, 11069.0, 10011.0]}
df3 = pd.DataFrame(d3).set_index('Range')
Combine DataFrames without an Identifying Column
df = pd.concat([df1, df2, df3])
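For context on the error in the question: concatenating with axis=1 makes pandas align the frames on their index, and that alignment generally cannot be done when index labels repeat. Stacking the rows (the default, axis=0) needs no alignment, so repeated Range values are simply kept. A minimal sketch of that difference, using two small made-up frames (a and b are illustrative stand-ins, not the test data above):

```python
import pandas as pd

# Two small illustrative frames; 1.27 appears twice in the index of `a`.
a = pd.DataFrame({'Number': [2386.0, 5598.0, 3607.0]},
                 index=pd.Index([1.27, 0.93, 1.27], name='Range'))
b = pd.DataFrame({'Number': [6410.0, 5688.0]},
                 index=pd.Index([1.26, 1.27], name='Range'))

# Row-wise concatenation keeps every row, duplicates included.
stacked = pd.concat([a, b])

print(a.index.is_unique)        # False: 1.27 is repeated within `a`
print(len(stacked))             # 5 rows, nothing dropped
print(stacked.index.is_unique)  # False: the combined index still has repeats
```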
estimator='mean': default
- As shown in the OP. When there are multiple points for a single index, the 'mean' is plotted and a confidence interval is shown.
ax = sns.lineplot(data=df, markers=True)
estimator=None
- A single line with all the points shown
ax = sns.lineplot(data=df, markers=True, estimator=None)
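To see what the default estimator='mean' aggregates, the per-Range mean can be computed directly with pandas. This is only a sketch of the aggregation step (the confidence band that seaborn draws is not reproduced), and it rebuilds the test data so that it runs on its own:

```python
import pandas as pd

d1 = {'Range': [0.93, 0.94, 1.27, 1.29],
      'Number': [5598.0, 12227.0, 3607.0, 2262.0]}
d2 = {'Range': [1.26, 1.27, 1.25, 0.93, 1.26, 1.26],
      'Number': [6410.0, 5688.0, 7329.0, 7757.0, 2118.0, 5772.0]}
d3 = {'Range': [0.92, 1.26, 1.28, 0.94, 0.92, 1.27, 0.92, 0.92, 0.92, 1.26, 1.27, 1.27],
      'Number': [3368.0, 4935.0, 4749.0, 13716.0, 8478.0, 7997.0,
                 12459.0, 5805.0, 7842.0, 8316.0, 11069.0, 10011.0]}
df = pd.concat([pd.DataFrame(d).set_index('Range') for d in (d1, d2, d3)])

# One mean per distinct Range value; repeated Range values collapse to a single point.
means = df.groupby(level='Range')['Number'].mean()
print(means)
```

With estimator=None no such aggregation happens, so every one of the original rows is drawn.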
Plot Each DataFrame in a Loop
matplotlib.pyplot.plot
for i, df in enumerate([df1, df2, df3], 1):
    df = df.sort_index()  # the index must be sorted
    plt.plot(df.index, df['Number'], label=f'df{i}', marker='.')
plt.legend()
plt.show()
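The reason for sort_index() is that plt.plot connects points in the order they are given, so an unsorted index makes the line double back on itself. A quick check of the ordering, using df2 from the test data:

```python
import pandas as pd

df2 = pd.DataFrame({'Range': [1.26, 1.27, 1.25, 0.93, 1.26, 1.26],
                    'Number': [6410.0, 5688.0, 7329.0, 7757.0, 2118.0, 5772.0]}).set_index('Range')

# As given, the x values jump back and forth.
print(df2.index.is_monotonic_increasing)         # False

# After sorting, x only moves left to right.
sorted_df2 = df2.sort_index()
print(sorted_df2.index.is_monotonic_increasing)  # True
print(sorted_df2.index.tolist())                 # [0.93, 1.25, 1.26, 1.26, 1.26, 1.27]
```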
pandas.DataFrame.plot
- Each DataFrame can be plotted directly with df.plot instead of plt.plot, but this option requires a little more setup.
  - Create an ax to plot each DataFrame to, or multiple figures will be created.
  - The legend labels must be updated after creating the plot.
fig, ax = plt.subplots()
for i, df in enumerate([df1, df2, df3], 1):
    df = df.sort_index()
    df.plot(marker='.', ax=ax)
ax.legend(labels=[f'df{i}' for i in range(1, 4)])
plt.show()
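One possible way to avoid fixing the legend afterwards, offered as a sketch rather than as part of the original answer: rename the 'Number' column to the desired label before plotting, since DataFrame.plot uses the column name as the line label. The data is rebuilt here so the snippet is self-contained.

```python
import matplotlib.pyplot as plt
import pandas as pd

df1 = pd.DataFrame({'Range': [0.93, 0.94, 1.27, 1.29],
                    'Number': [5598.0, 12227.0, 3607.0, 2262.0]}).set_index('Range')
df2 = pd.DataFrame({'Range': [1.26, 1.27, 1.25, 0.93, 1.26, 1.26],
                    'Number': [6410.0, 5688.0, 7329.0, 7757.0, 2118.0, 5772.0]}).set_index('Range')
df3 = pd.DataFrame({'Range': [0.92, 1.26, 1.28, 0.94, 0.92, 1.27, 0.92, 0.92, 0.92, 1.26, 1.27, 1.27],
                    'Number': [3368.0, 4935.0, 4749.0, 13716.0, 8478.0, 7997.0,
                               12459.0, 5805.0, 7842.0, 8316.0, 11069.0, 10011.0]}).set_index('Range')

fig, ax = plt.subplots()
for i, df in enumerate([df1, df2, df3], 1):
    # The renamed column becomes the label of the plotted line.
    df.sort_index().rename(columns={'Number': f'df{i}'}).plot(marker='.', ax=ax)

# Labels attached to the lines on the shared axes.
handles, labels = ax.get_legend_handles_labels()
print(labels)
# Call plt.show() here to display the figure.
```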
Combine DataFrames with an Identifying Column
# create a single dataframe with an identifying column
df = pd.concat([df.assign(id=f'df{i}') for i, df in enumerate([df1, df2, df3], 1)])
# reset the index
df = df.reset_index()
# plot using hue to separate the lines
ax = sns.lineplot(data=df, x='Range', y='Number', hue='id', estimator=None, marker='o', hue_order=['df1', 'df2', 'df3'])
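For reference, this is the shape of the combined long-form frame before it is passed to seaborn. The sketch rebuilds the test data so it runs standalone, and only inspects the result:

```python
import pandas as pd

d1 = {'Range': [0.93, 0.94, 1.27, 1.29],
      'Number': [5598.0, 12227.0, 3607.0, 2262.0]}
d2 = {'Range': [1.26, 1.27, 1.25, 0.93, 1.26, 1.26],
      'Number': [6410.0, 5688.0, 7329.0, 7757.0, 2118.0, 5772.0]}
d3 = {'Range': [0.92, 1.26, 1.28, 0.94, 0.92, 1.27, 0.92, 0.92, 0.92, 1.26, 1.27, 1.27],
      'Number': [3368.0, 4935.0, 4749.0, 13716.0, 8478.0, 7997.0,
                 12459.0, 5805.0, 7842.0, 8316.0, 11069.0, 10011.0]}
frames = [pd.DataFrame(d).set_index('Range') for d in (d1, d2, d3)]

# Tag each frame with its source, stack the rows, then turn the index into a column.
df = pd.concat([f.assign(id=f'df{i}') for i, f in enumerate(frames, 1)]).reset_index()

print(df.columns.tolist())               # ['Range', 'Number', 'id']
print(df.groupby('id').size().to_dict()) # {'df1': 4, 'df2': 6, 'df3': 12}
```

Each row now carries its own 'id', which is what lets hue='id' split the data into one line per original DataFrame.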
- Note the difference between sns.lineplot and plt.plot in the loop above. seaborn internally sorts the values by the x-axis and then the y-axis, which is comparable to:
    df = df.sort_values(['Range', 'Number'])
- To get the same result from the loop, reset the index and sort by both 'Range' (the index) and 'Number'.
for i, df in enumerate([df1, df2, df3], 1):
    df = df.reset_index().sort_values(['Range', 'Number'])
    plt.plot('Range', 'Number', data=df, label=f'df{i}', marker='.')
plt.legend()
plt.show()
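The effect of the extra sort key shows up where a Range value repeats. In df2, 1.26 occurs three times; sorting by both columns puts those rows in ascending 'Number' order, whereas a stable sort on the index alone leaves them in their original order. A small check (kind='mergesort' is used here only to make the index-only ordering deterministic):

```python
import pandas as pd

df2 = pd.DataFrame({'Range': [1.26, 1.27, 1.25, 0.93, 1.26, 1.26],
                    'Number': [6410.0, 5688.0, 7329.0, 7757.0, 2118.0, 5772.0]}).set_index('Range')

# Sorted by the index only: ties at 1.26 keep their original row order.
by_index = df2.sort_index(kind='mergesort')
print(by_index.loc[1.26, 'Number'].tolist())                     # [6410.0, 2118.0, 5772.0]

# Sorted by Range and then Number: ties at 1.26 are ordered by Number.
by_both = df2.reset_index().sort_values(['Range', 'Number'])
print(by_both.loc[by_both['Range'] == 1.26, 'Number'].tolist())  # [2118.0, 5772.0, 6410.0]
```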