How to plot multiple lines with different X indices

Question:

I have 3 data frames like so:

Range is the index; the second column is Number.

print(df1)
Range
1.27     2386.0
0.93     5598.0
1.27     3607.0
1.29     2262.0
0.94    12227.0

print(df2)
Range
1.26     6410.0
1.27     5688.0
1.25     7329.0
0.93     7757.0
1.26     2118.0
1.26     5772.0

print(df3)
Range
0.92     3368.0
1.26     4935.0
1.28     4749.0
0.94    13716.0
0.92     8478.0
1.27     7997.0
0.92    12459.0
0.92     5805.0
0.92     7842.0
1.26     8316.0
1.27    11069.0
1.27    10011.0

How can I plot a single graph with the x-axis using the range and the y-axis using the second column?

I tried:

df=pd.concat([df1,df2,df3], ignore_index=True, axis=1)
ax=sns.lineplot(data=df,  markers=True, dashes=False)
ax.set_title("Range vs Number")
ax.set(xlabel="Range (m)", ylabel = "Number")
plt.legend(loc='upper right', labels=['df1','df2','df3'])
plt.show()

But got the following error:

ValueError: cannot reindex from a duplicate axis
Asked By: wwjdm

||

Answers:

Test DataFrames and Imports

  • Tested in python 3.11, pandas 1.5.2, matplotlib 3.6.2, seaborn 0.12.1
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

d1 = {'Range': [0.93, 0.94, 1.27, 1.29],
      'Number': [5598.0, 12227.0, 3607.0, 2262.0]}
df1 = pd.DataFrame(d1).set_index('Range')

d2 = {'Range': [1.26, 1.27, 1.25, 0.93, 1.26, 1.26],
      'Number': [6410.0, 5688.0, 7329.0, 7757.0, 2118.0, 5772.0]}
df2 = pd.DataFrame(d2).set_index('Range')

d3 = {'Range': [0.92, 1.26, 1.28, 0.94, 0.92, 1.27, 0.92, 0.92, 0.92, 1.26, 1.27, 1.27],
      'Number': [3368.0, 4935.0, 4749.0, 13716.0, 8478.0, 7997.0, 12459.0, 5805.0, 7842.0, 8316.0, 11069.0, 10011.0]}
df3 = pd.DataFrame(d3).set_index('Range')

Combine DataFrames without an Identifying Column

df = pd.concat([df1, df2, df3])
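
As a quick check of what the row-wise concatenation above produces, here is a minimal sketch that rebuilds the test frames so it runs on its own (the name `combined` is only illustrative). Stacking along the default axis=0 does not have to align rows by index, so repeated Range values are simply kept, which is presumably why the duplicate-axis error from the question does not come up here.

```python
import pandas as pd

# Same test data as above, rebuilt so this sketch is self-contained
df1 = pd.DataFrame({'Range': [0.93, 0.94, 1.27, 1.29],
                    'Number': [5598.0, 12227.0, 3607.0, 2262.0]}).set_index('Range')
df2 = pd.DataFrame({'Range': [1.26, 1.27, 1.25, 0.93, 1.26, 1.26],
                    'Number': [6410.0, 5688.0, 7329.0, 7757.0, 2118.0, 5772.0]}).set_index('Range')
df3 = pd.DataFrame({'Range': [0.92, 1.26, 1.28, 0.94, 0.92, 1.27, 0.92, 0.92, 0.92, 1.26, 1.27, 1.27],
                    'Number': [3368.0, 4935.0, 4749.0, 13716.0, 8478.0, 7997.0,
                               12459.0, 5805.0, 7842.0, 8316.0, 11069.0, 10011.0]}).set_index('Range')

# Stack the frames on top of each other (axis=0 is the default)
combined = pd.concat([df1, df2, df3])

print(combined.shape)                 # rows from all three frames, one 'Number' column
print(combined.index.has_duplicates)  # repeated Range values are preserved
```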

estimator='mean': default

  • Same call as in the OP. When a single index value has multiple points, their mean is plotted and a confidence interval is shown.
ax = sns.lineplot(data=df, markers=True)
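
To see roughly which points the mean line passes through, the per-Range mean can be computed by hand. This is a sketch under the assumption that the default aggregation amounts to a group-by mean over the index; it does not reproduce the confidence interval, and `mean_per_range` is an illustrative name.

```python
import pandas as pd

# Same test data as above, rebuilt so this sketch is self-contained
df1 = pd.DataFrame({'Range': [0.93, 0.94, 1.27, 1.29],
                    'Number': [5598.0, 12227.0, 3607.0, 2262.0]}).set_index('Range')
df2 = pd.DataFrame({'Range': [1.26, 1.27, 1.25, 0.93, 1.26, 1.26],
                    'Number': [6410.0, 5688.0, 7329.0, 7757.0, 2118.0, 5772.0]}).set_index('Range')
df3 = pd.DataFrame({'Range': [0.92, 1.26, 1.28, 0.94, 0.92, 1.27, 0.92, 0.92, 0.92, 1.26, 1.27, 1.27],
                    'Number': [3368.0, 4935.0, 4749.0, 13716.0, 8478.0, 7997.0,
                               12459.0, 5805.0, 7842.0, 8316.0, 11069.0, 10011.0]}).set_index('Range')

combined = pd.concat([df1, df2, df3])

# One mean value per distinct Range; groupby also sorts the Range keys
mean_per_range = combined.groupby(level='Range')['Number'].mean()
print(mean_per_range)
```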

estimator=None

  • A single line with all the points shown
ax = sns.lineplot(data=df, markers=True, estimator=None)


Plot Each DataFrame in a Loop

matplotlib.pyplot.plot

for i, df in enumerate([df1, df2, df3], 1):
    df = df.sort_index()  # the index must be sorted
    plt.plot(df.index, df['Number'], label=f'df{i}', marker='.')
plt.legend()
plt.show()

pandas.DataFrame.plot

  • Each DataFrame can be plotted directly with df.plot instead of plt.plot, but this option requires a little more setup.
    • Create an ax to plot each DataFrame to, or multiple figures will be created.
    • The legend labels must be updated after creating the plot.
fig, ax = plt.subplots()
for i, df in enumerate([df1, df2, df3], 1):
    df = df.sort_index()
    df.plot(marker='.', ax=ax)
    
ax.legend(labels=[f'df{i}' for i in range(1, 4)])
plt.show()
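
One possible variation on the loop above, not part of the original answer: rename the 'Number' column to the desired label before plotting, since DataFrame.plot takes its legend entries from the column names. That avoids relabeling the legend afterwards. The 'Agg' backend line is an assumption so the sketch runs without a display; drop it and call plt.show() for interactive use.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless run; remove for interactive plotting
import matplotlib.pyplot as plt
import pandas as pd

# Same test data as above, rebuilt so this sketch is self-contained
df1 = pd.DataFrame({'Range': [0.93, 0.94, 1.27, 1.29],
                    'Number': [5598.0, 12227.0, 3607.0, 2262.0]}).set_index('Range')
df2 = pd.DataFrame({'Range': [1.26, 1.27, 1.25, 0.93, 1.26, 1.26],
                    'Number': [6410.0, 5688.0, 7329.0, 7757.0, 2118.0, 5772.0]}).set_index('Range')
df3 = pd.DataFrame({'Range': [0.92, 1.26, 1.28, 0.94, 0.92, 1.27, 0.92, 0.92, 0.92, 1.26, 1.27, 1.27],
                    'Number': [3368.0, 4935.0, 4749.0, 13716.0, 8478.0, 7997.0,
                               12459.0, 5805.0, 7842.0, 8316.0, 11069.0, 10011.0]}).set_index('Range')

fig, ax = plt.subplots()
for i, df in enumerate([df1, df2, df3], 1):
    # The renamed column becomes the legend label for this line
    df.sort_index().rename(columns={'Number': f'df{i}'}).plot(marker='.', ax=ax)
ax.set_ylabel('Number')

# Collect the labels attached to the plotted lines
handles, labels = ax.get_legend_handles_labels()
print(labels)
```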

Combine DataFrames with an Identifying Column

# create a single dataframe with an identifying column
df = pd.concat([df.assign(id=f'df{i}') for i, df in enumerate([df1, df2, df3], 1)])

# reset the index
df = df.reset_index()

# plot using hue to separate the lines
ax = sns.lineplot(data=df, x='Range', y='Number', hue='id', estimator=None, marker='o', hue_order=['df1', 'df2', 'df3'])
  • Note the difference between sns.lineplot and plt.plot in the loop above: seaborn internally sorts the values by both the x-axis and the y-axis variables.
    • This is equivalent to df = df.sort_values(['Range', 'Number'])
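
The effect of the second sort key is easiest to see on df3, where Range 0.92 occurs five times. A small sketch (the names `by_range` and `by_both` are illustrative) comparing a stable sort on 'Range' alone with a sort on both columns: within a tie, the first keeps the original row order, so a line drawn through the points zigzags differently than with the second.

```python
import pandas as pd

# df3 from the test data above, with 'Range' as a regular column
df3 = pd.DataFrame({'Range': [0.92, 1.26, 1.28, 0.94, 0.92, 1.27, 0.92, 0.92, 0.92, 1.26, 1.27, 1.27],
                    'Number': [3368.0, 4935.0, 4749.0, 13716.0, 8478.0, 7997.0,
                               12459.0, 5805.0, 7842.0, 8316.0, 11069.0, 10011.0]})

# Sort on 'Range' only; kind='stable' keeps tied rows in their original order
by_range = df3.sort_values('Range', kind='stable')

# Sort on 'Range', then 'Number', so tied rows are ordered by 'Number'
by_both = df3.sort_values(['Range', 'Number'])

ties_range = by_range.loc[by_range['Range'] == 0.92, 'Number'].tolist()
ties_both = by_both.loc[by_both['Range'] == 0.92, 'Number'].tolist()
print(ties_range)  # original order within the tie
print(ties_both)   # ascending 'Number' within the tie
```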

  • To match the seaborn plot with plt.plot, adjust the loop to reset the index and sort by 'Range' (the index) and 'Number'.
for i, df in enumerate([df1, df2, df3], 1):
    df = df.reset_index().sort_values(['Range', 'Number'])
    plt.plot('Range', 'Number', data=df, label=f'df{i}', marker='.')
plt.legend()
plt.show()

Answered By: Trenton McKinney