pandas datetime plotting issue
Question:
I’m not sure what’s going on here, but when I try to do a scatter plot with a dataframe that has the index set to datetimes, I get a much wider range of dates in the plot for the x-axis. Here’s an example:
import matplotlib.pyplot as plt
import pandas as pd
datetimes = ['2020-01-01 01:00:00', '2020-01-01 01:00:05',
'2020-01-01 01:00:10', '2020-01-01 01:00:15',
'2020-01-01 01:00:20', '2020-01-01 01:00:25',
'2020-01-01 01:00:30', '2020-01-01 01:00:35',
'2020-01-01 01:00:40', '2020-01-01 01:00:45']
datetimes = pd.to_datetime(datetimes)
values = [1,2,3,4,5,6,7,8,9,10]
df = pd.DataFrame()
df['values'] = values
df = df.set_index(datetimes)
fig, ax = plt.subplots(figsize=(16,9))
ax.scatter(df.index, df.values)
plt.show()
Yet if I use plot instead of scatter, the x-axis looks right:
fig, ax = plt.subplots(figsize=(16,9))
ax.plot(df)
plt.show()
I don’t understand why the x-axis has a huge date range on the scatter plot which is not included in the datetime range I gave it. It appears to work correctly using plot but not scatter. I’m guessing I’m missing something obvious here but I haven’t had any success googling it. Any insight would be greatly appreciated!
Answers:
I slightly shortened your code to:
datetimes = ['2020-01-01 01:00:00', '2020-01-01 01:00:05',
'2020-01-01 01:00:10', '2020-01-01 01:00:15',
'2020-01-01 01:00:20', '2020-01-01 01:00:25',
'2020-01-01 01:00:30', '2020-01-01 01:00:35',
'2020-01-01 01:00:40', '2020-01-01 01:00:45']
values = [1,2,3,4,5,6,7,8,9,10]
df = pd.DataFrame({'values': values}, index=pd.to_datetime(datetimes))
fig, ax = plt.subplots(figsize=(10,4))
ax.scatter(df.index, df['values'])
plt.show()
but it should not matter.
Another detail is that df.values retrieves the underlying NumPy array of the whole DataFrame,
whereas df['values'] (as I wrote) retrieves just the column of interest.
The plot I got looks as expected:
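To make the difference concrete, here is a minimal sketch (it builds the index with pd.date_range and a Timedelta step as a shortcut, which is my assumption and not part of the original code). Because the DataFrame has a .values attribute, attribute access never reaches a column that happens to be named 'values':

```python
import pandas as pd

# Same ten timestamps as in the question, 5 seconds apart
index = pd.date_range('2020-01-01 01:00:00', periods=10,
                      freq=pd.Timedelta(seconds=5))
df = pd.DataFrame({'values': range(1, 11)}, index=index)

arr = df.values      # DataFrame attribute: 2-D NumPy array of all columns
col = df['values']   # column lookup: 1-D Series named 'values'

print(type(arr).__name__, arr.shape)
print(type(col).__name__, col.shape)
```

With a single column both hold the same numbers, so the scatter call still draws the same points; only the shape of what gets passed differs.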
Maybe it is a matter of the pandas and/or Matplotlib version.
I use pandas version 1.0.3 and Matplotlib version 3.2.1.
If you have older versions, maybe you should upgrade?
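If you are not sure which versions you have installed, a quick way to check (both packages expose a __version__ string):

```python
import matplotlib
import pandas as pd

# Print the installed versions so they can be compared with the ones above
print('pandas    ', pd.__version__)
print('matplotlib', matplotlib.__version__)
```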
Another option: set the x-axis limits manually:
plt.xlim(pd.to_datetime('2020-01-01 00:59:55'),
pd.to_datetime('2020-01-01 01:00:50'))
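If you would rather not hard-code the timestamps, the limits can be derived from the index itself. This is only a sketch: the name pad and the 5-second margin are my own choices, picked to match the limits above, and the index is built with pd.date_range for brevity.

```python
import matplotlib.pyplot as plt
import pandas as pd

index = pd.date_range('2020-01-01 01:00:00', periods=10,
                      freq=pd.Timedelta(seconds=5))
df = pd.DataFrame({'values': range(1, 11)}, index=index)

fig, ax = plt.subplots(figsize=(10, 4))
ax.scatter(df.index, df['values'])

# Hypothetical margin: one sampling step on each side of the data
pad = pd.Timedelta(seconds=5)
ax.set_xlim(df.index.min() - pad, df.index.max() + pad)

# Call plt.show() or fig.savefig(...) here to look at the result
```

This forces the x-axis to the data range regardless of what scatter picks by default.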
Your code runs just fine on my machine (matplotlib 3.2.2 and pandas 1.0.5). Which versions of matplotlib and pandas are you using?
Try updating your libraries, or use this:
ax.set_xlim(df.index[0], df.index[-1])