pandas datetime plotting issue

Question:

I’m not sure what’s going on here, but when I try to do a scatter plot with a dataframe that has the index set to datetimes, I get a much wider range of dates in the plot for the x-axis. Here’s an example:

import matplotlib.pyplot as plt
import pandas as pd

datetimes = ['2020-01-01 01:00:00', '2020-01-01 01:00:05',
             '2020-01-01 01:00:10', '2020-01-01 01:00:15',
             '2020-01-01 01:00:20', '2020-01-01 01:00:25',
             '2020-01-01 01:00:30', '2020-01-01 01:00:35',
             '2020-01-01 01:00:40', '2020-01-01 01:00:45']
datetimes = pd.to_datetime(datetimes)
values = [1,2,3,4,5,6,7,8,9,10]
df = pd.DataFrame()
df['values'] = values
df = df.set_index(datetimes)

fig, ax = plt.subplots(figsize=(16,9))
ax.scatter(df.index, df.values)
plt.show()

I get this:
[Image: This does not plot correctly]

Yet if I use plot instead of scatter:

fig, ax = plt.subplots(figsize=(16,9))
ax.plot(df)
plt.show()

I get:
[Image: This plots correctly]

I don’t understand why the x-axis has a huge date range on the scatter plot which is not included in the datetime range I gave it. It appears to work correctly using plot but not scatter. I’m guessing I’m missing something obvious here but I haven’t had any success googling it. Any insight would be greatly appreciated!

Asked By: Azathoth


Answers:

I don't know the cause without looking into it further, but as a workaround you can set the x-axis limits explicitly with plt.xlim(df.index[0], df.index[-1]).

Answered By: nesaboz

I slightly shortened your code to:

datetimes = ['2020-01-01 01:00:00', '2020-01-01 01:00:05',
             '2020-01-01 01:00:10', '2020-01-01 01:00:15',
             '2020-01-01 01:00:20', '2020-01-01 01:00:25',
             '2020-01-01 01:00:30', '2020-01-01 01:00:35',
             '2020-01-01 01:00:40', '2020-01-01 01:00:45']
values = [1,2,3,4,5,6,7,8,9,10]
df = pd.DataFrame({'values': values}, index=pd.to_datetime(datetimes))
fig, ax = plt.subplots(figsize=(10,4))
ax.scatter(df.index, df['values'])
plt.show()

but it should not matter.

Another detail is that df.values retrieves the underlying NumPy array of the whole DataFrame
(the built-in attribute takes precedence over a column of the same name),
whereas df['values'] (as I wrote) retrieves just the column of interest.
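
A minimal sketch of that difference, assuming the same column name as in the question. The shortened index and the names all_data and column are only for illustration:

```python
import pandas as pd

index = pd.to_datetime(['2020-01-01 01:00:00', '2020-01-01 01:00:05',
                        '2020-01-01 01:00:10'])
df = pd.DataFrame({'values': [1, 2, 3]}, index=index)

# Attribute access resolves to the built-in DataFrame.values property,
# not to the column named 'values': a 2-D NumPy array of the whole frame.
all_data = df.values
print(type(all_data).__name__, all_data.shape)   # ndarray (3, 1)

# Bracket access always selects the column: a 1-D Series.
column = df['values']
print(type(column).__name__, column.shape)       # Series (3,)
```

With a single column both hold the same numbers, but selecting the column explicitly avoids the ambiguity.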

The plot I got looks as expected.

Maybe it is a matter of the version of pandas and/or Matplotlib.
I use pandas version 1.0.3 and Matplotlib version 3.2.1.
If you have older versions, maybe you should upgrade?

Another option is to set the x-axis limits manually:

plt.xlim(pd.to_datetime('2020-01-01 00:59:55'),
    pd.to_datetime('2020-01-01 01:00:50'))
Answered By: Valdi_Bo

Your code runs just fine on my machine (matplotlib 3.2.2 and pandas 1.0.5). Which versions of matplotlib and pandas are you using?

Try updating your libraries, or use this:

ax.set_xlim(df.index[0], df.index[-1])
Answered By: Airlangga Fidiyanto
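
The x-limit workaround suggested in the answers above could be sketched as follows. This is only an illustration, not a diagnosis of the underlying cause. The 5-second padding, the pad variable, and the non-interactive Agg backend are assumptions made so the example runs without a display; with an interactive backend, drop the matplotlib.use line and call plt.show() at the end.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, nothing is shown on screen
import matplotlib.pyplot as plt
import pandas as pd

# Same ten timestamps as in the question, five seconds apart.
index = pd.date_range('2020-01-01 01:00:00', periods=10,
                      freq=pd.Timedelta(seconds=5))
df = pd.DataFrame({'values': range(1, 11)}, index=index)

# Hypothetical padding so the first and last points do not sit on the frame.
pad = pd.Timedelta(seconds=5)

fig, ax = plt.subplots(figsize=(10, 4))
ax.scatter(df.index, df['values'])

# Derive the limits from the data instead of hard-coding the timestamps.
ax.set_xlim(df.index.min() - pad, df.index.max() + pad)
```

Computing the limits from df.index.min() and df.index.max() keeps the workaround valid if the data changes, and it does not rely on the index being sorted the way df.index[0] and df.index[-1] do.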