pandas bar plot combined with line plot shows the time axis beginning at 1970

Question:

I am trying to draw a stock market graph that plots closing price and volume against time.

For some reason the x-axis shows the dates as starting in 1970.

The code is:

import pandas as pd

import matplotlib.pyplot as plt
import matplotlib.dates as mdates


pd_data = pd.DataFrame(data, columns=['id', 'symbol', 'volume', 'high', 'low', 'open', 'datetime','close','datetime_utc','created_at'])

pd_data['DOB'] = pd.to_datetime(pd_data['datetime_utc']).dt.strftime('%Y-%m-%d') 

pd_data.set_index('DOB')

print(pd_data)

print(pd_data.dtypes)

ax=pd_data.plot(x='DOB',y='close',kind = 'line')
ax.set_ylabel("price")

#ax.pd_data['volume'].plot(secondary_y=True,  kind='bar')
ax1=pd_data.plot(y='volume',secondary_y=True, ax=ax,kind='bar')
ax1.set_ylabel('Volume')


# Choose your xtick format string
date_fmt = '%d-%m-%y'

date_formatter = mdates.DateFormatter(date_fmt)
ax1.xaxis.set_major_formatter(date_formatter)

# set monthly locator
ax1.xaxis.set_major_locator(mdates.MonthLocator(interval=1))

# set font and rotation for date tick labels
plt.gcf().autofmt_xdate()

plt.show()

I also tried plotting the two graphs independently, without ax=ax:

ax=pd_data.plot(x='DOB',y='close',kind = 'line')
ax.set_ylabel("price")

ax1=pd_data.plot(y='volume',secondary_y=True,kind='bar')
ax1.set_ylabel('Volume')

Then the price graph shows the years properly, whereas the volume graph shows 1970.

And if I swap them:

ax1=pd_data.plot(y='volume',secondary_y=True,kind='bar')
ax1.set_ylabel('Volume')

ax=pd_data.plot(x='DOB',y='close',kind = 'line')
ax.set_ylabel("price")

Now the volume graph shows the years properly, whereas the price graph shows the years as 1970.

I tried removing secondary_y and also changing bar to line, but no luck.

Somehow, after the first graph is drawn, pandas changes the year.

Asked By: Santhosh

||

Answers:

I could not find the reason for 1970, but it works if you plot with matplotlib.pyplot directly instead of going through pandas, and pass an array of datetime objects instead of the pandas column.

The following code worked:

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import datetime as dt
import numpy as np

pd_data = pd.read_csv("/home/stockdata.csv", sep='\t')  # tab-separated file

pd_data['DOB'] = pd.to_datetime(pd_data['datetime2']).dt.strftime('%Y-%m-%d')

dates=[dt.datetime.strptime(d,'%Y-%m-%d').date() for d in pd_data['DOB']]

plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y'))
plt.gca().xaxis.set_major_locator(mdates.MonthLocator(interval=2))
plt.bar(dates,pd_data['close'],align='center')
plt.gca().xaxis.set_minor_locator(plt.MultipleLocator(1))
plt.gcf().autofmt_xdate()
plt.show()

I created a dates array of datetime objects. If I plot using that, the dates are no longer shown as 1970. The data looks like this:

open    high    low close   volume  datetime    datetime2
35.12   35.68   34.79   35.58   1432995 1244385200000   2012-6-15 10:30:00
35.69   36.02   35.37   35.78   1754319 1244371600000   2012-6-16 10:30:00
35.69   36.23   35.59   36.23   3685845 1245330800000   2012-6-19 10:30:00
36.11   36.52   36.03   36.32   2635777 1245317200000   2012-6-20 10:30:00
36.54   36.6    35.8    35.9    2886412 1245303600000   2012-6-21 10:30:00
36.03   36.95   36.0    36.09   3696278 1245390000000   2012-6-22 10:30:00
36.5    37.27   36.18   37.11   2732645 1245376400000   2012-6-23 10:30:00
36.98   37.11   36.686  36.83   1948411 1245335600000   2012-6-26 10:30:00
36.67   37.06   36.465  37.05   2557172 1245322000000   2012-6-27 10:30:00
37.06   37.61   36.77   37.52   1780126 1246308400000   2012-6-28 10:30:00
37.47   37.77   37.28   37.7    1352267 1246394800000   2012-6-29 10:30:00
37.72   38.1    37.68   37.76   2194619 1246381200000   2012-6-30 10:30:00

The plot I get is:


Answered By: Santhosh
  • I do not advise drawing a bar plot with this many bars.
  • This answer explains why the xtick labels are wrong and how to resolve the issue.
  • Plotting both series as lines with pandas.DataFrame.plot works without issue with .set_major_locator
  • Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.2
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import yfinance as yf  # conda install -c conda-forge yfinance or pip install yfinance --upgrade --no-cache-dir

# download data
df = yf.download('amzn', start='2015-02-21', end='2021-04-27')

# plot
ax = df.plot(y='Close', color='magenta', ls='-.', figsize=(10, 6), ylabel='Price ($)')

ax1 = df.plot(y='Volume', secondary_y=True, ax=ax, alpha=0.5, rot=0, lw=0.5)
ax1.set(ylabel='Volume')

# format
date_fmt = '%d-%m-%y'
years = mdates.YearLocator()   # every year
yearsFmt = mdates.DateFormatter(date_fmt)

ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)

plt.setp(ax.get_xticklabels(), ha="center")
plt.show()



  • Why do the OP's x-tick labels start from 1970?
  • With pandas, bar plot locations are 0-indexed (0, 1, 2, ...), and 0 corresponds to 1970 when it is interpreted as a date
    • See Pandas bar plot changes date format
    • Most solutions for bar plots simply reformat the labels to the appropriate datetime; however, this is cosmetic and will not align the locations between the line plot and the bar plot
    • Solution 2 of this answer shows how to change the tick locators, but it is really not worth the extra code when plt.bar can be used.
print(pd.to_datetime(ax1.get_xticks()))

DatetimeIndex([          '1970-01-01 00:00:00',
               '1970-01-01 00:00:00.000000001',
               '1970-01-01 00:00:00.000000002',
               '1970-01-01 00:00:00.000000003',
               ...
               '1970-01-01 00:00:00.000001552',
               '1970-01-01 00:00:00.000001553',
               '1970-01-01 00:00:00.000001554',
               '1970-01-01 00:00:00.000001555'],
              dtype='datetime64[ns]', length=1556, freq=None)
ax = df.plot(y='Close', color='magenta', ls='-.', figsize=(10, 6), ylabel='Price ($)')
print(ax.get_xticks())
ax1 = df.plot(y='Volume', secondary_y=True, ax=ax, kind='bar')
print(ax1.get_xticks())
ax1.set_xlim(0, 18628.)

date_fmt = '%d-%m-%y'
years = mdates.YearLocator()   # every year
yearsFmt = mdates.DateFormatter(date_fmt)

ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)

[out]:
[16071. 16436. 16801. 17167. 17532. 17897. 18262. 18628.]  ← ax tick locations
[   0    1    2 ... 1553 1554 1555]  ← ax1 tick locations
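
The 0-based bar positions can be reproduced on a tiny frame. This is only a minimal sketch: the three-row sample (dates and volumes borrowed from the table in the first answer) and the non-interactive Agg backend are assumptions made so that it runs anywhere, and they are not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to view the figure
import matplotlib.pyplot as plt
import pandas as pd

# Assumed sample: three rows borrowed from the table in the first answer
df = pd.DataFrame(
    {"volume": [1432995, 1754319, 3685845]},
    index=pd.to_datetime(["2012-06-15", "2012-06-16", "2012-06-19"]),
)

# pandas draws one bar per row at positions 0, 1, 2, ... and only uses
# the dates as text labels, so the tick locations are not dates at all
ax = df.plot(y="volume", kind="bar")
bar_ticks = ax.get_xticks()
print(bar_ticks)

# Reading those small integers back as timestamps lands at the 1970 epoch
print(pd.to_datetime(bar_ticks))
```

Any date locator or formatter applied to this axis then sees values near zero, which is where the 1970 labels come from.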


  • With plt.bar, the bar locations are indexed based on the datetime
ax = df.plot(y='Close', color='magenta', ls='-.', figsize=(10, 6), ylabel='Price ($)', rot=0)
plt.setp(ax.get_xticklabels(), ha="center")
print(ax.get_xticks())

ax1 = ax.twinx()
ax1.bar(df.index, df.Volume)
print(ax1.get_xticks())

date_fmt = '%d-%m-%y'
years = mdates.YearLocator()   # every year
yearsFmt = mdates.DateFormatter(date_fmt)

ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)

[out]:
[16071. 16436. 16801. 17167. 17532. 17897. 18262. 18628.]
[16071. 16436. 16801. 17167. 17532. 17897. 18262. 18628.]


  • sns.barplot(x=df.index, y=df.Volume, ax=ax1) has xtick locations [ 0 1 2 ... 1553 1554 1555], so the bar plot and line plot do not align.
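
For the column layout in the question, the same idea might look like the sketch below, which draws both series with matplotlib directly so that the line and the bars share real date positions. Several things here are assumptions rather than the OP's actual setup: the four sample rows (values borrowed from the table in the first answer), the daily locator, the bar width, and the Agg backend. The key point is that datetime_utc is parsed to datetimes and kept that way, instead of being turned back into strings with strftime.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to open a window
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Assumed sample rows (values borrowed from the table in the first answer)
pd_data = pd.DataFrame({
    "datetime_utc": ["2012-06-15 10:30:00", "2012-06-16 10:30:00",
                     "2012-06-19 10:30:00", "2012-06-20 10:30:00"],
    "close": [35.58, 35.78, 36.23, 36.32],
    "volume": [1432995, 1754319, 3685845, 2635777],
})

# Keep real datetimes (normalized to midnight) rather than formatted strings
dates = pd.to_datetime(pd_data["datetime_utc"]).dt.normalize().to_numpy()

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(dates, pd_data["close"].to_numpy(), color="magenta")
ax.set_ylabel("price")

# Second y-axis on the same x-axis; bars are placed at the date values
ax1 = ax.twinx()
ax1.bar(dates, pd_data["volume"].to_numpy(), width=0.8, alpha=0.5)
ax1.set_ylabel("Volume")

# One tick per day is only sensible for this tiny sample
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%m-%y"))
fig.autofmt_xdate()
fig.canvas.draw()  # force layout so ticks and limits are computed
```

To view the result, drop the matplotlib.use("Agg") line and call plt.show() at the end. For a longer history, a MonthLocator or YearLocator as used above would be the more natural choice.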
Answered By: Trenton McKinney