How can I parse my Date column after getting a Nasdaq dataset from Yahoo Finance in Python?

Question:

I got live data from Yahoo Finance as follows:

ndx = yf.Ticker("NDX")

# get stock info

print(ndx.info)

# get historical market data
hist = ndx.history(period="1825d")

I downloaded it and exported it to a CSV file as follows:
# Download stock data, then export as CSV

df = yf.download("NDX", start="2016-01-01", end="2022-11-02")
df.to_csv('ndx.csv')

I viewed the data as follows:
df = pd.read_csv("ndx.csv")
df

The data was displayed as seen in the picture:
enter image description here

THE PROBLEM
Any time I try to use the Date column, it throws KeyError: 'Date'. Here is my Auto ARIMA model and the error thrown. Please help.
enter image description here

ERROR THROWN
enter image description here


I want to be able to use the Date column. I tried parsing the Date column, but it throws the same error. I need help parsing the data first, so that I can convert Date to a day format or a string. Thanks.

Asked By: Emy


Answers:

Always great to see people trying to learn financial analysis:

  • Before I get into the solution I just want to remind you to make sure you put your imports in your question (yfinance isn’t always aliased as yf). Also make sure you type or copy/paste your code so that we can easily grab it and run it!

  • So, I am going to assume the variable "orig_df" is just the result of pd.read_csv('ndx.csv'), since that is what the screenshot looks like.

  • First, always check the data types of your columns after reading in the file
    (assuming you are using Jupyter):

    orig_df = pd.read_csv('ndx.csv')
    orig_df.dtypes

enter image description here

  • Date is an object, which in pandas usually just means string.
  • If orig_df is the actual result of the yf.Ticker(…) call, then "Date" is your index, so it does not act like a column.
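To make the two cases above concrete, here is a minimal sketch. It assumes a CSV laid out the way df.to_csv('ndx.csv') would write it; the two rows and their prices are made-up sample values, not real NDX data, and the names as_text, as_dates, indexed and restored are only illustrative.

```python
import io
import pandas as pd

# Hypothetical two-row sample in the column layout of ndx.csv (values are made up).
csv_text = (
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2022-10-31,11480.0,11520.0,11370.0,11405.0,11405.0,4500000000\n"
    "2022-11-01,11500.0,11560.0,11260.0,11288.0,11288.0,4700000000\n"
)

# Default read: Date is an ordinary column of strings, not datetimes.
as_text = pd.read_csv(io.StringIO(csv_text))
print(as_text["Date"].dtype)

# parse_dates converts Date to datetime64 while keeping it a regular column.
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
print(as_dates["Date"].dtype)

# A parsed date column can be formatted back to strings in any layout.
as_dates["Date_str"] = as_dates["Date"].dt.strftime("%Y-%m-%d")

# index_col moves Date into the index, so it is no longer listed in the columns
# and df["Date"] would raise KeyError.
indexed = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")
print("Date" in indexed.columns)  # False

# reset_index() turns the index back into a normal Date column.
restored = indexed.reset_index()
print("Date" in restored.columns)  # True
```

When Date is the index, use df.index wherever you would have written df['Date'], or call reset_index() first if you really need it as a column.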

How to fix and run:

import pandas as pd
from statsmodels.api import tsa
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime as dt, timedelta

orig_df = pd.read_csv('ndx.csv', parse_dates=['Date'], index_col=0)

model = tsa.arima.ARIMA(np.log(orig_df['Close']), order=(10, 1, 10))
fitted = model.fit()
fc = fitted.get_forecast(5)
fc = (fc.summary_frame(alpha=0.05))
fc_mean = fc['mean']
fc_lower = fc['mean_ci_lower']
fc_upper = fc['mean_ci_upper']

orig_df.iloc[-50:,:].plot(y='Close', title='Nasdaq 100 Closing price', figsize=(10, 6))

enter image description here

# start from orig_df.index[-1] (the most recent trading day, not just today);
# the forecast covers the 5 steps after it, so the offsets run from 1 to 5
future_5_days = [orig_df.index[-1] + timedelta(days=x) for x in range(1, 6)]
plt.plot(future_5_days, np.exp(fc_mean), label='mean_forecast', linewidth=1.5)
plt.fill_between(future_5_days, 
                 np.exp(fc_lower), 
                 np.exp(fc_upper), 
                 color='b', alpha=.1, label='95% confidence')
plt.title('Nasdaq 5 Days Forecast')
plt.legend(loc='upper left', fontsize=8)
plt.show()

enter image description here
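One caveat on the x-axis dates: the timedelta(days=x) offsets above count calendar days, so some of the plotted dates can land on a weekend. A possible alternative, sketched below, is pandas' business-day range. The fixed timestamp is a hypothetical stand-in for orig_df.index[-1], and this only skips weekends, not exchange holidays.

```python
import pandas as pd

# Hypothetical last trading day standing in for orig_df.index[-1].
last_trading_day = pd.Timestamp("2022-11-01")

# Start one business day after the last observation and take five weekdays.
future_5_days = pd.bdate_range(
    start=last_trading_day + pd.offsets.BDay(1),
    periods=5,
)
print(future_5_days)
```

The resulting DatetimeIndex can be passed to plt.plot and plt.fill_between in place of the list built with timedelta.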

Answered By: finman69