How can I parse my Date column after getting the Nasdaq dataset from Yahoo Finance in Python?
Question:
I got live data from Yahoo Finance as follows:
ndx = yf.Ticker("NDX")
# get stock info
print(ndx.info)
# get historical market data
hist = ndx.history(period="1825d")
I downloaded it and exported it to a CSV file as follows:
# Download stock data, then export as CSV
df = yf.download("NDX", start="2016-01-01", end="2022-11-02")
df.to_csv('ndx.csv')
I viewed the data as follows:
df = pd.read_csv("ndx.csv")
df
The data was displayed as seen in the picture:
THE PROBLEM...
Any time I try to use the Date column, it throws KeyError: 'Date'. Here is my Auto ARIMA model and the error thrown. Please help.
I want to be able to use the Date column. I tried parsing the Date column, but it throws the same error. I need help parsing the data first, so that I can convert Date to a day format or a string. Thanks
Answers:
Always great to see people trying to learn financial analysis.
Before I get into the solution, a reminder: make sure you put your imports in your question (yfinance isn't always aliased as yf). Also make sure you type or copy/paste your code so that we can easily grab it and run it!
I am going to assume the variable "orig_df" is just the call to pd.read_csv('ndx.csv'), since that's what the screenshot looks like.
First, always check the data types of your columns after reading in the file (assuming you are using Jupyter):
orig_df = pd.read_csv('ndx.csv')
orig_df.dtypes
- Date is an object, which just means string in pandas.
- If orig_df is instead the direct result of the yf.Ticker(…) call, then "Date" is your index, so it does not act like a column, and orig_df['Date'] raises KeyError.
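The two cases above can be reproduced without a network connection. This is a minimal sketch, assuming a tiny hand-made CSV in place of ndx.csv (the prices are invented for illustration only):

```python
import io
import pandas as pd

# Hypothetical stand-in for ndx.csv; the values are made up.
csv_text = """Date,Open,Close
2022-10-28,11200.0,11500.0
2022-10-31,11450.0,11400.0
2022-11-01,11500.0,11280.0
"""

# Case 1: plain read_csv. "Date" is an ordinary column holding strings.
as_column = pd.read_csv(io.StringIO(csv_text))
date_is_string = isinstance(as_column["Date"].iloc[0], str)

# Case 2: parse the dates and use them as the index.
as_index = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")
date_in_columns = "Date" in as_index.columns  # False: "Date" is the index now
index_is_datetime = isinstance(as_index.index, pd.DatetimeIndex)

# as_index["Date"] would raise KeyError here.
# reset_index() turns the index back into a regular column.
restored = as_index.reset_index()
```

If you need Date as a column again (for example, to pass it to a plotting call by name), reset_index() is usually the simplest route; otherwise refer to the dates through the index.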
How to fix and run:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import timedelta
from statsmodels.api import tsa
# parse "Date" as datetimes and use it as the index
orig_df = pd.read_csv('ndx.csv', parse_dates=['Date'], index_col=0)
# fit the ARIMA model on log closing prices
model = tsa.arima.ARIMA(np.log(orig_df['Close']), order=(10, 1, 10))
fitted = model.fit()
# forecast 5 steps ahead with a 95% confidence interval
fc = fitted.get_forecast(5)
fc = fc.summary_frame(alpha=0.05)
fc_mean = fc['mean']
fc_lower = fc['mean_ci_lower']
fc_upper = fc['mean_ci_upper']
# plot the last 50 trading days
orig_df.iloc[-50:, :].plot(y='Close', title='Nasdaq 100 Closing price', figsize=(10, 6))
# start from orig_df.index[-1] (the most recent trading day in the data), not today;
# the first forecast step is the day after it, so offset by 1..5 days
future_5_days = [orig_df.index[-1] + timedelta(days=x) for x in range(1, 6)]
# undo the log transform before plotting
plt.plot(future_5_days, np.exp(fc_mean), label='mean_forecast', linewidth=1.5)
plt.fill_between(future_5_days,
                 np.exp(fc_lower),
                 np.exp(fc_upper),
                 color='b', alpha=.1, label='95% confidence')
plt.title('Nasdaq 5 Days Forecast')
plt.legend(loc='upper left', fontsize=8)
plt.show()
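One caveat on the forecast dates: adding calendar days can place forecast points on weekends, when the market is closed. A possible alternative is to build the dates from business days with pandas. This is only a sketch, and `last_day` is a hypothetical stand-in for orig_df.index[-1]:

```python
import pandas as pd

# Hypothetical last trading day in the data (a Tuesday).
last_day = pd.Timestamp("2022-11-01")

# Five business days after last_day. Weekends are skipped;
# exchange holidays are NOT (that would need a holiday calendar).
future_5_days = pd.bdate_range(start=last_day + pd.offsets.BDay(1), periods=5)
```

The resulting DatetimeIndex can be passed to plt.plot and plt.fill_between in place of the list of dates.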