Error while splitting the DataFrame into Train Set & Test Set


I am self-learning ML & DS. I am getting stuck while trying to split the DataFrame (dfc).
The error below, and various posts on this site, suggest that it is caused by an index that was not converted to an integer. However, as far as I know & understand, I have already done this step ("split = int(0.80*len(dfc))").

I would appreciate it if someone could point me in the right direction.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use("seaborn-v0_8")
import warnings
import yfinance as yf
import ta

df = yf.download("GOOG")
df = df[["Adj Close"]]

df.columns = ["close"]
df = df.sort_index(ascending=False)

df["returns"] = df['close'].pct_change(1)
df["SMA 15"] = df[["close"]].rolling(15).mean().shift(1)
df["SMA 60"] = df[["close"]].rolling(60).mean().shift(1)
df["MSD 15"] = df[["returns"]].rolling(15).std().shift(1)
df["MSD 60"] = df[["returns"]].rolling(60).std().shift(1)

RSI = ta.momentum.RSIIndicator(df["close"], window=14, fillna=False)
df["rsi"] = RSI.rsi()

dfc =df.columns

# Percentage of Train set
split = int(0.80*len(dfc))

# Train set creation
X_train = dfc[['SMA 15', 'SMA 60', 'MSD 15', 'MSD 30', 'rsi']].iloc[:split] # From beginning to split
Y_train = dfc[['returns']].iloc[:split]

# Test set creation
X_test = dfc[['SMA 15', 'SMA 60', 'MSD 15', 'MSD 30', 'rsi']].iloc[split:] # From split to end
Y_test = dfc[['returns']].iloc[split:]

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
Asked By: Wyatt_Earp



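The likely cause is that you are indexing dfc, which is df.columns (a pandas Index holding the column labels), rather than the DataFrame df itself. An Index cannot be indexed with a list of column names, and that is what raises the IndexError. Note too that len(dfc) is the number of columns, not the number of rows, so split is computed from the wrong length.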
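
As a rough illustration of the difference, the sketch below builds a tiny made-up DataFrame (the values are arbitrary and not from the question) and shows that df.columns is an Index whose length is the column count, and that indexing it with a list of labels fails with an IndexError:

```python
import pandas as pd

# Tiny made-up frame, only to illustrate the behaviour.
df = pd.DataFrame({"close": [1.0, 2.0, 3.0], "returns": [0.0, 1.0, 0.5]})

dfc = df.columns  # an Index of column labels, not a DataFrame

print(type(dfc).__name__)   # class name of the columns object
print(len(dfc), len(df))    # number of columns vs. number of rows

# Indexing the Index with a list of labels is not label-based selection,
# so it fails instead of returning a column.
try:
    dfc[["returns"]]
    raised = False
except IndexError as exc:
    raised = True
    print("IndexError:", exc)

# The same selection on the DataFrame itself works.
print(df[["returns"]].iloc[:2])
```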

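So use df instead of dfc, both when computing split and when building the train and test sets. Also note that the code creates a column named 'MSD 60', not 'MSD 30', so selecting 'MSD 30' would raise a KeyError once the indexing is fixed:

split = int(0.80*len(df))
X_train = df[['SMA 15', 'SMA 60', 'MSD 15', 'MSD 60', 'rsi']].iloc[:split]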
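
For a version that can be run without downloading anything, here is a minimal sketch of the whole split. It makes several assumptions that are not in the question: the prices are synthetic random data rather than GOOG, the rsi column is left out so that the ta package is not needed, the helper name chronological_split is hypothetical, and the warm-up rows containing NaN are dropped before splitting.

```python
import numpy as np
import pandas as pd


def chronological_split(data, feature_cols, target_col, train_frac=0.80):
    """Split rows in order: the first train_frac of rows train, the rest test."""
    split = int(train_frac * len(data))  # row count, not column count
    X = data[feature_cols]
    y = data[[target_col]]
    return X.iloc[:split], X.iloc[split:], y.iloc[:split], y.iloc[split:]


# Synthetic price series standing in for the downloaded data (assumption).
rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=300, freq="D")
close = 100 * np.exp(rng.normal(0, 0.01, len(idx)).cumsum())

df = pd.DataFrame({"close": close}, index=idx)
df["returns"] = df["close"].pct_change(1)
df["SMA 15"] = df["close"].rolling(15).mean().shift(1)
df["SMA 60"] = df["close"].rolling(60).mean().shift(1)
df["MSD 15"] = df["returns"].rolling(15).std().shift(1)
df["MSD 60"] = df["returns"].rolling(60).std().shift(1)

# Rolling windows leave NaN in the first rows; dropping them is an assumption.
data = df.dropna()

features = ["SMA 15", "SMA 60", "MSD 15", "MSD 60"]
X_train, X_test, Y_train, Y_test = chronological_split(data, features, "returns")

print(X_train.shape, X_test.shape)
```

The split is by position rather than random, so every training row comes before every test row in the index order.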
Answered By: Life Whiz