LSTM Model overfitting or under-fitting?

Question:

I am working on an LSTM model that predicts Bitcoin price.
Using: time_steps = 20, epochs = 100, batch_size = 256.

I get the attached model loss plot:
Model Loss Plot

I have also attached the actual vs. predicted BTC prices:
Actual vs. Predicted BTC Price Plot

Is this model overfitting or under-fitting…?
THANKS!

import numpy as np
import pandas as pd
import tensorflow as tf
import plotly.graph_objects as go
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# Ensure the same results are produced each time
np.random.seed(42)
tf.random.set_seed(42)

# Load the normalized data from a CSV file
df = pd.read_csv('normalized_dataBTC.csv', parse_dates=['Date'], index_col='Date')

# Split the data into training and testing sets
train_size = int(len(df) * 0.8)  # 80% of data for training
train_data = df.iloc[:train_size].values
test_data = df.iloc[train_size:].values

# Define the number of time steps and features for the LSTM model
time_steps = 20  # number of time steps to use for each input sequence
num_features = 6  # number of features in the input data

# Create training sequences for the LSTM model
X_train = []
y_train = []
for i in range(time_steps, train_size):
    X_train.append(train_data[i-time_steps:i, :])
    y_train.append(train_data[i, 4])  # use the "Close" price as the target

# Convert the training data to numpy arrays
X_train = np.array(X_train)
y_train = np.array(y_train)

# Reshape the training data to fit the LSTM model input shape
X_train = np.reshape(X_train, (X_train.shape[0], time_steps, num_features))

# Create testing sequences for the LSTM model
X_test = []
y_test = []
for i in range(time_steps, len(test_data)):
    X_test.append(test_data[i-time_steps:i, :])
    y_test.append(test_data[i, 4])  # use the "Close" price as the target

# Convert the testing data to numpy arrays
X_test = np.array(X_test)
y_test = np.array(y_test)

# Reshape the testing data to fit the LSTM model input shape
X_test = np.reshape(X_test, (X_test.shape[0], time_steps, num_features))

# Create the LSTM model
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.LSTM(units=64, input_shape=(time_steps, num_features)))
model.add(tf.keras.layers.Dense(1))

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Fit the model to the training data
history = model.fit(X_train, y_train, epochs=100, batch_size=256, validation_data=(X_test, y_test))

# Make predictions
predictions = model.predict(X_test)

# Load the original data from a CSV file
df_orig = pd.read_csv('BTC-USD 2014 2023.csv', parse_dates=['Date'], index_col='Date')
# Define the scale factor for the "Close" price
close_scale = df_orig.iloc[train_size:, 4].values.max()
# Un-normalize the predictions
predictions_unscaled = predictions * close_scale

# Plot the actual vs predicted BTC price using Plotly
fig = go.Figure()
fig.add_trace(go.Scatter(x=df_orig.index[train_size+time_steps:], y=y_test*close_scale, name='Actual'))
fig.add_trace(go.Scatter(x=df_orig.index[train_size+time_steps:], y=predictions_unscaled[:,0], name='Predicted'))
fig.update_layout(title='Actual vs Predicted BTC Price', xaxis_title='Date', yaxis_title='Price ($)')
fig.update_layout(title_x=0.5, title_font_size=24, xaxis_title_font_size=18, yaxis_title_font_size=18)
fig.update_xaxes(tickformat='%d/%m/%Y') #Format x-axis as dates
fig.show()

# Plot the training and validation loss over the epochs
plt.plot(history.history['loss'], label='Training loss')
plt.plot(history.history['val_loss'], label='Validation loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()

# Calculate the mean squared error
mse = mean_squared_error(y_test, predictions)
print('Mean squared error:', mse)

# Calculate the root mean squared error
rmse = np.sqrt(mse)
print('Root mean squared error:', rmse)

# Calculate the mean absolute error
mae = np.mean(np.abs(y_test - predictions[:, 0]))  # flatten predictions to match y_test's shape
print('Mean absolute error:', mae)

Plots: results when trying 30 time steps, 100 epochs, and a batch size of 256.

Asked By: Ghita Yunsi


Answers:

Your model does not appear to be either overfitting or underfitting. In both of your plots, the training loss is close to the validation loss, and the predicted price tracks the actual price. From your plots, the 30-time-step model seems to fit better than the first one.

You could try other hyperparameters, for example 40 time steps, and see whether the model fits even better.
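
For example, a small sweep over the number of time steps could look like this (just an untested sketch that reuses the train_data, test_data, and num_features from your question; the candidate values are arbitrary):

# Hypothetical sketch: compare a few time_steps values by final validation loss.
def make_sequences(data, time_steps):
    X, y = [], []
    for i in range(time_steps, len(data)):
        X.append(data[i-time_steps:i, :])
        y.append(data[i, 4])  # "Close" column, as in the question
    return np.array(X), np.array(y)

for ts in [20, 30, 40]:
    X_tr, y_tr = make_sequences(train_data, ts)
    X_te, y_te = make_sequences(test_data, ts)
    model = tf.keras.models.Sequential([
        tf.keras.layers.LSTM(64, input_shape=(ts, num_features)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer='adam', loss='mean_squared_error')
    hist = model.fit(X_tr, y_tr, epochs=100, batch_size=256,
                     validation_data=(X_te, y_te), verbose=0)
    print(ts, 'time steps -> final val_loss:', hist.history['val_loss'][-1])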

Answered By: Iya Lee

According to the plots, it seems that you have built a reasonable model; neither overfitting nor underfitting is visible. You could try to improve it by adding more layers, using dropout, and increasing the number of epochs.
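
One way that could look (just a sketch; the layer sizes, dropout rate, and epoch count here are arbitrary, not tuned values):

# Sketch of a deeper stacked LSTM with dropout; units and rates are illustrative only.
model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(64, return_sequences=True,
                         input_shape=(time_steps, num_features)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mean_squared_error')
history = model.fit(X_train, y_train, epochs=200, batch_size=256,
                    validation_data=(X_test, y_test))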

Answered By: Mehdi sahraei

Your model is basically predicting the previous day's price. Whether or not it has overfit isn't really the right question, as it is essentially stuck making a very naive prediction.

You should focus on reformulating the problem, for example predicting the price difference for the next day, and plot that instead.
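
A rough sketch of that reformulation, assuming the same normalized arrays as in your question (where column 4 is the close price):

# Sketch: target the next-day price change instead of the absolute price level.
X_train_d, y_train_d = [], []
for i in range(time_steps, train_size):
    X_train_d.append(train_data[i-time_steps:i, :])
    y_train_d.append(train_data[i, 4] - train_data[i-1, 4])  # day-over-day change
X_train_d = np.array(X_train_d)
y_train_d = np.array(y_train_d)
# Build X_test_d / y_test_d the same way, train the model on the differences,
# then reconstruct a price forecast as previous close + predicted change.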

Answered By: mpotma

I think if you change the number of epochs and the type of optimizer, you might get better results.
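
For example (sketch only; the optimizer choice, learning rate, and epoch count are not tuned values):

# Sketch: rebuild the model and train with a different optimizer and more epochs.
model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(time_steps, num_features)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-3),
              loss='mean_squared_error')
history = model.fit(X_train, y_train, epochs=200, batch_size=256,
                    validation_data=(X_test, y_test))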

Answered By: milad parastoiee