Keras to predict number based on graph, with no accuracy at all

Question:

I’m new to the neural network world and made an attempt to write a prediction algorithm with TensorFlow/Keras. This code tries to predict a Roc value from Alt and Temp, based on a graph.

(Not able to show the graph here though.)

After a lot of attempts I got some accuracy, about 0.2 to 0.5. Not great, but at least I had something to work with. After a while it dropped to 0, and however I tweak it, it doesn’t give me any accuracy at all.
Any idea why I won’t get any accuracy?

#import tensorflow as tf
from tensorflow import keras
import numpy as np
import pandas as pd
import sklearn.model_selection

#Data collection
factor = 10
data = pd.read_csv("roc_6800_ibf.csv", sep=",")
data = data.apply(pd.to_numeric, errors='coerce')
data = (data / factor) + 5

predict = "Roc"

x = np.array(data.drop([predict], axis=1))
y = np.array(data[predict])

x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.2)

x_shape = int(x.ndim)
y_shape = int(y.ndim)

#Model

model = keras.Sequential([
keras.layers.Dense(units=(2), input_shape=(2,), activation="relu"),
keras.layers.Dense(4, activation="relu"),
keras.layers.Dense(1, activation="relu")
])

model.compile(optimizer="adam", loss="MeanSquaredError", metrics=["accuracy"])

model.fit(x_train, y_train, epochs=20, batch_size=10, verbose=1)

results = model.evaluate(x_test, y_test)

print("- - - - - - - - - - - - - - - - - - - - - - - -")
print(results)

#Prediction

def dataPredict(inputvalues, outputvalues):
    print("- - - - - - - - - - - - - - - - - - - - - - - -")
    test_q = np.array([inputvalues])
    test_a = outputvalues
    prediction = model.predict((test_q / factor) + 5)

    print("Prediction " + str((prediction[0] - 5) * factor))
    print("Actual " + str(test_a[0]))
    print("Input " + str(test_q))


dataPredict([5.5,20.0],[3.6])
dataPredict([6.8,30.0],[0.4])

My input data is about 80 rows of points that I read off the graph myself, and it looks like this. I want to use Alt and Temp to get Roc.

Updated the dataset, 72 rows:

Alt,Temp,Roc
-1.0,-40.0,9.6
0.0,-40.0,9.6
1.0,-40.0,9.6
2.0,-40.0,9.6
3.0,-40.0,9.6
4.0,-40.0,9.6
5.0,-40.0,9.6
6.0,-40.0,9.6
7.0,-40.0,8.1
8.0,-40.0,7.9
7.5,-40.0,9.1
-1.0,0.0,9.6
0.0,0.0,9.6
1.0,0.0,9.6
2.0,0.0,9.6
2.1,0.0,9.6
3.0,0.0,9.0
4.0,0.0,8.0
5.0,0.0,6.6
6.0,0.0,5.5
7.0,0.0,4.2
8.0,0.0,3.2
-1.0,20.0,9.6
0.0,20.0,9.6
0.5,20.0,9.0
1.0,20.0,8.6
2.0,20.0,7.8
3.0,20.0,6.2
4.0,20.0,5.2
5.0,20.0,4.0
6.0,20.0,2.9
7.0,20.0,1.8
8.0,20.0,0.5
-1.0,40.0,7.5
0.0,40.0,6.8
1.0,40.0,5.6
2.0,40.0,4.2
3.0,40.0,3.2
4.0,40.0,2.2
5.0,40.0,1.0
-1.0,50.0,5.4
0.0,50.0,4.2
-0.5,-40.0,9.5
0.5,-40.0,9.5
1.5,-40.0,9.5
2.5,-40.0,9.5
3.5,-40.0,9.5
4.5,-40.0,9.5
5.5,-40.0,9.5
6.5,-40.0,9.1
7.5,-40.0,8.1
-0.5,-10.0,9.5
0.5,-10.0,9.5
1.5,-10.0,9.5
2.5,-10.0,9.5
3.5,-10.0,9.5
4.5,-10.0,8.3
5.5,-10.0,7.1
6.5,-10.0,6.0
7.5,-10.0,5.0
-0.5,30.0,8.4
0.5,30.0,7.6
1.5,30.0,6.4
2.5,30.0,5.5
3.5,30.0,4.2
4.5,30.0,3.1
5.5,30.0,1.9
6.5,30.0,0.8
7.5,30.0,-0.5
5.2,10.0,5.3
6.8,10.0,4.0

I have tried tweaking the dataset (input data) in the code to make all numbers positive and divided them by 10. That gave me the best result so far, but then it suddenly dropped to 0.

Epoch 20/20
6/6 [==============================] - 0s 2ms/step - loss: 32.5049 - accuracy: 0.0000e+00
Asked By: Bengt B

||

Answers:

Alright, so I tried implementing some ML on your dataset (TL;DR: XGBoost worked better in this case).

Having looked at the dataset: your accuracy comes out as 0 because this is a regression task. The output is a continuous number, not a class label such as 0 or 1, and the accuracy metric counts exact matches between prediction and target. A continuous prediction almost never matches the target exactly, hence the accuracy of 0. Better ways to evaluate this kind of task are error metrics such as MAE, MSE, RMSE or MAPE, and as an accuracy-like score you can use R squared.
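To make the point concrete, here is a minimal sketch in plain Python of why exact-match accuracy collapses on continuous targets while regression metrics stay informative. The sample values and the helper names (`exact_match_accuracy`, `mae`, `rmse`, `r_squared`) are made up for illustration; in practice `sklearn.metrics` provides equivalents.

```python
import math

def exact_match_accuracy(y_true, y_pred):
    """Fraction of predictions that equal the target exactly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative values only: predictions that are close but never identical.
y_true = [9.6, 8.0, 5.5, 3.2, 0.5]
y_pred = [9.4, 8.3, 5.1, 3.5, 0.9]

print("accuracy:", exact_match_accuracy(y_true, y_pred))
print("MAE:     ", mae(y_true, y_pred))
print("RMSE:    ", rmse(y_true, y_pred))
print("R^2:     ", r_squared(y_true, y_pred))
```

Even though every prediction is within 0.4 of its target, none matches exactly, so the exact-match accuracy is zero while MAE, RMSE and R squared all describe a reasonable fit.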

Anyway here’s the code:

import pandas as pd
import numpy as np
import seaborn as sns
import collections
import xgboost
from sklearn.linear_model import LinearRegression

df = pd.read_csv("sample_data_1.csv") # Your dataset

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(df[['Alt','Temp']], df['Roc'], test_size=0.3)

So first I fitted a linear model on your data, because the dataset is small and looked pretty simple:

lin_model = LinearRegression()
lin_model.fit(x_train, y_train)
preds = lin_model.predict(x_test)

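from sklearn.metrics import r2_score
# r2_score expects (y_true, y_pred)
"Accuracy is " + str(r2_score(y_test, preds))
Output (reported from the original run, which passed the arguments in the opposite order): 'Accuracy is 0.6826956688194117'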

As you can see, the linear model got a fairly low score, but it does show that the inputs are related to the output in some fashion.

Next I tried a Keras model similar to yours. The code is below:

import tensorflow as tf
import tensorflow.keras.layers as layers

model = tf.keras.Sequential([
    layers.Dense(1000, activation = 'relu', input_shape = (2, )),
    layers.Dropout(0.2),
    layers.Dense(500, activation = 'relu'),
    layers.Dropout(0.2),
    layers.Dense(1, activation = 'relu')
])

model.compile(optimizer = 'adam', loss = 'mape', metrics=['mape','mae','mse'])
model.fit(x_train, y_train, epochs = 100, batch_size = 16)
model.evaluate(x_test, y_test)
Output: 1/1 [==============================] - 0s 130ms/step - loss: 53.3907 - mape: 53.3907 - mae: 2.6886 - mse: 15.3293

The results here are really poor, as the MAPE loss is over 50%, but if you look at the mean absolute error (MAE), it is not that large in magnitude.
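The gap between those two numbers is largely a property of MAPE: each error is divided by the true value, so targets near zero (the dataset contains Roc values such as 0.5 and -0.5) inflate the percentage. A small sketch with made-up predictions shows the effect. The plain-Python `mape` helper is an assumption for illustration, not Keras's exact implementation.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Assumes no target is exactly 0."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative values only: the same absolute error (0.4) on every point.
large_targets = [9.6, 9.0, 8.0]
small_targets = [0.8, 0.5, -0.5]

preds_large = [t - 0.4 for t in large_targets]
preds_small = [t - 0.4 for t in small_targets]

print("large targets: MAE", mae(large_targets, preds_large),
      "MAPE", mape(large_targets, preds_large))
print("small targets: MAE", mae(small_targets, preds_small),
      "MAPE", mape(small_targets, preds_small))
```

Both groups have the same MAE, but the MAPE for the small targets is many times larger, which is why a MAPE around 50% can coexist with a moderate MAE on this dataset.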

This suggests the model could perform better if the inputs were scaled, for example with MinMaxScaler() from scikit-learn’s preprocessing module. (You can try that.)
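A minimal sketch of that scaling step, assuming the Alt/Temp column layout from the question; the sample rows and variable names are placeholders. The scaler is fitted on the training rows only and then reused for the test rows, so no information from the test set leaks into training.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# A few (Alt, Temp) rows in the question's format; placeholders for x_train / x_test.
x_train = np.array([[-1.0, -40.0], [8.0, 50.0], [3.0, 0.0], [5.0, 20.0]])
x_test = np.array([[5.5, 20.0], [6.8, 30.0]])

scaler = MinMaxScaler()                          # maps each column to [0, 1] by default
x_train_scaled = scaler.fit_transform(x_train)   # learn per-column min/max from training data
x_test_scaled = scaler.transform(x_test)         # reuse the same min/max for test data

print(x_train_scaled)
print(x_test_scaled)
```

The scaled arrays would then be passed to `model.fit` and `model.evaluate` in place of the raw ones, and any new input for `model.predict` needs the same `scaler.transform` call first.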

Finally I implemented an XGBoost model, which performed much better than the rest:

xgb_clf = xgboost.XGBRegressor(
    learning_rate=0.3,
    max_depth=6,
    n_estimators=1000
)
xgb_clf.fit(x_train, y_train)
preds = xgb_clf.predict(x_test)
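# r2_score expects (y_true, y_pred)
"Accuracy is " + str(r2_score(y_test, preds))
Output (reported from the original run, which passed the arguments in the opposite order): 'Accuracy is 0.8968514145069562'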

Almost 90%. Keeping in mind the rudimentary state of the data and the minimal preprocessing, I’d expect the XGBoost model could gain another 5 to 6% in accuracy with proper processing and augmentation.

Cheers!

Answered By: Gautam Chettiar