Keras to predict number based on graph, with no accuracy at all
Question:
I’m new to the neural network world and made an attempt to write a prediction algorithm with tensorflow/keras. This code is just trying to predict a Roc value from Alt and Temp, based on a graph.
(Not able to show the graph here though.)
After a lot of attempts I got some accuracy, about 0.2 to 0.5. Not great, but at least I had something to work with. After a while it dropped to 0, and however I tweak it, it doesn’t give me any accuracy at all.
Any idea why I won’t get any accuracy?
#import tensorflow as tf
from tensorflow import keras
import numpy as np
import pandas as pd
import sklearn.model_selection
#Data collection
factor = 10
data = pd.read_csv("roc_6800_ibf.csv", sep=",")
data = data.apply(pd.to_numeric, errors='coerce')
data = (data / factor) + 5
predict = "Roc"
x = np.array(data.drop([predict], axis=1))
y = np.array(data[predict])
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(
    x, y, test_size=0.2)
x_shape = int(x.ndim)
y_shape = int(y.ndim)
#Model
model = keras.Sequential([
    keras.layers.Dense(units=2, input_shape=(2,), activation="relu"),
    keras.layers.Dense(4, activation="relu"),
    keras.layers.Dense(1, activation="relu")
])
model.compile(optimizer="adam", loss="MeanSquaredError", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=20, batch_size=10, verbose=1)
results = model.evaluate(x_test, y_test)
print("- - - - - - - - - - - - - - - - - - - - - - - -")
print(results)
#Prediction
def dataPredict(inputvalues, outputvalues):
    print("- - - - - - - - - - - - - - - - - - - - - - - -")
    test_q = np.array([inputvalues])
    test_a = outputvalues
    prediction = model.predict((test_q / factor) + 5)
    print("Prediction " + str((prediction[0] - 5) * factor))
    print("Actual " + str(test_a[0]))
    print("Input " + str(test_q))
dataPredict([5.5,20.0],[3.6])
dataPredict([6.8,30.0],[0.4])
My input data is about 80 rows of points that I have read off the graph myself, and looks like this. I want to use Alt and Temp to get Roc.
Updated the dataset, 72 rows:
Alt,Temp,Roc
-1.0,-40.0,9.6
0.0,-40.0,9.6
1.0,-40.0,9.6
2.0,-40.0,9.6
3.0,-40.0,9.6
4.0,-40.0,9.6
5.0,-40.0,9.6
6.0,-40.0,9.6
7.0,-40.0,8.1
8.0,-40.0,7.9
7.5,-40.0,9.1
-1.0,0.0,9.6
0.0,0.0,9.6
1.0,0.0,9.6
2.0,0.0,9.6
2.1,0.0,9.6
3.0,0.0,9.0
4.0,0.0,8.0
5.0,0.0,6.6
6.0,0.0,5.5
7.0,0.0,4.2
8.0,0.0,3.2
-1.0,20.0,9.6
0.0,20.0,9.6
0.5,20.0,9.0
1.0,20.0,8.6
2.0,20.0,7.8
3.0,20.0,6.2
4.0,20.0,5.2
5.0,20.0,4.0
6.0,20.0,2.9
7.0,20.0,1.8
8.0,20.0,0.5
-1.0,40.0,7.5
0.0,40.0,6.8
1.0,40.0,5.6
2.0,40.0,4.2
3.0,40.0,3.2
4.0,40.0,2.2
5.0,40.0,1.0
-1.0,50.0,5.4
0.0,50.0,4.2
-0.5,-40.0,9.5
0.5,-40.0,9.5
1.5,-40.0,9.5
2.5,-40.0,9.5
3.5,-40.0,9.5
4.5,-40.0,9.5
5.5,-40.0,9.5
6.5,-40.0,9.1
7.5,-40.0,8.1
-0.5,-10.0,9.5
0.5,-10.0,9.5
1.5,-10.0,9.5
2.5,-10.0,9.5
3.5,-10.0,9.5
4.5,-10.0,8.3
5.5,-10.0,7.1
6.5,-10.0,6.0
7.5,-10.0,5.0
-0.5,30.0,8.4
0.5,30.0,7.6
1.5,30.0,6.4
2.5,30.0,5.5
3.5,30.0,4.2
4.5,30.0,3.1
5.5,30.0,1.9
6.5,30.0,0.8
7.5,30.0,-0.5
5.2,10.0,5.3
6.8,10.0,4.0
I have tried tweaking the dataset (input data) in the code to make all numbers positive and divided them by 10. That gave me the best result so far, but then it suddenly dropped to 0:
Epoch 20/20
6/6 [==============================] - 0s 2ms/step - loss: 32.5049 - accuracy: 0.0000e+00
Answers:
Alright, so I tried some ML on your dataset (TL;DR: XGBoost worked better in this case).
Now that I have had a look at the dataset: your accuracy comes out as 0 because this is a regression task, and your output is a continuous number, not a class label like 0 or 1. The accuracy metric counts matches between the prediction and the target, and with continuous values those almost never happen, hence the 0 accuracy. A better way to evaluate this kind of task is with error metrics such as MAE, MSE, RMSE, or MAPE, and as a stand-in for accuracy you can use R squared.
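To make that concrete, here is a minimal sketch using only NumPy. The target and prediction values are made up for illustration (they are not output from any of the models here), and the metrics are written out by hand rather than taken from a library.

```python
import numpy as np

# Hypothetical targets and predictions, invented for illustration only.
y_true = np.array([9.6, 8.0, 6.6, 4.2, 0.5])
y_pred = np.array([9.4, 8.3, 6.1, 4.6, 0.9])

# Accuracy in the classification sense: fraction of exact matches.
# Continuous predictions essentially never hit the target exactly.
exact_match = float(np.mean(y_pred == y_true))

# Regression metrics measure how far off the predictions are instead.
errors = y_pred - y_true
mae = float(np.mean(np.abs(errors)))   # mean absolute error
mse = float(np.mean(errors ** 2))      # mean squared error
rmse = float(np.sqrt(mse))             # root mean squared error

# R squared: 1 - (residual sum of squares / total sum of squares).
ss_res = float(np.sum(errors ** 2))
ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
r2 = 1.0 - ss_res / ss_tot

print("exact-match accuracy:", exact_match)
print("MAE:", mae, "MSE:", mse, "RMSE:", rmse, "R2:", r2)
```

The predictions are all within 0.5 of the targets, yet the exact-match accuracy is 0, while MAE and R squared reflect that the fit is actually close.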
Anyway here’s the code:
import pandas as pd
import numpy as np
import seaborn as sns
import collections
import xgboost
from sklearn.linear_model import LinearRegression
df = pd.read_csv("sample_data_1.csv") # Your dataset
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(df[['Alt','Temp']], df['Roc'], test_size=0.3)
So first I fitted a linear model on your data, because both the number of entries and the complexity seemed pretty small:
lin_model = LinearRegression()
lin_model.fit(x_train, y_train)
preds = lin_model.predict(x_test)
from sklearn.metrics import r2_score
# r2_score takes the true values first, then the predictions
"Accuracy is " + str(r2_score(y_test, preds))
Output: 'Accuracy is 0.6826956688194117'
As you can see, the linear model got a fairly low score, but it does show that the inputs are related to the output in some fashion.
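A note on the scoring call: scikit-learn documents the signature as r2_score(y_true, y_pred), and the result is not symmetric in its two arguments, so the order matters. A minimal sketch with made-up numbers (not taken from the dataset):

```python
from sklearn.metrics import r2_score

# Made-up values, chosen so the arithmetic is easy to check by hand.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 2.0, 3.0, 3.0]

# True values first, predictions second.
correct_order = r2_score(y_true, y_pred)

# Swapping the arguments changes the total sum of squares in the
# denominator, and therefore the score.
swapped_order = r2_score(y_pred, y_true)

print(correct_order, swapped_order)
```

Here the residual sum of squares is 2 either way, but the total sum of squares is 5 for the true values and only 1 for the predictions, which gives 0.6 in the documented order and -1.0 when swapped.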
Next I tried a Keras model similar to yours. The code is below:
import tensorflow as tf
import tensorflow.keras.layers as layers
model = tf.keras.Sequential([
    layers.Dense(1000, activation='relu', input_shape=(2,)),
    layers.Dropout(0.2),
    layers.Dense(500, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(1, activation='relu')
])
model.compile(optimizer = 'adam', loss = 'mape', metrics=['mape','mae','mse'])
model.fit(x_train, y_train, epochs = 100, batch_size = 16)
model.evaluate(x_test, y_test)
Output: 1/1 [==============================] - 0s 130ms/step - loss: 53.3907 - mape: 53.3907 - mae: 2.6886 - mse: 15.3293
The results here are really poor, as the loss (MAPE) is a bit over 50%, but if you look at the mean absolute error, it is not that large in magnitude.
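One likely reason MAPE looks so bad while MAE looks modest: MAPE divides each error by the true value, so targets close to zero (the dataset has Roc values such as 0.5 and -0.5) inflate it. A small sketch with made-up numbers, with both formulas written out by hand in NumPy:

```python
import numpy as np

# Two made-up points with the same absolute error of 0.5:
# one target is near zero, the other is large.
y_true = np.array([0.5, 8.0])
y_pred = np.array([1.0, 7.5])

abs_err = np.abs(y_pred - y_true)

# Mean absolute error: average size of the miss, in Roc units.
mae = float(np.mean(abs_err))

# Mean absolute percentage error: each miss relative to its target.
# 0.5 / 0.5 = 100% for the first point, 0.5 / 8.0 = 6.25% for the second.
mape = float(np.mean(abs_err / np.abs(y_true)) * 100.0)

print("MAE:", mae)    # 0.5
print("MAPE:", mape)  # 53.125
```

A single small target is enough to push MAPE above 50% even though both predictions are off by only 0.5.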
This suggests the model could have performed better if the data had been scaled, for example with MinMaxScaler() from scikit-learn’s preprocessing module. (You can try that.)
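A minimal sketch of that scaling step, assuming scikit-learn's MinMaxScaler with its default (0, 1) range. The five rows are copied from the dataset in the question; the variable names are my own, and no model is trained here. The scalers are fitted on the training rows only and then reused for new inputs, with a second scaler to map predictions back to Roc units.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# A few (Alt, Temp) -> Roc rows copied from the question's dataset.
X_train = np.array([[-1.0, -40.0],
                    [8.0, -40.0],
                    [4.0, 0.0],
                    [8.0, 20.0],
                    [0.0, 50.0]])
y_train = np.array([9.6, 7.9, 8.0, 0.5, 4.2])

# Separate scalers for inputs and target, fitted on training data only.
x_scaler = MinMaxScaler()
y_scaler = MinMaxScaler()
X_scaled = x_scaler.fit_transform(X_train)
y_scaled = y_scaler.fit_transform(y_train.reshape(-1, 1))

# New inputs must go through the same fitted scaler (transform, not fit).
X_new_scaled = x_scaler.transform(np.array([[5.5, 20.0]]))

# Values in scaled units are mapped back with inverse_transform.
# Here the scaled targets stand in for a model's scaled predictions.
y_back = y_scaler.inverse_transform(y_scaled).ravel()

print(X_scaled.min(axis=0), X_scaled.max(axis=0))
print(X_new_scaled)
print(y_back)
```

In a real run, X_scaled and y_scaled would be passed to model.fit, and the output of model.predict would go through y_scaler.inverse_transform before being compared with the original Roc values.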
Finally I implemented an XGBoost model, which performed much better than the rest:
xgb_clf = xgboost.XGBRegressor(
    learning_rate=0.3,
    max_depth=6,
    n_estimators=1000
)
xgb_clf.fit(x_train, y_train)
preds = xgb_clf.predict(x_test)
# r2_score takes the true values first, then the predictions
"Accuracy is " + str(r2_score(y_test, preds))
Output: 'Accuracy is 0.8968514145069562'
Almost 90%. Keeping in mind the rudimentary state of the data and the minimal preprocessing, the XGBoost model could plausibly gain another 5 to 6% if proper preprocessing and augmentation are used.
Cheers!