Tensorflow model predicting Nans

Question:

I am new to TensorFlow framework and I am trying to apply Tensorflow to predict the survivor based on this Titanic Dataset:https://www.kaggle.com/c/titanic/data.

import tensorflow as tf
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
#%%
titanictrain = pd.read_csv('train.csv')
titanictest = pd.read_csv('test.csv')
df = pd.concat([titanictrain,titanictest],join='outer',keys='PassengerId',sort=False,ignore_index=True).drop(['Name'],1)

#%%

def preprocess(df):
    df['Fare'].fillna(value=df.groupby('Pclass')['Fare'].transform('median'),inplace=True)
    df['Fare'] = df['Fare'].map(lambda x: np.log(x) if x>0 else 0)
    df['Embarked'].fillna(value=df['Embarked'].mode()[0],inplace=True)
    df['CabinAlphabet'] = df['Cabin'].str[0]
    categories_to_one_hot = ['Pclass','Sex','Embarked','CabinAlphabet']
    df = pd.get_dummies(df,columns=categories_to_one_hot,drop_first=True)
    return df
df = preprocess(df)

df = df.drop(['PassengerId','Ticket','Cabin','Survived'],1)

titanic_trainandval = df.iloc[:len(titanictrain)]
titanic_test = df.iloc[len(titanictrain):] #test after preprocessing

titanic_test.head()

#  split train into training and validation set
labels = titanictrain['Survived']
y = labels.values

test = titanic_test.copy() # real test sets
print(len(test), 'test examples')

Here I am trying to apply preprocessing on the data:

1.Drop Name column and Do one hot coding both on the train and test set

2.Drop [‘PassengerId’,’Ticket’,’Cabin’,’Survived’] for Simplicity.

  1. Split train and test following the original order

enter image description here
Here is a picture showing what the training set looks like.

"""# model training"""

from tensorflow.keras.layers import Input, Dense, Activation,Dropout
from tensorflow.keras.models import Model

X = titanic_trainandval.copy()
input_layer = Input(shape=(X.shape[1],))
dense_layer_1 = Dense(10, activation='relu')(input_layer)
dense_layer_2 = Dense(5, activation='relu')(dense_layer_1)
output = Dense(1, activation='softmax',name = 'predictions')(dense_layer_2)

model = Model(inputs=input_layer, outputs=output)
base_learning_rate = 0.0001

model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True), optimizer=tf.keras.optimizers.Adam(lr=base_learning_rate), metrics=['acc'])

history = model.fit(X, y, batch_size=5, epochs=20, verbose=2, validation_split=0.1,shuffle = False)

submission = pd.DataFrame()
submission['PassengerId'] = titanictest['PassengerId']

Then I put the training set X into the model to get the result. However, history shows the following result:
enter image description here

No matter how I change the learning rate and batch size, the result does not change, and the loss is always ‘nan’, and the prediction based on the test set is always ‘nan’ as well.

Could anybody explain where the problem is and give some possible solutions?

Answers:

at first glance there are 2 major problems in your code:

  1. your output layer must be Dense(2, activation='softmax'). this is because yours is a binary classification problem and if you are using softmax to generate probabilities the output dim must be equal to the number of classes. (you can use one output dimension with sigmoid activation)

  2. you have to change your loss function. with softmax and numerical encoded target use sparse_categorical_crossentropy. (you can use binary_crossentropy with sigmoid and with from_logits=False as default)

PS: make sure to remove all NaNs in your original data before starting fit

Answered By: Marco Cerliani

Marco Cerliani is right with the points 1 and 2.

The real problem why you have NaNs is because you feed NaNs in your code. If you look carefully, even in your third photo, the 888th example on the column Age contains a NaN.

This is why you have NaNs. Solve this one, and apply Marco Cerliani’s suggestions and you’re good to go.

Answered By: Timbus Calin

Apart from the above answers, 1 more thing which I would like to add is that whenever you want to use form_logits=True for classification problems, use Linear activation function i.e activation='linear' which is the default value for activation function in the last layer.

Answered By: Ahmad Anis
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.