NaN from tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)

Question:

I am doing image segmentation using ResNet50 as the encoder, and I built the decoder with unpooling layers and skip connections in TensorFlow.

Here is the model structure:

[model structure image]

For the loss function, I used the dice coefficient and IoU formulas and calculated the total loss by adding both. In addition to that total loss, I added the REGULARIZATION_LOSSES from the network:

total_loss = tf.add_n([dice_coefficient_output+IOU_output]+tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))

Training starts fine: in the 1st epoch the total loss is around 0.4. But from the 2nd epoch onwards, the total loss is shown as nan.

After inspecting the individual loss values, tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES) returns a list of values, one per layer, and most of those layers return nan.

To address this, I tried different normalisations (scaling the image data to 0 to 1, to -1 to 1, and z-score), but the nan still appears in the 2nd epoch.

I also tried reducing the learning rate and changing the weight decay in the L2 regularization, but the nan still appears from the 2nd epoch.

Finally, I reduced the number of neurons in the network and restarted training; the nan disappeared in the 2nd epoch but appeared in the 4th epoch.

Any suggestions on how to improve this model and get rid of the nan in the regularization loss?

Thanks

Asked By: Vishak Raj


Answers:

Two possible solutions:

  1. You may have an issue with the input data. Try calling assert not
    np.any(np.isnan(x)) on the input data to make sure you are not introducing
    the nan. Also make sure all of the target values are valid. Finally, make
    sure the data is properly normalized. You probably want to have the pixels
    in the range [-1, 1] and not [0, 255], example:

tf.keras.utils.normalize(data)
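A rough sketch of such a check and rescaling on a NumPy batch (the function name and dummy batch below are just for illustration):

    import numpy as np

    def check_and_scale(x):
        # Scale uint8 pixels from [0, 255] to [-1, 1] and fail fast
        # if the result contains NaN or Inf.
        x = x.astype(np.float32) / 127.5 - 1.0
        assert not np.any(np.isnan(x)), "input contains NaN"
        assert not np.any(np.isinf(x)), "input contains Inf"
        return x

    # Illustration with a dummy batch of two 224x224 RGB images:
    batch = np.random.randint(0, 256, size=(2, 224, 224, 3), dtype=np.uint8)
    batch = check_and_scale(batch)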

Related to the above: usually the gradients become NaN first. The first two things to try are a reduced learning rate and possibly gradient clipping.
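A sketch of what that could look like in TF1-style graph code (reusing total_loss from the question; the optimizer choice, learning rate, and clip value are just examples):

    import tensorflow as tf

    # Reduced learning rate plus global-norm gradient clipping (values are illustrative).
    # `total_loss` stands for the loss tensor built in the question.
    optimizer = tf.train.AdamOptimizer(learning_rate=1e-4)
    grads_and_vars = [(g, v) for g, v in optimizer.compute_gradients(total_loss)
                      if g is not None]
    grads, variables = zip(*grads_and_vars)
    clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=1.0)
    train_op = optimizer.apply_gradients(zip(clipped_grads, variables))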

Alternatively, you can try dividing by some constant first (perhaps equal
to the max value of your data?) The idea is to get the values low enough
that they don’t cause really large gradients.

  2. The labels must be in the domain of the loss function, so if using a logarithmic-based loss function, all labels must be non-negative.
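A quick check along those lines, assuming the segmentation masks are NumPy arrays (names and shapes are illustrative):

    import numpy as np

    def check_labels(y):
        y = np.asarray(y, dtype=np.float32)
        # Masks fed to a log-based loss should be finite and non-negative
        # (typically 0/1 per class, or integer class indices).
        assert np.all(np.isfinite(y)), "labels contain NaN or Inf"
        assert np.all(y >= 0), "labels contain negative values"

    masks = np.random.randint(0, 2, size=(2, 224, 224, 1))
    check_labels(masks)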

There are lots of things I have seen make a model diverge.

Too high of a learning rate. You can often tell if this is the case if the loss begins to increase and then diverges to infinity.

I am guessing your classifier uses the categorical cross entropy cost function. This involves taking the log of the prediction, which diverges as the prediction approaches zero. That is why people usually add a small epsilon value to the prediction to prevent this divergence. I am guessing ResNet probably does this or uses the TensorFlow op for it, so this is probably not the issue.
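The usual trick looks roughly like this (a sketch of the common pattern, not necessarily what the network in the question does):

    import tensorflow as tf

    def safe_cross_entropy(labels, probs, eps=1e-7):
        # Clip probabilities away from 0 and 1 so the log never sees an exact zero.
        probs = tf.clip_by_value(probs, eps, 1.0 - eps)
        return -tf.reduce_mean(labels * tf.math.log(probs))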

Other numerical stability issues can exist, such as division by zero, where adding an epsilon can help. Another, less obvious one is the square root, whose derivative can diverge if not properly simplified when dealing with finite-precision numbers. Yet again, I doubt this is the issue in the case of the classifier.
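The same epsilon idea applies to divisions and square roots, e.g. in a soft Dice term (a sketch of one possible formulation, not the exact loss from the question):

    import tensorflow as tf

    def soft_dice_loss(labels, probs, eps=1e-6):
        # Smoothing keeps the ratio (and its gradient) finite even when a
        # batch contains no foreground pixels.
        intersection = tf.reduce_sum(labels * probs)
        denom = tf.reduce_sum(labels) + tf.reduce_sum(probs)
        return 1.0 - (2.0 * intersection + eps) / (denom + eps)

    def safe_sqrt(x, eps=1e-12):
        # d/dx sqrt(x) = 1 / (2 * sqrt(x)) blows up at x = 0; the epsilon avoids that.
        return tf.sqrt(x + eps)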


Otherwise see this link: https://discuss.pytorch.org/t/getting-nan-after-first-iteration-with-custom-loss/25929/7


Understanding domain adaptation for labels that must be within the domain of the loss function:

  1. https://machinelearningmastery.com/loss-and-loss-functions-for-training-deep-learning-neural-networks/

  2. https://rohitbandaru.github.io/blog/2021/08/09/Domain-Adaptation.html

  3. https://towardsdatascience.com/understanding-domain-adaptation-5baa723ac71f

  4. https://www.v7labs.com/blog/domain-adaptation-guide

  5. https://arxiv.org/pdf/1901.05335.pdf

  6. https://machinelearning.apple.com/research/bridging-the-domain-gap-for-neural-models

Answered By: joe hoeller