Measuring incertainty in Bayesian Neural Network

Question:

Hy everybody,

I’m beginning with tensorflow probability and I have some difficulties to interpret my Bayesian neural network outputs.
I’m working on a regression case, and started with the example provided by tensorflow notebook here: https://blog.tensorflow.org/2019/03/regression-with-probabilistic-layers-in.html?hl=fr

As I seek to know the uncertainty of my network predictions, I dived directly into example 4 with Aleatoric & Epistemic Uncertainty. You can find my code bellow:

def negative_loglikelihood(targets, estimated_distribution):
    return -estimated_distribution.log_prob(targets)


def posterior_mean_field(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size #number of total paramaeters (Weights and Bias)
    c = np.log(np.expm1(1.)) 
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(2 * n, dtype=dtype, initializer=lambda shape, dtype: random_gaussian_initializer(shape, dtype), trainable=True), 
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            # The Normal distribution with location loc and scale parameters.
            tfd.Normal(loc=t[..., :n],
                       scale=1e-5 +0.01*tf.nn.softplus(c + t[..., n:])),
            reinterpreted_batch_ndims=1)),
    ])



def prior(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=t, scale=1),
            reinterpreted_batch_ndims=1)),
    ])




def build_model(param):
    model = keras.Sequential()
    for i in range(param["n_layers"] ):
        name="n_units_l"+str(i)
        num_hidden = param[name]
        model.add(tfp.layers.DenseVariational(units=num_hidden, make_prior_fn=prior,make_posterior_fn=posterior_mean_field,kl_weight=1/len(X_train),activation="relu"))
    model.add(tfp.layers.DenseVariational(units=2, make_prior_fn=prior,make_posterior_fn=posterior_mean_field,activation="relu",kl_weight=1/len(X_train))) 
    model.add(tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t[..., :1],scale=1e-3 + tf.math.softplus(0.01 * t[...,1:]))))
    
    lr = param["learning_rate"]
    optimizer=optimizers.Adam(learning_rate=lr)
        
    model.compile(
        loss=negative_loglikelihood,  #negative_loglikelihood, 
        optimizer=optimizer,
        metrics=[keras.metrics.RootMeanSquaredError()],
    )

    return model

I think I have the same network than in tfp example, I just added few hidden layers with differents units. Also I added 0.01 in front of the Softplus in the posterior as suggested here, which allows the network to come up to good performances.
Not able to get reasonable results from DenseVariational

The performances of the model are very good (less than 1% of error) but I have some questions:

  1. As Bayesian neural networks "promise" to mesure the uncertainty of the predictions, I was expecting bigger errors on high variance predictions. I ploted the absolute error versus variance and the results are not good enough on my mind. Of course, the model is better at low variance but I can have really bad predicitions at low variance, and therefore cannot really use standard deviation to filter bad predictions. Why is my Bayesian neural netowrk struggling to give me the uncertainty ?

absolute error versus variance (2000 epochs)

  1. The previous network was train 2000 epochs and we can notice a strange phenome with a vertical bar on lowest stdv. If I increase the number of epoch up to 25000, my results get better either on training and validation set.

loss monitoring

But the phenomene of vertical bar that we may notice on the figure 1 is much more obvious. It seems that as much as I increase the number or EPOCH, all output variance converge to 0.68. Is that a case of overfitting ? Why this value of 0.6931571960449219 and why I can’t get lower stdv ? As the phenome start appearing at 2000 EPOCH, am i already overfitting at 2000 epochs ?

absolute error versus variance (25000 EPOCH)

At this point stdv is totaly useless. So is there a kind of trade off ? With few epochs my model is less performant but gives me some insigh about uncertainty (even if I think they’re not sufficient), where with lot of epochs I have better performances but no more uncertainty informations as all outputs have the same stdv.

Sorry for the long post and the language mistakes.

Thank you in advance for you help and any feed back.

Asked By: maxlamenace

||

Answers:

I solved the problem of why my uncertainty could not get lower than 0.6931571960449219.

Actually this value is converging to log(2). This is due to my relu activation function on my last Dense Variational layer.
Indeed, the scale of tfd.Normal is a softplus (tf.math.softplus).

And softplus is implement like that : softplus(x) = log(exp(x) + 1). As my x doesn’t go in negative values, my minumum incertainty il log(2).

A basic linear activation function solved the problem and my uncertainty has a normal behavior now.

Answered By: maxlamenace