tensorflow deep neural network for regression always predict same results in one batch

Question:

I use a tensorflow to implement a simple multi-layer perceptron for regression. The code is modified from standard mnist classifier, that I only changed the output cost to MSE (use tf.reduce_mean(tf.square(pred-y))), and some input, output size settings. However, if I train the network using regression, after several epochs, the output batch are totally the same. for example:

target: 48.129, estimated: 42.634
target: 46.590, estimated: 42.634
target: 34.209, estimated: 42.634
target: 69.677, estimated: 42.634
......

I have tried different batch size, different initialization, input normalization using sklearn.preprocessing.scale (my inputs range are quite different). However, none of them worked. I have also tried one of sklearn example from Tensorflow (Deep Neural Network Regression with Boston Data). But I got another error in line 40:

‘module’ object has no attribute ‘infer_real_valued_columns_from_input’

Anyone has clues on where the problem is? Thank you

My code is listed below, may be a little bit long, but very straghtforward:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
from tensorflow.contrib import learn
import matplotlib.pyplot as plt

from sklearn.pipeline import Pipeline
from sklearn import datasets, linear_model
from sklearn import cross_validation
import numpy as np

boston = learn.datasets.load_dataset('boston')
x, y = boston.data, boston.target
X_train, X_test, Y_train, Y_test = cross_validation.train_test_split(
x, y, test_size=0.2, random_state=42)

total_len = X_train.shape[0]

# Parameters
learning_rate = 0.001
training_epochs = 500
batch_size = 10
display_step = 1
dropout_rate = 0.9
# Network Parameters
n_hidden_1 = 32 # 1st layer number of features
n_hidden_2 = 200 # 2nd layer number of features
n_hidden_3 = 200
n_hidden_4 = 256
n_input = X_train.shape[1]
n_classes = 1

# tf Graph input
x = tf.placeholder("float", [None, 13])
y = tf.placeholder("float", [None])

# Create model
def multilayer_perceptron(x, weights, biases):
    # Hidden layer with RELU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)

    # Hidden layer with RELU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)

    # Hidden layer with RELU activation
    layer_3 = tf.add(tf.matmul(layer_2, weights['h3']), biases['b3'])
    layer_3 = tf.nn.relu(layer_3)

    # Hidden layer with RELU activation
    layer_4 = tf.add(tf.matmul(layer_3, weights['h4']), biases['b4'])
    layer_4 = tf.nn.relu(layer_4)

    # Output layer with linear activation
    out_layer = tf.matmul(layer_4, weights['out']) + biases['out']
    return out_layer

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1], 0, 0.1)),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2], 0, 0.1)),
    'h3': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_3], 0, 0.1)),
    'h4': tf.Variable(tf.random_normal([n_hidden_3, n_hidden_4], 0, 0.1)),
    'out': tf.Variable(tf.random_normal([n_hidden_4, n_classes], 0, 0.1))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1], 0, 0.1)),
    'b2': tf.Variable(tf.random_normal([n_hidden_2], 0, 0.1)),
    'b3': tf.Variable(tf.random_normal([n_hidden_3], 0, 0.1)),
    'b4': tf.Variable(tf.random_normal([n_hidden_4], 0, 0.1)),
    'out': tf.Variable(tf.random_normal([n_classes], 0, 0.1))
}

# Construct model
pred = multilayer_perceptron(x, weights, biases)

# Define loss and optimizer
cost = tf.reduce_mean(tf.square(pred-y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Launch the graph
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(total_len/batch_size)
        # Loop over all batches
        for i in range(total_batch-1):
            batch_x = X_train[i*batch_size:(i+1)*batch_size]
            batch_y = Y_train[i*batch_size:(i+1)*batch_size]
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c, p = sess.run([optimizer, cost, pred], feed_dict={x: batch_x,
                                                          y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch

        # sample prediction
        label_value = batch_y
        estimate = p
        err = label_value-estimate
        print ("num batch:", total_batch)

        # Display logs per epoch step
        if epoch % display_step == 0:
            print ("Epoch:", '%04d' % (epoch+1), "cost=", 
                "{:.9f}".format(avg_cost))
            print ("[*]----------------------------")
            for i in xrange(3):
                print ("label value:", label_value[i], 
                    "estimated value:", estimate[i])
            print ("[*]============================")

    print ("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print ("Accuracy:", accuracy.eval({x: X_test, y: Y_test}))
Asked By: Sufeng Niu

||

Answers:

There is likely a problem with your dataset loading or indexing implementation. If you only modified the cost to MSE, make sure pred and y are correctly being updated and you did not overwrite them with a different graph operation.

Another thing to help debug would be to predict the actual regression outputs. It would also help if you posted more of your code so we can see your specific data loading implementation, etc.

Answered By: ahaque

Short answer:

Transpose your pred vector using tf.transpose(pred).

Longer answer:

The problem is that pred (the predictions) and y (the labels) are not of the same shape: one is a row vector and the other a column vector. Apparently when you apply an element-wise operation on them, you’ll get a matrix, which is not what you want.

The solution is to transpose the prediction vector using tf.transpose() to get a proper vector and thus a proper loss function. Actually, if you set the batch size to 1 in your example you’ll see that it works even without the fix, because transposing a 1×1 vector is a no-op.

I applied this fix to your example code and observed the following behaviour. Before the fix:

Epoch: 0245 cost= 84.743440580
[*]----------------------------
label value: 23 estimated value: [ 27.47437096]
label value: 50 estimated value: [ 24.71126747]
label value: 22 estimated value: [ 23.87785912]

And after the fix at the same point in time:

Epoch: 0245 cost= 4.181439120
[*]----------------------------
label value: 23 estimated value: [ 21.64333534]
label value: 50 estimated value: [ 48.76105118]
label value: 22 estimated value: [ 24.27996063]

You’ll see that the cost is much lower and that it actually learned the value 50 properly. You’ll have to do some fine-tuning on the learning rate and such to improve your results of course.

Answered By: CNugteren