Tensorflow U-Net Multiclass Label

Question:

I’m new to stackoverflow, so please forgive any typical newbie mistakes.

I want to set up a CNN with U-Net architecture in Python and TensorFlow. I am reusing code that works for binary classification and want to adapt it to detect 3 classes. The code works great with 2 output layers, using a binary image as the ground-truth label.

Now my question is: are there any conventions for how multiclass labels should look? Should I use a label image with only one layer (grayscale) that holds a different value for each class (e.g. 0, 127, 255)? Or should I use an RGB image with one colour per class (e.g. 255, 0, 0 for class 0; 0, 255, 0 for class 1, and so on)?

""" 0) Creating placeholders for input images and labels """
# Placeholder for input images
x = tf.placeholder(tf.float32, [None, 3*img_size]) # None = arbitrary (Number of images)
# Arranging images in 4D format
x_shaped = tf.reshape(x, [-1, img_height, img_width, 3]) # 3 for 3 channels RGB
# Placeholder for labels of input images (ground truth)
y = tf.placeholder(tf.float32, [None, 2*img_size])
# Arranging labels in 4D format
y_shaped = tf.reshape(y, [-1, img_size, 2])


""" 1) Defining FCN-8 VGGNet-16 """
network = conv_layer(x_shaped, 64, filter_size=[3, 3], name='conv1a')
network = conv_layer(network, 64, filter_size=[3, 3], name='conv1b')
network = max_pool_layer(network, name='pool1')

network = conv_layer(network, 128, filter_size=[3, 3], name='conv2a')
network = conv_layer(network, 128, filter_size=[3, 3], name='conv2b')
network = max_pool_layer(network, name='pool2')

network = conv_layer(network, 256, filter_size=[3, 3], name='conv3a')
network = conv_layer(network, 256, filter_size=[3, 3], name='conv3b')
network = conv_layer(network, 256, filter_size=[3, 3], name='conv3c')
network = max_pool_layer(network, name='pool3')
net_pool3 = network

network = conv_layer(network, 512, filter_size=[3, 3], name='conv4a')
network = conv_layer(network, 512, filter_size=[3, 3], name='conv4b')
network = conv_layer(network, 512, filter_size=[3, 3], name='conv4c')
network = max_pool_layer(network, name='pool4')
net_pool4 = network

network = conv_layer(network, 512, filter_size=[3, 3], name='conv5a')
network = conv_layer(network, 512, filter_size=[3, 3], name='conv5b')
network = conv_layer(network, 512, filter_size=[3, 3], name='conv5c')
network = max_pool_layer(network, name='pool5')

network = deconv_layer(network, 256, filter_size=[3, 3], name='deconv1')
network = tf.concat([network, net_pool4], 3)
network = conv_layer(network, 256, filter_size=[5, 5], name='conv6')

network = deconv_layer(network, 128, filter_size=[3, 3], name='deconv2')
network = tf.concat([network, net_pool3], 3)
network = conv_layer(network, 128, filter_size=[5, 5], name='conv7')

# in the next lines I would have to change 2 into 3 to get 3 output classes
network = deconv_layer(network, 2, filter_size=[7, 7], strides=[8, 8], name='deconv3')
network = conv_layer(network, 2, filter_size=[7, 7], activation=' ', name='conv8')
y_ = tf.nn.softmax(network)

After training is completed, I generate an output image in the test phase:

for i in range(rows):
    for j in range(cols):
        for k in range(layers):
            imdata[i*img_height:(i+1)*img_height, j*img_width:(j+1)*img_width, k] = cnn_output[cols*i+j, :, :, k]
imdata = imdata[0:im.height, 0:im.width]
for row in range(real_height):
    for col in range(real_width):
        if np.amax(imdata[row, col, :]) == imdata[row, col, 0]:
            imdata[row, col, :] = 255, 0, 0
        elif np.amax(imdata[row, col, :]) == imdata[row, col, 1]:
            imdata[row, col, :] = 0, 255, 0
        else:
            imdata[row, col, :] = 0, 0, 255
# Save the image
scipy.misc.imsave(out_file, imdata)
im.close()

imdata has the shape of my image with 3 layers (1080, 1920, 3).

Asked By: Cal Blau


Answers:

Classification labels are generally a vector where each element represents a class:

class A: [1, 0, 0]
class B: [0, 1, 0]
class C: [0, 0, 1]

The reason is that the output of your network is a softmax function, which produces a vector of values between 0 and 1, e.g. [0.1, 0.1, 0.8]. The values always add up to 1, so using softmax assumes that every pixel in the picture belongs to exactly one class, since an increase in the network output for one class lowers the output for the other classes.
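To make this concrete, here is a minimal NumPy sketch of the softmax function (not the questioner's network code) showing that the outputs sum to 1 and that raising one logit suppresses the other class probabilities:

```python
import numpy as np

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

scores = np.array([0.5, 0.5, 2.0])  # hypothetical per-class logits for one pixel
probs = softmax(scores)
# probs is roughly [0.15, 0.15, 0.69]; the entries sum to 1,
# so the pixel is effectively assigned to class 2
```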

In segmentation a class is assigned to every pixel, so your label placeholder now needs 3*img_size values instead of 2*img_size:

# Placeholder for labels of input images (ground truth)
y = tf.placeholder(tf.float32, [None, 3*img_size])
# Arranging labels in 4D format
y_shaped = tf.reshape(y, [-1, img_size, 3])
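If your ground truth starts out as a single-layer grayscale label image (as the question suggests, with e.g. 0, 127, 255 marking the classes), you can convert it to the flattened one-hot format this placeholder expects. A sketch, assuming those three gray values:

```python
import numpy as np

# Hypothetical 2x2 grayscale label image; 0, 127, 255 mark the three classes
label_gray = np.array([[0, 127],
                       [255, 0]], dtype=np.uint8)

# Map gray values to class indices 0, 1, 2 ...
class_index = np.zeros(label_gray.shape, dtype=np.int64)
class_index[label_gray == 127] = 1
class_index[label_gray == 255] = 2

# ... then expand each index to a one-hot vector: shape (H, W, 3)
one_hot = np.eye(3, dtype=np.float32)[class_index]

# Flatten to match the [None, 3*img_size] placeholder (batch of 1)
y_feed = one_hot.reshape(1, -1)
```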

For the output:

I assume cnn_output contains the output for only one picture, not for the whole batch.

You need to find out which class has the highest score. Here np.argmax can help:

class_index = np.argmax(cnn_output, axis=2)

class_index now contains, for each pixel, the number of the class with the highest score. (If cnn_output is only 2-dimensional, set axis to 1.) Next you need to map these values to colors:

colors = {0 : [255, 0, 0], 1 : [0, 255, 0], 2 : [0, 0, 255]}
colored_image = np.array([colors[x] for x in np.nditer(class_index)], 
                         dtype=np.uint8)
output_image = np.reshape(colored_image, (img_height, img_width, 3))

First we create colored_image, which contains the color for each point but has shape (img_height*img_width, 3), so you have to reshape it into the 3-dimensional image shape with np.reshape. You can now draw the output_image:

import matplotlib.pyplot as plt

plt.imshow(output_image)
plt.show()
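As a side note, the nditer loop above can be replaced by NumPy fancy indexing, which does the same class-to-color lookup in one vectorized step. A sketch with a hypothetical 2x2 class_index:

```python
import numpy as np

# One RGB color per class index
palette = np.array([[255, 0, 0],
                    [0, 255, 0],
                    [0, 0, 255]], dtype=np.uint8)

class_index = np.array([[0, 1],
                        [2, 1]])       # stand-in for the np.argmax output

# Fancy indexing looks up a color per pixel: result shape is (H, W, 3)
output_image = palette[class_index]
```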
Answered By: Regic

If I understood your question correctly, you want to know what your label image should look like for a 3-class problem.

Let’s look at a two-class problem first. The label image would consist of just zeros and ones, and you would use a binary cross-entropy loss for each pixel and then (maybe) average it over the whole image.

For an n-class problem, your label image would be of size H x W x n, where a slice across the entire depth at any pixel is a one-hot encoded vector: all zeros except a single one, corresponding to the class.

Segmentation map

One-hot encoded label-image

Both the images are taken from here. I encourage you to read that blog.
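The H x W x n one-hot structure described above can be sketched in a few lines (the toy 2x2 segmentation map here is purely illustrative):

```python
import numpy as np

# Toy 2x2 segmentation map for a 3-class problem: one class index per pixel
seg_map = np.array([[0, 2],
                    [1, 1]])

# One-hot encode along a new depth axis: shape H x W x n
label_image = np.eye(3)[seg_map]

# Every depth slice contains exactly one 1 and is zero elsewhere
# e.g. label_image[0, 1] is [0., 0., 1.], meaning that pixel belongs to class 2
```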

Once you predict your label-image, you could easily convert it by assigning specific colors to labels. For example, in a 2-class segmented image, label 0 => color 0 and label 1 => color 255 – that is a binary image.

For an n-class segmented image, you could pick n equidistant points in the range [0, 0, 0] to [255, 255, 255] and then assign each of these colors to a label. Usually you would choose such colors manually (e.g. red, green, blue, yellow for 4 classes), but if you want to get really fancy, you could use something like this.
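The equidistant-points idea can be sketched as a small helper (a literal reading of the [0, 0, 0] to [255, 255, 255] range, which yields n gray levels; for visually distinct colors a manual palette or a colormap is usually nicer):

```python
import numpy as np

def equidistant_colors(n):
    # n equally spaced RGB colors along the diagonal [0,0,0]..[255,255,255]
    levels = np.linspace(0, 255, n).astype(np.uint8)
    return np.stack([levels] * 3, axis=1)   # shape (n, 3)

# equidistant_colors(4) gives gray levels 0, 85, 170, 255 on each channel
```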

Answered By: Autonomous