Why does UNet have classes?

Question:

import torch
import torch.nn as nn
import torch.nn.functional as F


class double_conv(nn.Module):
    '''(conv => BN => ReLU) * 2'''
    def __init__(self, in_ch, out_ch):
        super(double_conv, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        x = self.conv(x)
        return x


class inconv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super(inconv, self).__init__()
        self.conv = double_conv(in_ch, out_ch)

    def forward(self, x):
        x = self.conv(x)
        return x


class down(nn.Module):
    def __init__(self, in_ch, out_ch):
        super(down, self).__init__()
        self.mpconv = nn.Sequential(
            nn.MaxPool2d(2),
            double_conv(in_ch, out_ch)
        )

    def forward(self, x):
        x = self.mpconv(x)
        return x


class up(nn.Module):
    def __init__(self, in_ch, out_ch, bilinear=True):
        super(up, self).__init__()

        #  it would be nice if the upsampling could be learned too,
        #  but my machine does not have enough memory to handle all those weights
        if bilinear:
            self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
        else:
            self.up = nn.ConvTranspose2d(in_ch//2, in_ch//2, 2, stride=2)

        self.conv = double_conv(in_ch, out_ch)

    def forward(self, x1, x2):
        x1 = self.up(x1)
        # pad the skip connection x2 so its spatial size matches the upsampled x1;
        # dim 2 is height (Y) and dim 3 is width (X), and diff - diff // 2
        # handles odd size differences correctly
        diffY = x1.size()[2] - x2.size()[2]
        diffX = x1.size()[3] - x2.size()[3]
        x2 = F.pad(x2, (diffX // 2, diffX - diffX // 2,
                        diffY // 2, diffY - diffY // 2))
        x = torch.cat([x2, x1], dim=1)
        x = self.conv(x)
        return x


class outconv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super(outconv, self).__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        x = self.conv(x)
        return x


class UNet(nn.Module):
    def __init__(self, n_channels, n_classes):
        super(UNet, self).__init__()
        self.inc = inconv(n_channels, 64)
        self.down1 = down(64, 128)
        self.down2 = down(128, 256)
        self.down3 = down(256, 512)
        self.down4 = down(512, 512)
        self.up1 = up(1024, 256)
        self.up2 = up(512, 128)
        self.up3 = up(256, 64)
        self.up4 = up(128, 64)
        self.outc = outconv(64, n_classes)

    def forward(self, x):
        # use local variables rather than attributes so intermediate
        # activations can be freed after the forward pass
        x1 = self.inc(x)
        x2 = self.down1(x1)
        x3 = self.down2(x2)
        x4 = self.down3(x3)
        x5 = self.down4(x4)
        x = self.up1(x5, x4)
        x = self.up2(x, x3)
        x = self.up3(x, x2)
        x = self.up4(x, x1)
        return self.outc(x)

When I was reading about the UNet architecture, I found that it has n_classes as its output.

class UNet(nn.Module):
    def __init__(self, n_channels, n_classes):

but why does it have n_classes if it is used for image segmentation?

I am trying to use this code for image denoising and I couldn’t figure out what the n_classes parameter should be, because I don’t have any classes.

Does n_classes signify multiclass segmentation? If so, what is the output of binary UNet segmentation?

Asked By: user15957418


Answers:


Does n_classes signify multiclass segmentation?

Yes, if you specify n_classes=4 it will output a (batch, 4, height, width) shaped tensor, where each pixel can be segmented as one of 4 classes. You should also use torch.nn.CrossEntropyLoss for training.
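
A minimal sketch of that setup (the batch size, image size, and class count here are illustrative assumptions, not part of the original code):

import torch
import torch.nn as nn

# Hypothetical 4-class segmentation on 3-channel images.
model = UNet(n_channels=3, n_classes=4)
images = torch.randn(8, 3, 64, 64)           # (batch, channels, height, width)
targets = torch.randint(0, 4, (8, 64, 64))   # per-pixel class indices in [0, 4)

logits = model(images)                       # (8, 4, 64, 64)
loss = nn.CrossEntropyLoss()(logits, targets)
loss.backward()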

If so, what is the output of binary UNet segmentation?

If you want binary segmentation you’d specify n_classes=1 (each pixel is either 0 for black or 1 for white) and use torch.nn.BCEWithLogitsLoss.
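
For the binary case, a sketch under the same illustrative assumptions:

import torch
import torch.nn as nn

# Hypothetical binary segmentation with a single output channel.
model = UNet(n_channels=3, n_classes=1)
images = torch.randn(8, 3, 64, 64)
masks = torch.randint(0, 2, (8, 1, 64, 64)).float()  # 0/1 per-pixel targets

logits = model(images)                        # (8, 1, 64, 64), raw logits
loss = nn.BCEWithLogitsLoss()(logits, masks)
# At inference time, threshold the sigmoid of the logits:
preds = (torch.sigmoid(logits) > 0.5).long()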

I am trying to use this code for image denoising and I couldn’t figure out what the n_classes parameter should be

It should be equal to n_channels, usually 3 for RGB or 1 for grayscale. If you want to teach this model to denoise an image you should (see the sketch after this list):

  • Add some noise to the image (e.g. using torchvision.transforms)
  • Use a sigmoid activation at the end, as the pixels will have values between 0 and 1 (unless normalized)
  • Use torch.nn.MSELoss for training
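
Putting those three steps together, a minimal training-step sketch (the noise level, learning rate, and optimizer are illustrative assumptions, not prescriptions):

import torch
import torch.nn as nn

model = UNet(n_channels=3, n_classes=3)      # n_classes == n_channels for denoising
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

clean = torch.rand(8, 3, 64, 64)                             # clean images in [0, 1]
noisy = (clean + 0.1 * torch.randn_like(clean)).clamp(0, 1)  # add Gaussian noise

optimizer.zero_grad()
denoised = torch.sigmoid(model(noisy))       # sigmoid keeps outputs in [0, 1]
loss = criterion(denoised, clean)
loss.backward()
optimizer.step()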

Why sigmoid?

Because the [0, 255] pixel range is represented as [0, 1] pixel values (at least without normalization). sigmoid does exactly that – it squashes values into the [0, 1] range – so the linear outputs (logits) can take any value from -inf to +inf and still map to a valid pixel.
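
You can see the squashing directly (a tiny sanity check, nothing model-specific):

import torch

# Logits can be any real number; sigmoid maps them all into (0, 1).
logits = torch.tensor([-100.0, -1.0, 0.0, 1.0, 100.0])
print(torch.sigmoid(logits))
# tensor([0.0000, 0.2689, 0.5000, 0.7311, 1.0000])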

Why not a linear output and a clamp?

For the linear layer’s output to land in [0, 1] after a clamp, its possible output values would have to be greater than 0 (the logits would have to fit the target range [0, +inf)).

Why not a linear output without a clamp?

The logits output by the network would themselves have to fall within the [0, 1] range.

Why not some other method?

You could do that, but the idea of sigmoid is:

  • it helps the neural network, as any logit value can be output and still maps into the valid range
  • the first derivative of sigmoid is a bell-shaped curve (similar in shape to a Gaussian, though not actually one), hence it models the probability of many real-life phenomena
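
That bell shape is easy to verify with autograd (the derivative of sigmoid(x) is sigmoid(x) * (1 - sigmoid(x)), which peaks at 0.25 when x = 0):

import torch

x = torch.linspace(-6, 6, 13, requires_grad=True)
torch.sigmoid(x).sum().backward()
# x.grad equals sigmoid(x) * (1 - sigmoid(x)): a bell-shaped curve
# that peaks at 0.25 at x == 0 and decays toward 0 in both tails.
print(x.grad)
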
Answered By: Szymon Maszke