I am working on a neural network problem: classifying data as 1 or 0. I am using binary cross-entropy loss for this. The loss looks fine, but the accuracy is very low and isn't improving. I suspect I made a mistake in the accuracy calculation. After every epoch, I count the correct predictions after thresholding the output, and divide that number by the total size of the dataset. Is there anything wrong with my accuracy calculation? And why isn't it improving, but getting worse instead?
This is my code:

import torch

net = Model()
criterion = torch.nn.BCELoss(reduction='mean')  # size_average=True is deprecated
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

num_epochs = 100
for epoch in range(num_epochs):
    for i, (inputs, labels) in enumerate(train_loader):
        inputs = inputs.float()
        labels = labels.float()
        output = net(inputs)
        optimizer.zero_grad()
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()

    output = (output > 0.5).float()
    correct = (output == labels).float().sum()
    print("Epoch {}/{}, Loss: {:.3f}, Accuracy: {:.3f}".format(epoch+1, num_epochs, loss.item(), correct/x.shape[0]))

And this is the strange output I get:

Epoch 1/100, Loss: 0.389, Accuracy: 0.035
Epoch 2/100, Loss: 0.370, Accuracy: 0.036
Epoch 3/100, Loss: 0.514, Accuracy: 0.030
Epoch 4/100, Loss: 0.539, Accuracy: 0.030
Epoch 5/100, Loss: 0.583, Accuracy: 0.029
Epoch 6/100, Loss: 0.439, Accuracy: 0.031
Epoch 7/100, Loss: 0.429, Accuracy: 0.034
Epoch 8/100, Loss: 0.408, Accuracy: 0.035
Epoch 9/100, Loss: 0.316, Accuracy: 0.035
Epoch 10/100, Loss: 0.436, Accuracy: 0.035
Epoch 11/100, Loss: 0.365, Accuracy: 0.034
Epoch 12/100, Loss: 0.485, Accuracy: 0.031
Epoch 13/100, Loss: 0.392, Accuracy: 0.033
Epoch 14/100, Loss: 0.494, Accuracy: 0.030
Epoch 15/100, Loss: 0.369, Accuracy: 0.035
Epoch 16/100, Loss: 0.495, Accuracy: 0.029
Epoch 17/100, Loss: 0.415, Accuracy: 0.034
Epoch 18/100, Loss: 0.410, Accuracy: 0.035
Epoch 19/100, Loss: 0.282, Accuracy: 0.038
Epoch 20/100, Loss: 0.499, Accuracy: 0.031
Epoch 21/100, Loss: 0.446, Accuracy: 0.030
Epoch 22/100, Loss: 0.585, Accuracy: 0.026
Epoch 23/100, Loss: 0.419, Accuracy: 0.035
Epoch 24/100, Loss: 0.492, Accuracy: 0.031
Epoch 25/100, Loss: 0.537, Accuracy: 0.031
Epoch 26/100, Loss: 0.439, Accuracy: 0.033
Epoch 27/100, Loss: 0.421, Accuracy: 0.035
Epoch 28/100, Loss: 0.532, Accuracy: 0.034
Epoch 29/100, Loss: 0.234, Accuracy: 0.038
Epoch 30/100, Loss: 0.492, Accuracy: 0.027
Epoch 31/100, Loss: 0.407, Accuracy: 0.035
Epoch 32/100, Loss: 0.305, Accuracy: 0.038
Epoch 33/100, Loss: 0.663, Accuracy: 0.025
Epoch 34/100, Loss: 0.588, Accuracy: 0.031
Epoch 35/100, Loss: 0.329, Accuracy: 0.035
Epoch 36/100, Loss: 0.474, Accuracy: 0.033
Epoch 37/100, Loss: 0.535, Accuracy: 0.031
Epoch 38/100, Loss: 0.406, Accuracy: 0.033
Epoch 39/100, Loss: 0.513, Accuracy: 0.030
Epoch 40/100, Loss: 0.593, Accuracy: 0.030
Epoch 41/100, Loss: 0.265, Accuracy: 0.036
Epoch 42/100, Loss: 0.576, Accuracy: 0.031
Epoch 43/100, Loss: 0.565, Accuracy: 0.027
Epoch 44/100, Loss: 0.576, Accuracy: 0.030
Epoch 45/100, Loss: 0.396, Accuracy: 0.035
Epoch 46/100, Loss: 0.423, Accuracy: 0.034
Epoch 47/100, Loss: 0.489, Accuracy: 0.033
Epoch 48/100, Loss: 0.591, Accuracy: 0.029
Epoch 49/100, Loss: 0.415, Accuracy: 0.034
Epoch 50/100, Loss: 0.291, Accuracy: 0.039
Epoch 51/100, Loss: 0.395, Accuracy: 0.033
Epoch 52/100, Loss: 0.540, Accuracy: 0.026
Epoch 53/100, Loss: 0.436, Accuracy: 0.033
Epoch 54/100, Loss: 0.346, Accuracy: 0.036
Epoch 55/100, Loss: 0.519, Accuracy: 0.029
Epoch 56/100, Loss: 0.456, Accuracy: 0.031
Epoch 57/100, Loss: 0.425, Accuracy: 0.035
Epoch 58/100, Loss: 0.311, Accuracy: 0.039
Epoch 59/100, Loss: 0.406, Accuracy: 0.034
Epoch 60/100, Loss: 0.360, Accuracy: 0.035
Epoch 61/100, Loss: 0.476, Accuracy: 0.030
Epoch 62/100, Loss: 0.404, Accuracy: 0.034
Epoch 63/100, Loss: 0.382, Accuracy: 0.036
Epoch 64/100, Loss: 0.538, Accuracy: 0.031
Epoch 65/100, Loss: 0.392, Accuracy: 0.034
Epoch 66/100, Loss: 0.434, Accuracy: 0.033
Epoch 67/100, Loss: 0.479, Accuracy: 0.031
Epoch 68/100, Loss: 0.494, Accuracy: 0.031
Epoch 69/100, Loss: 0.415, Accuracy: 0.034
Epoch 70/100, Loss: 0.390, Accuracy: 0.036
Epoch 71/100, Loss: 0.330, Accuracy: 0.038
Epoch 72/100, Loss: 0.449, Accuracy: 0.030
Epoch 73/100, Loss: 0.315, Accuracy: 0.039
Epoch 74/100, Loss: 0.450, Accuracy: 0.031
Epoch 75/100, Loss: 0.562, Accuracy: 0.030
Epoch 76/100, Loss: 0.447, Accuracy: 0.031
Epoch 77/100, Loss: 0.408, Accuracy: 0.038
Epoch 78/100, Loss: 0.359, Accuracy: 0.034
Epoch 79/100, Loss: 0.372, Accuracy: 0.035
Epoch 80/100, Loss: 0.452, Accuracy: 0.034
Epoch 81/100, Loss: 0.360, Accuracy: 0.035
Epoch 82/100, Loss: 0.453, Accuracy: 0.031
Epoch 83/100, Loss: 0.578, Accuracy: 0.030
Epoch 84/100, Loss: 0.537, Accuracy: 0.030
Epoch 85/100, Loss: 0.483, Accuracy: 0.035
Epoch 86/100, Loss: 0.343, Accuracy: 0.036
Epoch 87/100, Loss: 0.439, Accuracy: 0.034
Epoch 88/100, Loss: 0.686, Accuracy: 0.023
Epoch 89/100, Loss: 0.265, Accuracy: 0.039
Epoch 90/100, Loss: 0.369, Accuracy: 0.035
Epoch 91/100, Loss: 0.521, Accuracy: 0.027
Epoch 92/100, Loss: 0.662, Accuracy: 0.027
Epoch 93/100, Loss: 0.581, Accuracy: 0.029
Epoch 94/100, Loss: 0.322, Accuracy: 0.034
Epoch 95/100, Loss: 0.375, Accuracy: 0.035
Epoch 96/100, Loss: 0.575, Accuracy: 0.031
Epoch 97/100, Loss: 0.489, Accuracy: 0.030
Epoch 98/100, Loss: 0.435, Accuracy: 0.033
Epoch 99/100, Loss: 0.440, Accuracy: 0.031
Epoch 100/100, Loss: 0.444, Accuracy: 0.033
Is x the entire input dataset? If so, you are dividing by the size of the whole dataset in correct/x.shape[0], while correct only counts the predictions of the last mini-batch of the epoch. Try changing this to correct/output.shape[0].
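A minimal sketch of the per-batch version, assuming the model ends in a sigmoid (as BCELoss expects) and that the labels may arrive with shape (B,) while the output has shape (B, 1). The helper name batch_accuracy is hypothetical. Reshaping the labels to the output's shape avoids silent broadcasting, which would otherwise compare every prediction with every label.

```python
import torch

def batch_accuracy(probs: torch.Tensor, labels: torch.Tensor) -> float:
    """Fraction of correct predictions in one mini-batch (hypothetical helper)."""
    # Threshold the sigmoid probabilities at 0.5 to get hard 0/1 predictions.
    predicted = (probs > 0.5).float()
    # Match the label shape to the output shape, e.g. (B,) -> (B, 1),
    # so == compares element-wise instead of broadcasting to (B, B).
    labels = labels.view_as(predicted).float()
    correct = (predicted == labels).sum().item()
    # Divide by the number of samples in this batch, not in the whole dataset.
    return correct / predicted.shape[0]

probs = torch.tensor([[0.9], [0.2], [0.7], [0.4]])  # model output, shape (4, 1)
labels = torch.tensor([1, 0, 0, 0])                  # targets, shape (4,)
print(batch_accuracy(probs, labels))                 # 0.75
```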

A better approach is to accumulate correct in every iteration, right after the optimization step, and divide by the dataset size at the end of the epoch:

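for epoch in range(num_epochs):

    correct = 0
    for i, (inputs, labels) in enumerate(train_loader):
        output = net(inputs)
        # ... compute the loss, loss.backward() and optimizer.step() as before ...

        # threshold the sigmoid outputs before comparing them with the 0/1 labels
        correct += ((output > 0.5).float() == labels).float().sum().item()

    accuracy = 100 * correct / len(trainset)
    # trainset, not train_loader
    # probably x in your case

    print("Accuracy = {}".format(accuracy))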
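For a self-contained check, here is a sketch of that pattern on synthetic, linearly separable data. The dataset, the one-layer model and the hyperparameters are assumptions chosen only for illustration. correct is accumulated over every mini-batch and divided by the dataset size once per epoch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic binary data (assumed): the label is 1 when x0 + x1 > 0.
x = torch.randn(200, 2)
y = (x.sum(dim=1, keepdim=True) > 0).float()  # shape (200, 1), same as the output
trainset = TensorDataset(x, y)
train_loader = DataLoader(trainset, batch_size=32, shuffle=True)

net = torch.nn.Sequential(torch.nn.Linear(2, 1), torch.nn.Sigmoid())
criterion = torch.nn.BCELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.5)

num_epochs = 20
for epoch in range(num_epochs):
    correct = 0
    for inputs, labels in train_loader:
        output = net(inputs)
        loss = criterion(output, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Count correct predictions in this batch (threshold at 0.5).
        correct += ((output > 0.5).float() == labels).sum().item()

    # Divide the epoch total by the dataset size, not the batch size.
    accuracy = correct / len(trainset)
    print("Epoch {}/{}, Loss: {:.3f}, Accuracy: {:.3f}".format(
        epoch + 1, num_epochs, loss.item(), accuracy))
```

Because the predictions are taken from the forward pass before each update, this training accuracy lags slightly behind the model's accuracy at the end of the epoch.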
Here is my solution:

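import torch

def evaluate(model, validation_loader, use_cuda=True):
    model.eval()  # evaluation mode: Dropout off, BatchNorm uses running statistics
    correct, total = 0, 0
    with torch.no_grad():
        for X, y in validation_loader:
            if use_cuda:
                X = X.cuda()
                y = y.cuda()
            predicted = model(X)  # assumes sigmoid outputs in [0, 1] with the same shape as y
            # round() thresholds at 0.5; count matches over the whole set
            correct += (predicted.round() == y).sum().item()
            total += y.size(0)
    model.train()  # switch back to training mode
    return correct / total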

Note 1: Set the model to eval mode while validating and then back to train mode.

Note 2: Disabling autograd is not required for correct results, but wrapping evaluation in torch.no_grad() avoids building the computation graph, which saves memory and time.
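A small sketch of both notes, assuming a model that contains Dropout so that train and eval mode actually behave differently (the model itself is made up). model.eval() switches Dropout and BatchNorm to inference behaviour, torch.no_grad() stops autograd from recording a graph, and model.train() restores training mode afterwards.

```python
import torch
import torch.nn as nn

# Made-up model with Dropout, so eval/train mode makes a visible difference.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Dropout(p=0.5),
                      nn.Linear(8, 1), nn.Sigmoid())
x = torch.randn(16, 4)

model.eval()              # Dropout becomes the identity in eval mode
with torch.no_grad():     # no computation graph is built in this block
    out_eval_1 = model(x)
    out_eval_2 = model(x)
model.train()             # back to training mode for the next epoch

print(model.training)                       # True
print(out_eval_1.requires_grad)             # False: nothing was recorded
print(torch.equal(out_eval_1, out_eval_2))  # True: eval mode is deterministic
```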

For multi-class outputs (one score per class), torch.max over the class dimension gives the predicted label. Example:

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
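A worked example with made-up scores for 4 samples and 3 classes: torch.max(outputs, 1) returns the largest score and its index for each row, and that index is the predicted class (the same result as outputs.argmax(dim=1)).

```python
import torch

# Made-up scores for 4 samples and 3 classes, shape (4, 3).
outputs = torch.tensor([[2.0, 0.1, 0.3],
                        [0.2, 1.5, 0.1],
                        [0.3, 0.2, 0.9],
                        [1.1, 0.4, 0.2]])
labels = torch.tensor([0, 1, 0, 0])

_, predicted = torch.max(outputs, 1)                  # index of the max score per row
print(predicted)                                      # tensor([0, 1, 2, 0])
print(torch.equal(predicted, outputs.argmax(dim=1)))  # True

correct = (predicted == labels).sum().item()  # 3
total = labels.size(0)                        # 4
print(100 * correct / total)                  # 75.0
```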


I think the simplest answer is the one from the CIFAR-10 tutorial:

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))


acc = (true == pred).sum().item()

If you have a counter, don't forget to eventually divide by the size of the dataset or an analogous value.

I’ve used:

N = data.size(0) # since usually it's size (batch_size, D1, D2, ...)
acc = (1 / N) * correct  # fraction of correct predictions in this batch

Self-contained code:

# testing accuracy function

import torch
import torch.nn as nn

D = 1
n_classes = 2  # torch.max over dim 1 needs one score per class
true = torch.tensor([0, 1, 0, 1, 1])  # shape (5,), same shape as pred
print(f'true.size() = {true.size()}')

batch_size = true.size(0)
print(f'batch_size = {batch_size}')
x = torch.randn(batch_size, D)
print(f'x = {x}')
print(f'x.size() = {x.size()}')

mdl = nn.Linear(D, n_classes)
logit = mdl(x)  # shape (batch_size, n_classes)
_, pred = torch.max(logit, 1)  # index of the largest score = predicted class

print(f'logit = {logit}')

print(f'pred = {pred}')
print(f'true = {true}')

acc = (true == pred).sum().item() / batch_size
print(f'acc = {acc}')

Also, I find this code to be a good reference:

def calc_accuracy(mdl, X, Y):
    # reduce/collapse the classification dimension according to max op
    # resulting in most likely label
    max_vals, max_indices = mdl(X).max(1)
    # assumes the first dimension is batch size
    n = max_indices.size(0)  # index 0 for extracting the # of elements
    # calculate acc (.item() returns a Python number, so / is float division)
    acc = (max_indices == Y).sum().item() / n
    return acc

To explain pred = mdl(x).max(1):

The main point is that you have to reduce (collapse) the dimension that holds the raw classification scores (logits) with a max, and then select the result with .indices. This is usually dimension 1, since dimension 0 is the batch size: the logits have shape [batch_size, D_classification] even though the raw data might have shape [batch_size, C, H, W].
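To illustrate the shapes, here is a sketch assuming image-like inputs of shape [batch_size, C, H, W] and a hypothetical classifier that flattens them into n_classes logits. The max is taken over dimension 1 of the logits, not of the raw data.

```python
import torch
import torch.nn as nn

batch_size, C, H, W, n_classes = 8, 3, 4, 4, 5
images = torch.randn(batch_size, C, H, W)  # raw data [B, C, H, W]

# Hypothetical classifier: flatten each image, then one linear layer.
mdl = nn.Sequential(nn.Flatten(), nn.Linear(C * H * W, n_classes))
logits = mdl(images)                        # [B, n_classes]

pred = logits.max(1).indices                # reduce dim 1 -> [B]
print(logits.shape)  # torch.Size([8, 5])
print(pred.shape)    # torch.Size([8])
```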

A synthetic example with 1D raw data:

import torch
import torch.nn as nn

# data dimension [batch-size, D]
D, Dout = 1, 5
batch_size = 16
x = torch.randn(batch_size, D)
y = torch.randint(low=0,high=Dout,size=(batch_size,))

mdl = nn.Linear(D, Dout)
logits = mdl(x)
print(f'y.size() = {y.size()}')
# reduce dimension 1 (the classification dimension) with a max,
# which returns the most likely label. Note that you need .indices, since you want
# the position of the most likely label (not its raw logit value)
pred = logits.max(1).indices

print('--- preds vs truth ---')
print(f'predictions = {pred}')
print(f'y = {y}')

acc = (pred == y).sum().item() / pred.size(0)


y.size() = torch.Size([16])
tensor([3, 1, 1, 3, 4, 1, 4, 3, 1, 1, 4, 4, 4, 4, 3, 1])
--- preds vs truth ---
predictions = tensor([3, 1, 1, 3, 4, 1, 4, 3, 1, 1, 4, 4, 4, 4, 3, 1])
y = tensor([3, 3, 3, 0, 3, 4, 0, 1, 1, 2, 1, 4, 4, 2, 0, 0])


Let's look at the basics:

  Accuracy = Total Correct Observations / Total Observations

In your code, correct only counts the correct observations in the last mini-batch of the epoch, but you divide it by the total number of observations in the dataset, which is incorrect.

Instead, divide it by the number of observations in that batch, i.e. the batch size. Suppose your batch size = batch_size:

Solution 1. Accuracy = correct/batch_size
Solution 2. Accuracy = correct/len(labels)
Solution 3. Accuracy = correct/len(inputs)

For each batch, the batch size, the length of the input (number of rows) and the length of the labels should be the same. The last batch can be smaller than batch_size, though, so Solutions 2 and 3 are safer.
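The last point matters because a DataLoader's final batch is smaller than batch_size when the dataset size is not a multiple of it, unless drop_last=True. A sketch with 10 made-up samples and batch_size=4:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 10 made-up samples with 3 features and binary labels.
dataset = TensorDataset(torch.randn(10, 3), torch.randint(0, 2, (10,)))

sizes = [len(labels) for _, labels in DataLoader(dataset, batch_size=4)]
print(sizes)       # [4, 4, 2]: the last batch only holds 2 samples

sizes_drop = [len(labels) for _, labels in DataLoader(dataset, batch_size=4, drop_last=True)]
print(sizes_drop)  # [4, 4]: the incomplete batch is discarded

# Dividing the last batch's correct count by batch_size (4) instead of
# len(labels) (2) would understate that batch's accuracy by half.
```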

A one-liner to get the accuracy:

acc = (true == mdl(x).max(1).indices).sum().item() / true.size(0)

This assumes dimension 0 is the batch size and dimension 1 holds the logits (raw values) for the classification labels.

More details:

import torch

def calc_error(mdl: torch.nn.Module, X: torch.Tensor, Y: torch.Tensor) -> float:
    # err = (Y != mdl(X).max(1).indices).sum().item() / Y.size(0)
    train_acc = calc_accuracy(mdl, X, Y)
    train_err = 1.0 - train_acc
    return train_err

def calc_accuracy(mdl: torch.nn.Module, X: torch.Tensor, Y: torch.Tensor) -> float:
    """
    Get the accuracy with respect to the most likely label.

    :param mdl: model returning one score (logit) per class, shape [B, n_classes]
    :param X: input batch, batch dimension first
    :param Y: integer class labels, shape [B]
    """
    # get the scores for each class (or logits)
    y_logits = mdl(X)  # unnormalized scores, shape [B, n_classes]
    # get the largest score in the class dimension and its index (the most likely class)
    max_scores, max_idx_class = y_logits.max(dim=1)  # [B, n_classes] -> [B]
    # usually the 0th coordinate is the batch size
    n = X.size(0)
    assert n == max_idx_class.size(0)
    # calculate acc (.item() returns a Python number, so / is float division)
    acc = (max_idx_class == Y).sum().item() / n
    return acc
Check these definitions:

import torch

# criterion, optimizer, model, n_epoch, train_loader and val_loader are defined elsewhere

def train(model, train_loader):
    model.train()
    correct_train, train_loss, target_count = 0, 0.0, 0
    for i, (input, target) in enumerate(train_loader):
        input, target = input.cuda(), target.cuda()

        output = model(input)
        loss = criterion(output, target)

        # backward pass and weight update
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        train_loss += loss.item() * target.size(0)  # batch mean -> batch sum

        # accuracy
        _, predicted = torch.max(output, 1)
        target_count += target.size(0)
        correct_train += (target == predicted).sum().item()
    train_acc = 100 * correct_train / target_count
    return train_acc, train_loss / target_count

def validate(model, val_loader):
    model.eval()
    correct_val, val_loss, target_count = 0, 0.0, 0
    with torch.no_grad():  # replaces the deprecated Variable(..., volatile=True)
        for i, (input, target) in enumerate(val_loader):
            input, target = input.cuda(), target.cuda()
            output = model(input)
            loss = criterion(output, target)
            val_loss += loss.item() * target.size(0)

            # accuracy
            _, predicted = torch.max(output, 1)
            target_count += target.size(0)
            correct_val += (target == predicted).sum().item()
    val_acc = 100 * correct_val / target_count
    return val_acc, val_loss / target_count

for epoch in range(0, n_epoch):
    train_acc, train_loss = train(model, train_loader)
    val_acc, val_loss = validate(model, val_loader)
    print("Epoch {0}: train_acc {1} \t train_loss {2} \t val_acc {3} \t val_loss {4}".format(epoch, train_acc, train_loss, val_acc, val_loss))
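On the loss bookkeeping: with the default reduction='mean', loss.item() is a per-batch average, so to report a per-sample average you can weight each batch's value by its size before dividing by target_count. A sketch with made-up predictions and targets split into batches of 3 and 1:

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()  # reduction='mean' by default
preds = torch.tensor([0.9, 0.2, 0.6, 0.3])     # made-up sigmoid outputs
targets = torch.tensor([1.0, 0.0, 1.0, 1.0])   # made-up labels
batches = [(preds[:3], targets[:3]), (preds[3:], targets[3:])]  # sizes 3 and 1

total_loss, count = 0.0, 0
for p, t in batches:
    loss = criterion(p, t)                 # mean over this batch
    total_loss += loss.item() * t.size(0)  # undo the mean: sum over this batch
    count += t.size(0)

per_sample = total_loss / count
reference = nn.BCELoss()(preds, targets).item()  # mean over all 4 samples at once
print(abs(per_sample - reference) < 1e-5)        # True
```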

Step by step example

Here is a step-by-step explanation with self-contained code as an example:


import torch

# how to get the class prediction

batch_size = 4
n_classes = 2
y_logits = torch.randn(batch_size, n_classes)  # usually the scores
print('scores (logits) for each class for each example in batch (how likely a class is unnormalized)')
print(y_logits)
print('the max over entire tensor (not usually what we want)')
print(y_logits.max())
print('the max over the n_classes dim. For each example in batch returns: '
      '1) the highest score for each class (most likely class)\n, and '
      '2) the idx (=class) with that highest score')
print(y_logits.max(dim=1))

print('-- calculate accuracy --')

# computing accuracy in pytorch

import torch
import torch.nn as nn

in_features = 1
n_classes = 10
batch_size = n_classes

mdl = nn.Linear(in_features=in_features, out_features=n_classes)

x = torch.randn(batch_size, in_features)
y_logits = mdl(x)  # scores/logits for each example in batch [B, n_classes]
# get for each example in batch the label/idx most likely according to score
# y_max_idx[b] = y_pred[b] = argmax_{idx in [n_classes]} y_logit[idx]
y_max_scores, y_max_idx = y_logits.max(dim=1)
y_pred = y_max_idx  # predictions are really the idx in [n_classes] with the highest scores
y = torch.randint(high=n_classes, size=(batch_size,))
# accuracy for 1 batch
assert (y.size(0) == batch_size)
acc = (y == y_pred).sum() / y.size(0)
acc = acc.item()



scores (logits) for each class for each example in batch (how likely a class is unnormalized)
tensor([[ 0.4912,  1.5143],
        [ 1.2378,  0.3172],
        [-1.0164, -1.2786],
        [-1.6685, -0.6693]])
the max over entire tensor (not usually what we want)
tensor(1.5143)
the max over the n_classes dim. For each example in batch returns: 1) the highest score for each class (most likely class)
, and 2) the idx (=class) with that highest score
torch.return_types.max(
values=tensor([ 1.5143,  1.2378, -1.0164, -0.6693]),
indices=tensor([1, 0, 0, 1]))
-- calculate accuracy --
tensor([6, 1, 3, 5, 3, 9, 6, 5, 6, 6])
tensor([5, 5, 5, 5, 5, 7, 7, 5, 5, 7])
If you need output like this:

2022-08-08 12:26:48,472 Epoch [20/20], Step [144/148], Loss: 0.1878 Accuracy: 92.1875
2022-08-08 12:26:48,597 Epoch [20/20], Step [145/148], Loss: 0.1052 Accuracy: 96.8750
2022-08-08 12:26:48,723 Epoch [20/20], Step [146/148], Loss: 0.2459 Accuracy: 90.6250
2022-08-08 12:26:48,848 Epoch [20/20], Step [147/148], Loss: 0.1617 Accuracy: 95.3125
2022-08-08 12:26:48,970 Epoch [20/20], Step [148/148], Loss: 0.1481 Accuracy: 95.0820
2022-08-08 12:26:49,055 --->Epoch [20/20], Average Loss: 0.1596 Average Accuracy: 94.3924
Accuracy of the network on the 3925 test images: 69.98726114649682 %
Accuracy of the network on the 9469 Train images: 94.31830182701447 %

Code like the following:

import logging

import torch

# assumes logging was configured beforehand, e.g.
# logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s');
# the original logger call was lost, logging.info is an assumption

# loop over our epochs
for epoch in range(0, num_epochs):
    # set the model in training mode
    model.train()
    # initialize the total training loss and accuracy for this epoch
    totalTrainLoss = 0
    totalTrainAccuracy = 0
    total_step = len(train_loader)
    # loop over the training set
    for i, (images, labels) in enumerate(train_loader):
        # send the input to the device
        (images, labels) = (images.to(device), labels.to(device))
        # perform a forward pass and calculate the training loss
        outputs = model(images)
        loss = lossFn(outputs, labels)
        # zero out the gradients, perform the backpropagation step,
        # and update the weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        totalTrainLoss += loss.item()
        # Get the predicted values
        _, predicted = torch.max(outputs, 1)
        trainAccuracy = (predicted == labels).float().sum().item()
        trainAccuracy = 100 * trainAccuracy / labels.size(0)
        totalTrainAccuracy += trainAccuracy
        # if (i // stepsize) % 10 == 0:
        logging.info(
            "Epoch [{}/{}], Step [{}/{}], Loss: {:.4f} Accuracy: {:.4f}".format(
                epoch + 1, num_epochs, i + 1, total_step, loss.item(), trainAccuracy
            )
        )

    avgTrainLoss = totalTrainLoss / len(train_loader)
    avgAccuracy = totalTrainAccuracy / len(train_loader)
    logging.info(
        "--->Epoch [{}/{}], Average Loss: {:.4f} Average Accuracy: {:.4f}".format(
            epoch + 1, num_epochs, avgTrainLoss, avgAccuracy
        )
    )
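Note that avgAccuracy averages per-batch percentages, which weights a smaller last batch as heavily as a full one, so it can differ from the count-based accuracy (total correct / total samples). A worked sketch with made-up counts that exaggerates the effect:

```python
# Made-up per-batch results: (number correct, batch size).
batches = [(4, 4), (0, 1)]  # a full batch of 4, then a last batch of 1

# Mean of the per-batch accuracies (what avgAccuracy computes).
avg_of_batches = sum(100 * c / n for c, n in batches) / len(batches)
# Total correct over total samples (Accuracy = correct / total).
overall = 100 * sum(c for c, _ in batches) / sum(n for _, n in batches)

print(avg_of_batches)  # 50.0
print(overall)         # 80.0
```

With many full batches and one slightly smaller last batch the gap is small, but counting correct predictions and samples over the whole epoch avoids it entirely.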

To validate, code like the following:

import torch

model.eval()  # evaluation mode: Dropout off, BatchNorm uses running statistics
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).float().sum().item()

    print(
        "Accuracy of the network on the {} test images: {} %".format(
            total, 100 * correct / total
        )
    )

    correct = 0
    total = 0
    for images, labels in train_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).float().sum().item()

    print(
        "Accuracy of the network on the {} Train images: {} %".format(
            total, 100 * correct / total
        )
    )


You can also use Accuracy from the TorchMetrics library.

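from torchmetrics import Accuracy

accuracy = Accuracy(task="binary")  # recent TorchMetrics versions require the task argument
accuracy(output, labels.long())     # probabilities are thresholded at 0.5; targets are 0/1 integers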
