What is running loss in PyTorch and how is it calculated

Question:

I had a look at the Transfer Learning tutorial in the PyTorch docs, and there is one line that I failed to understand.

After the loss is calculated using loss = criterion(outputs, labels), the running loss is calculated using running_loss += loss.item() * inputs.size(0) and finally, the epoch loss is calculated using running_loss / dataset_sizes[phase].
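For context, the relevant part of the training loop looks roughly like this (condensed from the tutorial, so some details are omitted):

for inputs, labels in dataloaders[phase]:
    inputs = inputs.to(device)
    labels = labels.to(device)

    outputs = model(inputs)
    loss = criterion(outputs, labels)  # criterion = nn.CrossEntropyLoss()

    running_loss += loss.item() * inputs.size(0)

epoch_loss = running_loss / dataset_sizes[phase]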

Isn’t loss.item() supposed to give the loss for an entire mini-batch (please correct me if I am wrong)? That is, if the batch_size is 4, loss.item() would give the loss for the entire set of 4 images. If this is true, why is loss.item() being multiplied by inputs.size(0) when calculating running_loss? Isn’t this step an extra multiplication in this case?

Any help would be appreciated. Thanks!

Asked By: Jitesh Malipeddi


Answers:

It’s because the loss returned by CrossEntropyLoss (and most other loss functions) is averaged over the number of elements, i.e. the reduction parameter is 'mean' by default:

torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')

Hence, loss.item() contains the loss of the entire mini-batch divided by the batch size, i.e. the average per-sample loss. That’s why loss.item() is multiplied by the batch size, given by inputs.size(0), when calculating running_loss: this recovers the summed loss over the batch, and dividing by dataset_sizes[phase] at the end gives the average loss per sample over the whole epoch.
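A quick way to see this (the tensors here are random, purely for illustration):

import torch
import torch.nn as nn

outputs = torch.randn(4, 10)            # batch of 4 samples, 10 classes
labels = torch.tensor([1, 0, 3, 9])

mean_loss = nn.CrossEntropyLoss(reduction='mean')(outputs, labels)
sum_loss = nn.CrossEntropyLoss(reduction='sum')(outputs, labels)

# multiplying the mean loss by the batch size recovers the summed loss
print(mean_loss.item() * outputs.size(0))   # approximately sum_loss.item()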

Answered By: kHarshit

if the batch_size is 4, loss.item() would give the loss for the entire set of 4 images

That depends on how the loss is calculated. Remember, loss is a tensor just like any other tensor. In general, PyTorch’s loss functions return the average loss per sample by default:

“The losses are averaged across observations for each minibatch.”

t.item() for a tensor t simply converts it to a plain Python number (a Python float for a floating-point tensor).

More importantly, if you are new to PyTorch, it might be helpful to know that we use t.item() to maintain the running loss instead of t itself, because the loss tensor carries the computation history (autograd graph) it was built from; accumulating the tensor directly would keep that history for every batch and could quickly exhaust your GPU memory.
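A minimal sketch of the difference (assuming model, criterion, optimizer and dataloader are already set up as in the tutorial):

running_loss = 0.0
for inputs, labels in dataloader:
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

    # loss.item() is a plain Python float with no autograd history;
    # running_loss += loss would instead keep the graph of every batch alive.
    running_loss += loss.item() * inputs.size(0)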

Answered By: Piyush Singh