How to make formula differentiable for a binary classifier in PyTorch

Question:

I am trying to create a custom loss function for a binary classification case. I need the binary predictions as input to the function. However, I have reached a point where I am unable to make the process differentiable.
I get the raw output from the model, which has autograd attached to it. It looks like this:

outputs = tensor([[-0.1908,  0.4115],
                  [-1.0019, -0.1685],
                  [-1.1265, -0.3025],
                  [-0.5925, -0.6610],
                  [-0.4076, -0.4897],
                  [-0.6450, -0.2863],
                  [ 0.1632,  0.4944],
                  [-1.0743,  0.1003],
                  [ 0.6172,  0.5104],
                  [-0.2296, -0.0551],
                  [-1.3165,  0.3386],
                  [ 0.2705,  0.1200],
                  [-1.3767, -0.6496],
                  [-0.5603,  1.0609],
                  [-0.0109,  0.5767],
                  [-1.1081,  0.8886]], grad_fn=<AddmmBackward0>)

Then I take the predictions from it using:

_, preds = torch.max(outputs, 1)

However, when I take a look at the preds variable, the grad function is gone:

preds = tensor([0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0])

#labels
labels:  tensor([0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1])
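The break can be reproduced in isolation (a minimal sketch with random values, not the tensors above): torch.max returns (values, indices), and the indices are integer-typed, so autograd cannot flow through them.

```python
import torch

# torch.max(x, dim) returns (values, indices); only the values stay
# on the autograd graph -- the integer indices do not.
outputs = torch.randn(4, 2, requires_grad=True)
values, preds = torch.max(outputs, 1)

print(values.grad_fn)       # <MaxBackward0> -- values stay on the graph
print(preds.grad_fn)        # None -- the graph is cut at the indices
print(preds.requires_grad)  # False
```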

The preds variable is then passed to the custom loss function.
My question is: is there a way to get the preds variable with autograd attached to it, so that it can be differentiated?
I get a warning when I manually attach autograd to the preds variable.

#Custom loss function
def pfbeta_torch(preds, labels, beta=1.3):
    #labels = torch.tensor(labels.clone().detach(), dtype=torch.float64, requires_grad=True)
    preds = torch.tensor(preds.clone(), dtype=torch.float64, requires_grad=True)
    pTP = torch.sum(labels * preds)
    pFP = torch.sum((1 - labels) * preds)
    num_positives = torch.sum(labels)  #  = pTP+pFN

    pPrecision = pTP / (pTP + pFP)
    pRecall = pTP / num_positives

    beta_squared = beta ** 2
    if (pPrecision > 0 and pRecall > 0):
        pF1 = (1 + beta_squared) * pPrecision * pRecall / (beta_squared * pPrecision + pRecall)
        return pF1
    else:
        return torch.tensor(0, dtype=torch.float64, requires_grad=True)
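For context on why the re-wrapping at the top of this function cannot work (a small sketch with random values, using the clone().detach() form the warning recommends): it creates a new leaf tensor that is disconnected from the model, so gradients from the loss never reach the network.

```python
import torch

# Re-wrapping a tensor creates a NEW leaf tensor: gradients computed
# from it never propagate back to the original model output.
outputs = torch.randn(3, requires_grad=True)
wrapped = outputs.clone().detach().to(torch.float64).requires_grad_(True)

wrapped.sum().backward()
print(wrapped.grad)   # populated
print(outputs.grad)   # None -- the original tensor received no gradient
```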


#Warning
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:3: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).

EDIT:

Based on the new information, I used the function below.

def pfbeta_torch(outputs, labels, beta=1.3):
    logits = F.softmax(outputs, dim=-1)
    outputs = F.gumbel_softmax(logits, tau=1, hard=True)
    pTP = torch.sum(labels * outputs[:,1])
    pFP = torch.sum((1 - labels) * outputs[:,1])
    num_positives = torch.sum(labels)  #  = pTP+pFN

    pPrecision = pTP / (pTP + pFP)
    pRecall = pTP / num_positives

    beta_squared = beta ** 2
    if (pPrecision > 0 and pRecall > 0):
        pF1 = (1 + beta_squared) * pPrecision * pRecall / (beta_squared * pPrecision + pRecall)
        return pF1
    else:
        return torch.tensor(0, dtype=torch.float64, requires_grad=True)
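One possible issue in the snippet above (my observation, not from the original thread): F.gumbel_softmax expects unnormalized logits, while the code first applies softmax and passes probabilities. A sketch passing the raw model outputs directly:

```python
import torch
import torch.nn.functional as F

# F.gumbel_softmax samples from softmax((logits + gumbel_noise) / tau),
# so it expects (log-)logits; feeding it softmax probabilities distorts
# the sampling distribution. Raw model outputs can be passed as-is.
outputs = torch.randn(5, 2, requires_grad=True)
hard = F.gumbel_softmax(outputs, tau=1.0, hard=True)

print(hard.sum(dim=-1))          # each row is one-hot, so it sums to 1
print(hard.grad_fn is not None)  # True -- straight-through gradient path kept
```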

Printing the loss values gives:

tensor(0., dtype=torch.float64, requires_grad=True)
tensor(0.5000, grad_fn=<DivBackward0>)
tensor(0.5000, grad_fn=<DivBackward0>)
tensor(0.4432, grad_fn=<DivBackward0>)
tensor(0., dtype=torch.float64, requires_grad=True)
tensor(0., dtype=torch.float64, requires_grad=True)
tensor(0., dtype=torch.float64, requires_grad=True)
tensor(0.4432, grad_fn=<DivBackward0>)
tensor(0., dtype=torch.float64, requires_grad=True)
tensor(0., dtype=torch.float64, requires_grad=True)
tensor(0.7610, grad_fn=<DivBackward0>)
tensor(0., dtype=torch.float64, requires_grad=True)
tensor(0., dtype=torch.float64, requires_grad=True)
tensor(0.8433, grad_fn=<DivBackward0>)
tensor(0.5000, grad_fn=<DivBackward0>)
tensor(0.6142, grad_fn=<DivBackward0>)
tensor(0.4432, grad_fn=<DivBackward0>)
tensor(0., dtype=torch.float64, requires_grad=True)
tensor(0.6142, grad_fn=<DivBackward0>)
tensor(0., dtype=torch.float64, requires_grad=True)
tensor(0.6667, grad_fn=<DivBackward0>)
tensor(1., grad_fn=<DivBackward0>)
tensor(0., dtype=torch.float64, requires_grad=True)
tensor(0.4432, grad_fn=<DivBackward0>)
tensor(0., dtype=torch.float64, requires_grad=True)

The accuracy value is hopeless at 0.45 even after several epochs, and it stays there. With cross-entropy loss (nn.CrossEntropyLoss()) the model reaches 94% accuracy. Hence, I believe the custom loss function is not implemented properly.

Would anyone be able to help me with this, please?
Thanks & Best Regards
AMJS

Answers:

Max has a derivative of 0 everywhere except at the transition point, where it is undefined. For this reason, implementing exactly what you are asking for is impossible. That said, there are tricks to work around it. If you are fine with the outputs being relaxed, you can use preds = outputs.softmax(dim=1). Based on your example code, it seems you are implementing something close to the Jaccard index, and this is the approach I would suggest. If you really need the predictions to be discrete, you can use hard Gumbel-softmax or straight-through estimators, but those are rather advanced topics, and I would recommend against them unless you know what you are doing.
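A minimal sketch of the relaxed approach (the helper name soft_pfbeta, the eps guards, and the 1 - F return are my additions, not from the thread; since F-beta is a score to maximize, a loss should minimize its complement):

```python
import torch

# Sketch of the relaxed loss: use the class-1 probability from softmax
# instead of hard 0/1 predictions, keeping everything differentiable.
# The eps guards and the `1 - f_beta` return are assumptions: F-beta
# should be maximized, so the loss minimizes its complement.
def soft_pfbeta(outputs, labels, beta=1.3, eps=1e-8):
    probs = outputs.softmax(dim=1)[:, 1]   # relaxed predictions in [0, 1]
    pTP = torch.sum(labels * probs)
    pFP = torch.sum((1 - labels) * probs)
    num_positives = torch.sum(labels)      # = pTP + pFN
    precision = pTP / (pTP + pFP + eps)
    recall = pTP / (num_positives + eps)
    beta_sq = beta ** 2
    f_beta = (1 + beta_sq) * precision * recall / (beta_sq * precision + recall + eps)
    return 1 - f_beta

outputs = torch.randn(8, 2, requires_grad=True)
labels = torch.tensor([0., 1., 0., 1., 1., 0., 1., 0.])
loss = soft_pfbeta(outputs, labels)
loss.backward()   # gradients flow all the way back to `outputs`
```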

Answered By: Jatentaki