How to replace this naive code with scaled_dot_product_attention() in PyTorch?
Question: Consider a code fragment from Crossformer:

```python
def forward(self, queries, keys, values):
    B, L, H, E = queries.shape
    _, S, _, D = values.shape
    scale = self.scale or 1. / sqrt(E)

    scores = torch.einsum("blhe,bshe->bhls", queries, keys)
    A = self.dropout(torch.softmax(scale * scores, dim=-1))
    V = torch.einsum("bhls,bshd->blhd", A, values)
    …
```