NumPy: Dot product of diagonal elements

Question:

Consider x, an n x 3 array.

  1. Is it possible, using built-in methods of NumPy, TensorFlow, or any other Python library, to get a vector of order n x 1 such that each row is a vector of order 3 x 1? That is, if x is [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]T, can a vector of the form [[1, 2, 3]T, [4, 5, 6]T, [7, 8, 9]T, [10, 11, 12]T]T be obtained without for loops or without introducing new axes via, say, np.newaxis?
  2. The motive behind this is to get only the diagonal elements of the dot product of x and its transpose. We could, of course, do something like np.diag(x.dot(x.T)). But if n is significantly large, say 202933, one can hear the CPU's fan wheezing. How can one avoid computing the dot product of all the elements and instead compute only the diagonal of that phantom dot product, without iteration?
Asked By: Rusty Shackelford


Answers:

Let’s take a look at the formula for each element in the result of multiplying x by its own transpose. I don’t feel like trying to coerce the Stack Overflow UI into allowing me to use tensor notation, so we’ll look conceptually.

Each element at row i, column j of the result is the dot product of row i in x and column j in x.T. Now column j in x.T is just row j in x, and the diagonal is where i and j are the same. So what you want is a sum across the rows of the squared elements of x:

d = (x * x).sum(axis=1)
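As a quick sanity check on a small hypothetical array, the row-wise sum of squares matches the diagonal of the full product:

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]], dtype=float)

# Wasteful approach: build the full n x n product, then keep the diagonal
full = np.diag(x.dot(x.T))

# Cheap approach: square elementwise and sum each row (only n * 3 multiplies)
d = (x * x).sum(axis=1)

print(np.allclose(full, d))  # True
```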

To address the first part of your question: the transpose operation in numpy rarely makes a copy of your data, so x.T and np.transpose(x) are constant-time operations even for the largest arrays. The reason is that numpy arrays are stored as a block of data along with some metadata, like the dimensions, the strides between elements in each dimension, and the data size. Transposing an array only requires modifying a small amount of that metadata (the sizes along each dimension and the strides), not copying the whole data set.

The time-consuming part is performing the multiplication. Simply having the objects x and x.T costs almost nothing: they both use the same data buffer.
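A small sketch illustrating this: the transpose is a view over the same buffer, so creating it is effectively free regardless of array size.

```python
import numpy as np

x = np.random.random((202933, 3))
xt = x.T  # no data is copied here

# Both objects point at the same underlying buffer
print(np.shares_memory(x, xt))  # True

# Only the shape and strides metadata differ
print(x.shape, xt.shape)        # (202933, 3) (3, 202933)
print(x.strides, xt.strides)
```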

Answered By: Mad Physicist

This function is likely one of the most efficient ways to handle this. (Taken from trimesh: https://github.com/mikedh/trimesh/blob/main/trimesh/util.py#L589)

import numpy as np

def diagonal_dot(a, b):
    """
    Dot product by row of a and b.

    There are a lot of ways to do this though
    performance varies very widely. This method
    uses a dot product to sum the row and avoids
    function calls if at all possible.

    Parameters
    ------------
    a : (m, d) float
      First array
    b : (m, d) float
      Second array

    Returns
    -------------
    result : (m,) float
      Dot product of each row
    """
    # make sure `a` is a numpy array
    # doing it for `a` will force the multiplication to
    # convert `b` if necessary and avoid a function call otherwise
    a = np.asanyarray(a)
    # 3x faster than (a * b).sum(axis=1)
    # avoiding np.ones saves 5-10% sometimes
    return np.dot(a * b, [1.0] * a.shape[1])
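A brief usage sketch (the helper is restated here so the snippet is self-contained), checking that it matches the diagonal of the full matrix product:

```python
import numpy as np

def diagonal_dot(a, b):
    # same approach as above: multiply elementwise, then sum each row
    # by taking a dot product with a vector of ones
    a = np.asanyarray(a)
    return np.dot(a * b, [1.0] * a.shape[1])

a = np.random.random((5, 3))
b = np.random.random((5, 3))

# matches the diagonal of the full 5 x 5 product
print(np.allclose(diagonal_dot(a, b), np.diag(a @ b.T)))  # True
```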

Comparing performance of some equivalent versions:

In [1]: import numpy as np; import trimesh

In [2]: a = np.random.random((10000, 3))

In [3]: b = np.random.random((10000, 3))

In [4]: %timeit (a * b).sum(axis=1)
1000 loops, best of 3: 181 us per loop

In [5]: %timeit np.einsum('ij,ij->i', a, b)
10000 loops, best of 3: 62.7 us per loop

In [6]: %timeit np.diag(np.dot(a, b.T))
1 loop, best of 3: 429 ms per loop

In [7]: %timeit np.dot(a * b, np.ones(a.shape[1]))
10000 loops, best of 3: 61.3 us per loop

In [8]: %timeit trimesh.util.diagonal_dot(a, b)
10000 loops, best of 3: 55.2 us per loop
Answered By: LC117