Numpy quirk: Apply function to all pairs of two 1D arrays, to get one 2D array

Question:

Let’s say I have 2 one-dimensional (1D) numpy arrays, a and b, with lengths n1 and n2 respectively. I also have a function, F(x,y), that takes two values. Now I want to apply that function to each pair of values from my two 1D arrays, so the result would be a 2D numpy array with shape n1, n2. The i, j element of the two-dimensional array would be F(a[i], b[j]).

I haven’t been able to find a way of doing this without a horrible amount of for-loops, and I’m sure there’s a much simpler (and faster!) way of doing this in numpy.

Thanks in advance!

Asked By: Misconstruction

||

Answers:

You could use list comprehensions to create an array of arrays:

import numpy as np

# Arrays
a = np.array([1, 2, 3]) # n1 = 3
b = np.array([4, 5]) # n2 = 2

# Your function (just an example)
def f(i, j):
    return i + j

result = np.array([[f(i, j)for j in b ]for i in a])
print result

Output:

[[5 6]
 [6 7]
 [7 8]]
Answered By: Christian Tapia

You can use numpy broadcasting to do calculation on the two arrays, turning a into a vertical 2D array using newaxis:

In [11]: a = np.array([1, 2, 3]) # n1 = 3
    ...: b = np.array([4, 5]) # n2 = 2
    ...: #if function is c(i, j) = a(i) + b(j)*2:
    ...: c = a[:, None] + b*2

In [12]: c
Out[12]: 
array([[ 9, 11],
       [10, 12],
       [11, 13]])

To benchmark:

In [28]: a = arange(100)

In [29]: b = arange(222)

In [30]: timeit r = np.array([[f(i, j) for j in b] for i in a])
10 loops, best of 3: 29.9 ms per loop

In [31]: timeit c = a[:, None] + b*2
10000 loops, best of 3: 71.6 us per loop
Answered By: zhangxaochen

May I suggest, if your use-case is more limited to products, that you use the outer-product?

e.g.:

import numpy

a = array([0, 1, 2])
b = array([0, 1, 2, 3])

numpy.outer(a,b)

returns

array([[0, 0, 0, 0],
       [0, 1, 2, 3],
       [0, 2, 4, 6]])

You can then apply other transformations:

numpy.outer(a,b) + 1

returns

array([[1, 1, 1, 1],
       [1, 2, 3, 4],
       [1, 3, 5, 7]])

This is much faster:

>>> import timeit
>>> timeit.timeit('numpy.array([[i*j for i in a] for j in b])', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
31.79583477973938

>>> timeit.timeit('numpy.outer(a,b)', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
9.351550102233887
>>> timeit.timeit('numpy.outer(a,b)+1', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
12.308301210403442

As another alternative that’s a bit more extensible than the dot-product, in less than 1/5th – 1/9th the time of nested list comprehensions, use numpy.newaxis (took a bit more digging to find):

>>> import numpy
>>> a = numpy.array([0,1,2])
>>> b = numpy.array([0,1,2,3])

This time, using the power function:

>>> pow(a[:,numpy.newaxis], b)
array([[1, 0, 0, 0],
       [1, 1, 1, 1],
       [1, 2, 4, 8]])

Compared with an alternative:

>>> numpy.array([[pow(i,j) for j in b] for i in a])
array([[1, 0, 0, 0],
       [1, 1, 1, 1],
       [1, 2, 4, 8]])

And comparing the timing:

>>> import timeit
>>> timeit.timeit('numpy.array([[pow(i,j) for i in a] for j in b])', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
31.943181037902832
>>> timeit.timeit('pow(a[:, numpy.newaxis], b)', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
5.985810041427612

>>> timeit.timeit('numpy.array([[pow(i,j) for i in a] for j in b])', 'import numpy; a=numpy.arange(10); b=numpy.arange(10)')
109.74687385559082
>>> timeit.timeit('pow(a[:, numpy.newaxis], b)', 'import numpy; a=numpy.arange(10); b=numpy.arange(10)')
11.989138126373291

If F() works with broadcast arguments, definitely use that, as others describe.
An alternative is to use
np.fromfunction
(function_on_an_int_grid would be a better name.)
The following just maps the int grid to your a-b grid, then into F():

import numpy as np

def func_allpairs( F, a, b ):
    """ -> array len(a) x len(b):
        [[ F( a0 b0 )  F( a0 b1 ) ... ]
         [ F( a1 b0 )  F( a1 b1 ) ... ]
         ...
        ]
    """
    def fab( i, j ):
        return F( a[i], b[j] )  # F scalar or vec, e.g. gradient

    return np.fromfunction( fab, (len(a), len(b)), dtype=int )  # -> fab( all pairs )


#...............................................................................
def F( x, y ):
    return x + 10*y

a = np.arange( 100 )
b = np.arange( 222 )
A = func_allpairs( F, a, b )
# %timeit: 1000 loops, best of 3: 241 µs per loop -- imac i5, np 1.9.3
Answered By: denis

If F is beyond your control, you can wrap it automatically to be “vector-aware” by using numpy.vectorize. I present a working example below where I define my own F just for completeness. This approach has the simplicity advantage, but if you have control over F, rewriting it with a bit of care to vectorize correctly can have huge speed benefits

import numpy

n1 = 100
n2 = 200

a = numpy.arange(n1)
b = numpy.arange(n2)

def F(x, y):
    return x + y

# Everything above this is setup, the answer to your question lies here:
fv = numpy.vectorize(F)
r = fv(a[:, numpy.newaxis], b)

On my computer, the following timings are found, showing the price you pay for “automatic” vectorisation:

%timeit fv(a[:, numpy.newaxis], b)
100 loops, best of 3: 3.58 ms per loop

%timeit F(a[:, numpy.newaxis], b)
10000 loops, best of 3: 38.3 µs per loop
Answered By: chthonicdaemon
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.