# Numpy quirk: Apply function to all pairs of two 1D arrays, to get one 2D array

## Question:

Let’s say I have 2 one-dimensional (1D) numpy arrays, `a` and `b`, with lengths `n1` and `n2` respectively. I also have a function, `F(x,y)`, that takes two values. Now I want to apply that function to each pair of values from my two 1D arrays, so the result would be a 2D numpy array with shape `n1, n2`. The `i, j` element of the two-dimensional array would be `F(a[i], b[j])`.

I haven’t been able to find a way of doing this without a horrible amount of for-loops, and I’m sure there’s a much simpler (and faster!) way of doing this in numpy.

You could use list comprehensions to create an array of arrays:

``````import numpy as np

# Arrays
a = np.array([1, 2, 3]) # n1 = 3
b = np.array([4, 5]) # n2 = 2

# Your function (just an example)
def f(i, j):
return i + j

result = np.array([[f(i, j)for j in b ]for i in a])
print result
``````

Output:

``````[[5 6]
[6 7]
[7 8]]
``````

You can use numpy broadcasting to do calculation on the two arrays, turning `a` into a vertical 2D array using `newaxis`:

``````In [11]: a = np.array([1, 2, 3]) # n1 = 3
...: b = np.array([4, 5]) # n2 = 2
...: #if function is c(i, j) = a(i) + b(j)*2:
...: c = a[:, None] + b*2

In [12]: c
Out[12]:
array([[ 9, 11],
[10, 12],
[11, 13]])
``````

To benchmark:

``````In [28]: a = arange(100)

In [29]: b = arange(222)

In [30]: timeit r = np.array([[f(i, j) for j in b] for i in a])
10 loops, best of 3: 29.9 ms per loop

In [31]: timeit c = a[:, None] + b*2
10000 loops, best of 3: 71.6 us per loop
``````

May I suggest, if your use-case is more limited to products, that you use the outer-product?

e.g.:

``````import numpy

a = array([0, 1, 2])
b = array([0, 1, 2, 3])

numpy.outer(a,b)
``````

returns

``````array([[0, 0, 0, 0],
[0, 1, 2, 3],
[0, 2, 4, 6]])
``````

You can then apply other transformations:

``````numpy.outer(a,b) + 1
``````

returns

``````array([[1, 1, 1, 1],
[1, 2, 3, 4],
[1, 3, 5, 7]])
``````

This is much faster:

``````>>> import timeit
>>> timeit.timeit('numpy.array([[i*j for i in a] for j in b])', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
31.79583477973938

>>> timeit.timeit('numpy.outer(a,b)', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
9.351550102233887
>>> timeit.timeit('numpy.outer(a,b)+1', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
12.308301210403442
``````

As another alternative that’s a bit more extensible than the dot-product, in less than 1/5th – 1/9th the time of nested list comprehensions, use `numpy.newaxis` (took a bit more digging to find):

``````>>> import numpy
>>> a = numpy.array([0,1,2])
>>> b = numpy.array([0,1,2,3])
``````

This time, using the power function:

``````>>> pow(a[:,numpy.newaxis], b)
array([[1, 0, 0, 0],
[1, 1, 1, 1],
[1, 2, 4, 8]])
``````

Compared with an alternative:

``````>>> numpy.array([[pow(i,j) for j in b] for i in a])
array([[1, 0, 0, 0],
[1, 1, 1, 1],
[1, 2, 4, 8]])
``````

And comparing the timing:

``````>>> import timeit
>>> timeit.timeit('numpy.array([[pow(i,j) for i in a] for j in b])', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
31.943181037902832
>>> timeit.timeit('pow(a[:, numpy.newaxis], b)', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
5.985810041427612

>>> timeit.timeit('numpy.array([[pow(i,j) for i in a] for j in b])', 'import numpy; a=numpy.arange(10); b=numpy.arange(10)')
109.74687385559082
>>> timeit.timeit('pow(a[:, numpy.newaxis], b)', 'import numpy; a=numpy.arange(10); b=numpy.arange(10)')
11.989138126373291
``````

If `F()` works with broadcast arguments, definitely use that, as others describe.
An alternative is to use
np.fromfunction
(`function_on_an_int_grid` would be a better name.)
The following just maps the int grid to your a-b grid, then into `F()`:

``````import numpy as np

def func_allpairs( F, a, b ):
""" -> array len(a) x len(b):
[[ F( a0 b0 )  F( a0 b1 ) ... ]
[ F( a1 b0 )  F( a1 b1 ) ... ]
...
]
"""
def fab( i, j ):
return F( a[i], b[j] )  # F scalar or vec, e.g. gradient

return np.fromfunction( fab, (len(a), len(b)), dtype=int )  # -> fab( all pairs )

#...............................................................................
def F( x, y ):
return x + 10*y

a = np.arange( 100 )
b = np.arange( 222 )
A = func_allpairs( F, a, b )
# %timeit: 1000 loops, best of 3: 241 µs per loop -- imac i5, np 1.9.3
``````

If `F` is beyond your control, you can wrap it automatically to be “vector-aware” by using `numpy.vectorize`. I present a working example below where I define my own `F` just for completeness. This approach has the simplicity advantage, but if you have control over `F`, rewriting it with a bit of care to vectorize correctly can have huge speed benefits

``````import numpy

n1 = 100
n2 = 200

a = numpy.arange(n1)
b = numpy.arange(n2)

def F(x, y):
return x + y

# Everything above this is setup, the answer to your question lies here:
fv = numpy.vectorize(F)
r = fv(a[:, numpy.newaxis], b)
``````

On my computer, the following timings are found, showing the price you pay for “automatic” vectorisation:

``````%timeit fv(a[:, numpy.newaxis], b)
100 loops, best of 3: 3.58 ms per loop

%timeit F(a[:, numpy.newaxis], b)
10000 loops, best of 3: 38.3 µs per loop
``````
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.