Using NumPy to build an array of all combinations of two arrays

Question:

I’m trying to run over the parameters space of a six-parameter function to study its numerical behavior before trying to do anything complex with it, so I’m searching for an efficient way to do this.

My function takes float values given in a 6-dim NumPy array as input. What I tried to do initially was this:

First, I created a function that takes two arrays and generate an array with all combinations of values from the two arrays:

from numpy import *

def comb(a, b):
    c = []
    for i in a:
        for j in b:
            c.append(r_[i,j])
    return c

Then, I used reduce() to apply that to m copies of the same array:

def combs(a, m):
    return reduce(comb, [a]*m)

Finally, I evaluate my function like this:

values = combs(np.arange(0, 1, 0.1), 6)
for val in values:
    print F(val)

This works, but it’s way too slow. I know the space of parameters is huge, but this shouldn’t be so slow. I have only sampled 106 (a million) points in this example and it took more than 15 seconds just to create the array values.

Is there a more efficient way of doing this with NumPy?

I can modify the way the function F takes its arguments if it’s necessary.

Answers:

itertools.combinations is in general the fastest way to get combinations from a Python container (if you do in fact want combinations, i.e., arrangements without repetitions and independent of order; that’s not what your code appears to be doing, but I can’t tell whether that’s because your code is buggy or because you’re using the wrong terminology).

If you want something different than combinations perhaps other iterators in itertools, product or permutations, might serve you better. For example, it looks like your code is roughly the same as:

for val in itertools.product(np.arange(0, 1, 0.1), repeat=6):
    print F(val)

All of these iterators yield tuples, not lists or NumPy arrays, so if your F is picky about getting specifically a NumPy array, you’ll have to accept the extra overhead of constructing or clearing and refilling one at each step.

Answered By: Alex Martelli

Here’s a pure-NumPy implementation. It’s about 5 times faster than using itertools.

Python 3:

import numpy as np

def cartesian(arrays, out=None):
    """
    Generate a Cartesian product of input arrays.

    Parameters
    ----------
    arrays : list of array-like
        1-D arrays to form the Cartesian product of.
    out : ndarray
        Array to place the Cartesian product in.

    Returns
    -------
    out : ndarray
        2-D array of shape (M, len(arrays)) containing Cartesian products
        formed of input arrays.

    Examples
    --------
    >>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
    array([[1, 4, 6],
           [1, 4, 7],
           [1, 5, 6],
           [1, 5, 7],
           [2, 4, 6],
           [2, 4, 7],
           [2, 5, 6],
           [2, 5, 7],
           [3, 4, 6],
           [3, 4, 7],
           [3, 5, 6],
           [3, 5, 7]])

    """

    arrays = [np.asarray(x) for x in arrays]
    dtype = arrays[0].dtype

    n = np.prod([x.size for x in arrays])
    if out is None:
        out = np.zeros([n, len(arrays)], dtype=dtype)

    #m = n / arrays[0].size
    m = int(n / arrays[0].size)
    out[:,0] = np.repeat(arrays[0], m)
    if arrays[1:]:
        cartesian(arrays[1:], out=out[0:m, 1:])
        for j in range(1, arrays[0].size):
        #for j in xrange(1, arrays[0].size):
            out[j*m:(j+1)*m, 1:] = out[0:m, 1:]
    return out

Python 2:


import numpy as np

def cartesian(arrays, out=None):
    arrays = [np.asarray(x) for x in arrays]
    dtype = arrays[0].dtype

    n = np.prod([x.size for x in arrays])
    if out is None:
        out = np.zeros([n, len(arrays)], dtype=dtype)

    m = n / arrays[0].size
    out[:,0] = np.repeat(arrays[0], m)
    if arrays[1:]:
        cartesian(arrays[1:], out=out[0:m, 1:])
        for j in xrange(1, arrays[0].size):
            out[j*m:(j+1)*m, 1:] = out[0:m, 1:]
    return out
Answered By: pv.

It looks like you want a grid to evaluate your function, in which case you can use numpy.ogrid (open) or numpy.mgrid (fleshed out):

import numpy

my_grid = numpy.mgrid[[slice(0, 1, 0.1)]*6]
Answered By: steabert

You can do something like this

import numpy as np

def cartesian_coord(*arrays):
    grid = np.meshgrid(*arrays)
    coord_list = [entry.ravel() for entry in grid]
    points = np.vstack(coord_list).T
    return points

a = np.arange(4)  # Fake data
print(cartesian_coord(*6*[a])

which gives

array([[0, 0, 0, 0, 0, 0],
   [0, 0, 0, 0, 0, 1],
   [0, 0, 0, 0, 0, 2],
   ...,
   [3, 3, 3, 3, 3, 1],
   [3, 3, 3, 3, 3, 2],
   [3, 3, 3, 3, 3, 3]])
Answered By: felippe

The following NumPy implementation should be approximately two times the speed of the given previous answers:

def cartesian2(arrays):
    arrays = [np.asarray(a) for a in arrays]
    shape = (len(x) for x in arrays)

    ix = np.indices(shape, dtype=int)
    ix = ix.reshape(len(arrays), -1).T

    for n, arr in enumerate(arrays):
        ix[:, n] = arrays[n][ix[:, n]]

    return ix
Answered By: Stefan van der Walt

Here’s yet another way, using pure NumPy, no recursion, no list comprehension, and no explicit for loops. It’s about 20% slower than the original answer, and it’s based on np.meshgrid.

def cartesian(*arrays):
    mesh = np.meshgrid(*arrays)  # Standard NumPy meshgrid
    dim = len(mesh)  # Number of dimensions
    elements = mesh[0].size  # Number of elements, any index will do
    flat = np.concatenate(mesh).ravel()  # Flatten the whole meshgrid
    reshape = np.reshape(flat, (dim, elements)).T  # Reshape and transpose
    return reshape

For example,

x = np.arange(3)
a = cartesian(x, x, x, x, x)
print(a)

gives

[[0 0 0 0 0]
 [0 0 0 0 1]
 [0 0 0 0 2]
 ...,
 [2 2 2 2 0]
 [2 2 2 2 1]
 [2 2 2 2 2]]
Answered By: étale-cohomology

In newer versions of NumPy (>1.8.x), numpy.meshgrid() provides a much faster implementation:

For pv’s solution:

In [113]:

%timeit cartesian(([1, 2, 3], [4, 5], [6, 7]))
10000 loops, best of 3: 135 µs per loop
In [114]:

cartesian(([1, 2, 3], [4, 5], [6, 7]))

Out[114]:
array([[1, 4, 6],
       [1, 4, 7],
       [1, 5, 6],
       [1, 5, 7],
       [2, 4, 6],
       [2, 4, 7],
       [2, 5, 6],
       [2, 5, 7],
       [3, 4, 6],
       [3, 4, 7],
       [3, 5, 6],
       [3, 5, 7]])

numpy.meshgrid() used to be two-dimensional only, but now it is capable of multidimensional. In this case, three-dimensional:

In [115]:

%timeit np.array(np.meshgrid([1, 2, 3], [4, 5], [6, 7])).T.reshape(-1,3)
10000 loops, best of 3: 74.1 µs per loop
In [116]:

np.array(np.meshgrid([1, 2, 3], [4, 5], [6, 7])).T.reshape(-1,3)

Out[116]:
array([[1, 4, 6],
       [1, 5, 6],
       [2, 4, 6],
       [2, 5, 6],
       [3, 4, 6],
       [3, 5, 6],
       [1, 4, 7],
       [1, 5, 7],
       [2, 4, 7],
       [2, 5, 7],
       [3, 4, 7],
       [3, 5, 7]])

Note that the order of the final resultant is slightly different.

Answered By: CT Zhu

For a pure NumPy implementation of the Cartesian product of one-dimensional arrays (or flat Python lists), just use meshgrid(), roll the axes with transpose(), and reshape to the desired output:

 def cartprod(*arrays):
     N = len(arrays)
     return transpose(meshgrid(*arrays, indexing='ij'),
                      roll(arange(N + 1), -1)).reshape(-1, N)

Note this has the convention of the last axis changing the fastest ("C style" or "row-major").

In [88]: cartprod([1,2,3], [4,8], [100, 200, 300, 400], [-5, -4])
Out[88]:
array([[  1,   4, 100,  -5],
       [  1,   4, 100,  -4],
       [  1,   4, 200,  -5],
       [  1,   4, 200,  -4],
       [  1,   4, 300,  -5],
       [  1,   4, 300,  -4],
       [  1,   4, 400,  -5],
       [  1,   4, 400,  -4],
       [  1,   8, 100,  -5],
       [  1,   8, 100,  -4],
       [  1,   8, 200,  -5],
       [  1,   8, 200,  -4],
       [  1,   8, 300,  -5],
       [  1,   8, 300,  -4],
       [  1,   8, 400,  -5],
       [  1,   8, 400,  -4],
       [  2,   4, 100,  -5],
       [  2,   4, 100,  -4],
       [  2,   4, 200,  -5],
       [  2,   4, 200,  -4],
       [  2,   4, 300,  -5],
       [  2,   4, 300,  -4],
       [  2,   4, 400,  -5],
       [  2,   4, 400,  -4],
       [  2,   8, 100,  -5],
       [  2,   8, 100,  -4],
       [  2,   8, 200,  -5],
       [  2,   8, 200,  -4],
       [  2,   8, 300,  -5],
       [  2,   8, 300,  -4],
       [  2,   8, 400,  -5],
       [  2,   8, 400,  -4],
       [  3,   4, 100,  -5],
       [  3,   4, 100,  -4],
       [  3,   4, 200,  -5],
       [  3,   4, 200,  -4],
       [  3,   4, 300,  -5],
       [  3,   4, 300,  -4],
       [  3,   4, 400,  -5],
       [  3,   4, 400,  -4],
       [  3,   8, 100,  -5],
       [  3,   8, 100,  -4],
       [  3,   8, 200,  -5],
       [  3,   8, 200,  -4],
       [  3,   8, 300,  -5],
       [  3,   8, 300,  -4],
       [  3,   8, 400,  -5],
       [  3,   8, 400,  -4]])

If you want to change the first axis fastest ("Fortran style" or "column-major"), just change the order parameter of reshape() like this: reshape((-1, N), order='F')

Answered By: RBF06

You can use np.array(itertools.product(a, b)).

Answered By: William Song

Pandas merge offers a naive, fast solution to the problem:

# given the lists
x, y, z = [1, 2, 3], [4, 5], [6, 7]

# get dfs with same, constant index 
x = pd.DataFrame({'x': x}, index=np.repeat(0, len(x)))
y = pd.DataFrame({'y': y}, index=np.repeat(0, len(y)))
z = pd.DataFrame({'z': z}, index=np.repeat(0, len(z)))

# get all permutations stored in a new df
df = pd.merge(x, pd.merge(y, z, left_index=True, right_index=True),
              left_index=True, right_index=True)
Answered By: simone