Good ways to "expand" a numpy ndarray?

Question:

Are there good ways to “expand” a numpy ndarray? Say I have an ndarray like this:

[[1 2]
 [3 4]]

And I want each row to contain more elements, filled with zeros:

[[1 2 0 0 0]
 [3 4 0 0 0]]

I know there are brute-force ways to do this (say, construct a bigger array of zeros and copy the elements of the old, smaller array into it); I'm just wondering whether there are more Pythonic ways. I tried numpy.reshape, but it didn't work:

import numpy as np
a = np.array([[1, 2], [3, 4]])
np.reshape(a, (2, 5))

NumPy complains: ValueError: total size of new array must be unchanged

Asked By: clwen

||

Answers:

You can use np.column_stack or np.append:

import numpy as np

p = np.array([[1, 2], [3, 4]])

p = np.column_stack([p, [0, 0], [0, 0]])

p
Out[277]: 
array([[1, 2, 0, 0],
       [3, 4, 0, 0]])

Append seems to be faster though:

timeit np.column_stack([p, [0, 0], [0, 0]])
10000 loops, best of 3: 61.8 us per loop

timeit np.append(p, [[0,0],[0,0]],1)
10000 loops, best of 3: 48 us per loop

And a comparison with np.c_ and np.hstack [append still seems to be the fastest]:

In [295]: z=np.zeros((2, 2), dtype=a.dtype)

In [296]: timeit np.c_[a, z]
10000 loops, best of 3: 47.2 us per loop

In [297]: timeit np.append(p, z,1)
100000 loops, best of 3: 13.1 us per loop

In [305]: timeit np.hstack((p,z))
10000 loops, best of 3: 20.8 us per loop

and np.concatenate [which is even a bit faster than append]:

In [307]: timeit np.concatenate((p, z), axis=1)
100000 loops, best of 3: 11.6 us per loop
Answered By: root

There are the index tricks r_ and c_.

>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> z = np.zeros((2, 3), dtype=a.dtype)
>>> np.c_[a, z]
array([[1, 2, 0, 0, 0],
       [3, 4, 0, 0, 0]])

If this is performance critical code, you might prefer to use the equivalent np.concatenate rather than the index tricks.

>>> np.concatenate((a,z), axis=1)
array([[1, 2, 0, 0, 0],
       [3, 4, 0, 0, 0]])

There are also np.resize and np.ndarray.resize, but they have some limitations (due to the way numpy lays out data in memory) so read the docstring on those ones. You will probably find that simply concatenating is better.
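To illustrate those limitations (a quick sketch; both resize variants work on the flattened data, so neither preserves the row layout asked for in the question):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# np.resize (the function) repeats the flattened data to fill the
# new shape -- it does not pad with zeros:
print(np.resize(a, (2, 5)))
# [[1 2 3 4 1]
#  [2 3 4 1 2]]

# ndarray.resize (the method) zero-fills in place, but also works on
# the flattened data, so the original rows are not preserved:
b = a.copy()
b.resize((2, 5), refcheck=False)
print(b)
# [[1 2 3 4 0]
#  [0 0 0 0 0]]
```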

By the way, when I've needed to do this I usually just do it the basic way you've already mentioned (create an array of zeros and assign the smaller array inside it); I don't see anything wrong with that!

Answered By: wim

Just to be clear: there’s no “good” way to extend a NumPy array, as NumPy arrays are not expandable. Once the array is defined, the space it occupies in memory, a combination of the number of its elements and the size of each element, is fixed and cannot be changed. The only thing you can do is to create a new array and replace some of its elements by the elements of the original array.

A lot of functions are available for convenience (the np.concatenate function and its np.*stack shortcuts, np.column_stack, the index routines np.r_ and np.c_, …), but they are just that: convenience functions. Some of them are optimized at the C level (np.concatenate and others, I think), some are not.

Note that there's nothing wrong at all with your initial suggestion of creating a large array 'by hand' (possibly filled with zeros) and filling it with your initial array yourself. It may well be more readable than more complicated solutions.

Answered By: Pierre GM

You can use numpy.pad, as follows:

>>> import numpy as np
>>> a = [[1, 2], [3, 4]]
>>> np.pad(a, ((0,0),(0,3)), mode='constant', constant_values=0)
array([[1, 2, 0, 0, 0],
       [3, 4, 0, 0, 0]])

Here np.pad says, "Take the array a and add 0 rows above it, 0 rows below it, 0 columns to the left of it, and 3 columns to the right of it. Fill these columns with a constant specified by constant_values."
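The same pad_width tuple generalizes to padding on any side; for instance, a sketch that adds one row above and below and two columns on each side:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
# ((rows_before, rows_after), (cols_before, cols_after))
print(np.pad(a, ((1, 1), (2, 2)), mode='constant', constant_values=0))
# [[0 0 0 0 0 0]
#  [0 0 1 2 0 0]
#  [0 0 3 4 0 0]
#  [0 0 0 0 0 0]]
```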

Answered By: Richard

There are also similar functions like np.vstack, np.hstack, and np.dstack. I like these over np.concatenate because they make it clear which dimension is being "expanded".

temp = np.array([[1, 2], [3, 4]])
np.hstack((temp, np.zeros((2,3))))

It's easy to remember because NumPy's first axis is vertical, so vstack expands the first axis, and the second axis is horizontal, so hstack expands the second.
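A small sketch of the two side by side (the zero block must match the original along the axis that is not being grown):

```python
import numpy as np

temp = np.array([[1, 2], [3, 4]])

# hstack grows axis 1 (columns): the zero block needs the same number of rows
wide = np.hstack((temp, np.zeros((2, 3), dtype=temp.dtype)))

# vstack grows axis 0 (rows): the zero block needs the same number of columns
tall = np.vstack((temp, np.zeros((3, 2), dtype=temp.dtype)))

print(wide.shape, tall.shape)  # (2, 5) (5, 2)
```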

Answered By: otterb

A simple way:

# what you want to expand
x = np.ones((3, 3))

# expand to what shape 
target = np.zeros((6, 6))

# do expand
target[:x.shape[0], :x.shape[1]] = x

# target now contains:
array([[ 1.,  1.,  1.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])

Functional way:

Borrowed from https://stackoverflow.com/a/35751427/1637673, with a little modification:

def pad(array, reference_shape, offsets=None):
    """
    array: array to be padded
    reference_shape: tuple giving the shape of the array to create
    offsets: list of offsets (number of elements must equal the dimension of the array);
    raises a ValueError if the offsets are too big for reference_shape to accommodate the array
    """

    if offsets is None:
        offsets = np.zeros(array.ndim, dtype=np.int64)

    # Create an array of zeros with the reference shape (keep the input's dtype)
    result = np.zeros(reference_shape, dtype=array.dtype)
    # Build a tuple of slices from offset to offset + shape in each dimension
    # (a tuple, not a list: indexing with a list of slices is an error in modern NumPy)
    insert_here = tuple(slice(offsets[dim], offsets[dim] + array.shape[dim])
                        for dim in range(array.ndim))
    # Insert the array into the result at the specified offsets
    result[insert_here] = array
    return result
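A quick usage sketch (the helper is repeated so the snippet runs standalone; note it indexes with a tuple of slices, since indexing with a list of slices is an error in modern NumPy):

```python
import numpy as np

def pad(array, reference_shape, offsets=None):
    """Place `array` inside a zero array of `reference_shape` at `offsets`."""
    if offsets is None:
        offsets = [0] * array.ndim
    result = np.zeros(reference_shape, dtype=array.dtype)
    insert_here = tuple(slice(offsets[dim], offsets[dim] + array.shape[dim])
                        for dim in range(array.ndim))
    result[insert_here] = array
    return result

a = np.array([[1, 2], [3, 4]])

# No offsets: zeros are appended on the right, as in the question
print(pad(a, (2, 5)))
# [[1 2 0 0 0]
#  [3 4 0 0 0]]

# With offsets: the block lands one row down and one column right
print(pad(a, (4, 4), offsets=[1, 1]))
# [[0 0 0 0]
#  [0 1 2 0]
#  [0 3 4 0]
#  [0 0 0 0]]
```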
Answered By: Mithril