What is the preferred way to preallocate NumPy arrays?

Question:

I am new to NumPy/SciPy. From the documentation, it seems more efficient to preallocate
a single array rather than call append/insert/concatenate.

For example, to add a column of 1’s to an array, I think that this:

ar0 = np.linspace(10, 20, 16).reshape(4, 4)
ar0[:,-1] = np.ones_like(ar0[:,0])

is preferred to this:

ar0 = np.linspace(10, 20, 12).reshape(4, 3)
ar0 = np.insert(ar0, ar0.shape[1], np.ones_like(ar0[:,0]), axis=1)

My first question is whether this is correct (that the first is better). My second question: at the moment, I am just preallocating my arrays like this (which I noticed in several of the Cookbook examples on the SciPy site):

np.zeros((8,5))

What is the ‘NumPy-preferred’ way to do this?

Asked By: kim busyn


Answers:

Preallocation allocates all the memory you need in one call, while growing the array (through calls to append, insert, concatenate or resize) generally means copying the data to a new, larger block of memory; np.append, np.insert and np.concatenate always return a new array. So you are correct: preallocation is preferred over (and should be faster than) resizing.
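
As a rough illustration of the difference, the sketch below builds the same array two ways: by writing into a preallocated block, and by growing it with np.append in a loop. The function names and the size 1000 are made up for this example.

```python
import numpy as np

def fill_preallocated(n):
    # Allocate once, then write each element in place.
    out = np.empty(n, dtype=np.float64)
    for i in range(n):
        out[i] = i * 0.5
    return out

def grow_with_append(n):
    # np.append returns a new array on every call,
    # so the existing data is copied n times.
    out = np.empty(0, dtype=np.float64)
    for i in range(n):
        out = np.append(out, i * 0.5)
    return out

a = fill_preallocated(1000)
b = grow_with_append(1000)
print(np.array_equal(a, b))  # both approaches produce the same values
```

Both functions return the same data; only the number of allocations and copies differs.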

There are several “preferred” ways to preallocate NumPy arrays, depending on what you want to create: np.zeros, np.ones, np.empty, np.zeros_like, np.ones_like and np.empty_like, plus many others that create useful arrays, such as np.linspace and np.arange.
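
A quick sketch of a few of these constructors side by side. The shape, the template array and the fill value are arbitrary choices for illustration, and np.full is one of the "many others" rather than a function named above.

```python
import numpy as np

shape = (4, 3)
z = np.zeros(shape)       # every element is 0.0
o = np.ones(shape)        # every element is 1.0
e = np.empty(shape)       # uninitialized: contents are arbitrary until written
f = np.full(shape, 7.0)   # every element is the given constant

# The *_like variants copy shape and dtype from an existing array.
template = np.arange(6, dtype=np.int32).reshape(2, 3)
zl = np.zeros_like(template)

print(z.dtype, zl.dtype, zl.shape)
```

Note that np.zeros, np.ones and np.empty default to float64 unless a dtype is passed, while the *_like functions inherit the dtype of the template.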

So

ar0 = np.linspace(10, 20, 16).reshape(4, 4)

is just fine if this comes closest to the ar0 you desire.

However, to make the last column all 1’s, I think the preferred way would be to just say

ar0[:,-1]=1

Since the shape of ar0[:,-1] is (4,), the 1 is broadcast to match this shape.
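
Putting the pieces together, here is a minimal sketch that preallocates the final 4x4 shape, fills it, and checks the result against the np.insert version from the question. The variable names base and inserted are introduced only for this comparison.

```python
import numpy as np

# Preallocate the final 4x4 shape, fill the first three columns,
# then let the scalar 1 broadcast across the last column.
ar0 = np.empty((4, 4))
ar0[:, :3] = np.linspace(10, 20, 12).reshape(4, 3)
ar0[:, -1] = 1

# Same result built with np.insert, which allocates a new array.
base = np.linspace(10, 20, 12).reshape(4, 3)
inserted = np.insert(base, base.shape[1], 1, axis=1)

print(np.array_equal(ar0, inserted))
```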

Answered By: unutbu

In cases where performance is important, np.empty and np.zeros appear to be the fastest ways to create NumPy arrays (note that np.empty leaves the contents uninitialized).

Below are test results for each method and a few others. Values are in seconds.

>>> import numpy as np
>>> from timeit import timeit
>>> timeit("np.empty(1000000)",number=1000, globals=globals())
0.033749611208094166
>>> timeit("np.zeros(1000000)",number=1000, globals=globals())
0.03421245135849915
>>> timeit("np.arange(0,1000000,1)",number=1000, globals=globals())
1.2212416112155324
>>> timeit("np.ones(1000000)",number=1000, globals=globals())
2.2877375495381145
>>> timeit("np.linspace(0,1000000,1000000)",number=1000, globals=globals())
3.0824269766860652
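
To repeat the comparison on your own machine, something like the following script should work. It is only a sketch: the candidates dictionary and the repeat count of 20 are choices made here, and the absolute numbers will differ by platform and NumPy version.

```python
import timeit
import numpy as np

n = 1_000_000

# Each entry is a zero-argument callable that timeit can invoke directly.
candidates = {
    "np.empty": lambda: np.empty(n),
    "np.zeros": lambda: np.zeros(n),
    "np.arange": lambda: np.arange(0, n, 1),
    "np.ones": lambda: np.ones(n),
    "np.linspace": lambda: np.linspace(0, n, n),
}

results = {}
for name, fn in candidates.items():
    # Total time in seconds for 20 calls.
    results[name] = timeit.timeit(fn, number=20)
    print(f"{name:12s} {results[name]:.4f} s")
```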
Answered By: Justas

In my experience, numpy.empty() is the fastest way to preallocate a huge array. The array I am talking about has shape (80, 80, 300000) and dtype uint8.

Here is the code:

%timeit np.empty((80, 80, 300000), dtype='uint8')
%timeit np.zeros((80, 80, 300000), dtype='uint8')
%timeit np.ones((80, 80, 300000), dtype='uint8')

And the timing results:

10000 loops, best of 3: 83.7 µs per loop  # far faster than the other two
1 loop, best of 3: 273 ms per loop
1 loop, best of 3: 272 ms per loop
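
The catch with np.empty is that it returns the array without initializing its entries, so every element must be written before it is read. A minimal sketch, using a much smaller shape than the one above so that it runs quickly (the shape and fill values are chosen only for illustration):

```python
import numpy as np

# Contents are arbitrary at this point; do not read them yet.
buf = np.empty((8, 8, 30), dtype=np.uint8)

# Write every slice along the last axis before reading anything back.
for k in range(buf.shape[2]):
    buf[:, :, k] = k

print(buf[0, 0, :5])  # prints [0 1 2 3 4]
```

If some elements might never be written, np.zeros is the safer choice.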
Answered By: nima farhadi