Create numpy matrix filled with NaNs

Question:

I have the following code:

r = numpy.zeros(shape = (width, height, 9))

It creates a width x height x 9 matrix filled with zeros. Instead, I’d like to know if there’s a function or way to initialize them instead to NaNs in an easy way.

Asked By: devoured elysium

||

Answers:

Are you familiar with numpy.nan?

You can create your own method such as:

def nans(shape, dtype=float):
    a = numpy.empty(shape, dtype)
    a.fill(numpy.nan)
    return a

Then

nans([3,4])

would output

array([[ NaN,  NaN,  NaN,  NaN],
       [ NaN,  NaN,  NaN,  NaN],
       [ NaN,  NaN,  NaN,  NaN]])

I found this code in a mailing list thread.

Answered By: Jorge Israel Peña

You rarely need loops for vector operations in numpy.
You can create an uninitialized array and assign to all entries at once:

>>> a = numpy.empty((3,3,))
>>> a[:] = numpy.nan
>>> a
array([[ NaN,  NaN,  NaN],
       [ NaN,  NaN,  NaN],
       [ NaN,  NaN,  NaN]])

I have timed the alternatives a[:] = numpy.nan here and a.fill(numpy.nan) as posted by Blaenk:

$ python -mtimeit "import numpy as np; a = np.empty((100,100));" "a.fill(np.nan)"
10000 loops, best of 3: 54.3 usec per loop
$ python -mtimeit "import numpy as np; a = np.empty((100,100));" "a[:] = np.nan" 
10000 loops, best of 3: 88.8 usec per loop

The timings show a preference for ndarray.fill(..) as the faster alternative. OTOH, I like numpy’s convenience implementation where you can assign values to whole slices at the time, the code’s intention is very clear.

Note that ndarray.fill performs its operation in-place, so numpy.empty((3,3,)).fill(numpy.nan) will instead return None.

Answered By: u0b34a0f6ae

Another option is to use numpy.full, an option available in NumPy 1.8+

a = np.full([height, width, 9], np.nan)

This is pretty flexible and you can fill it with any other number that you want.

Answered By: Pietro Biroli

You can always use multiplication if you don’t immediately recall the .empty or .full methods:

>>> np.nan * np.ones(shape=(3,2))
array([[ nan,  nan],
       [ nan,  nan],
       [ nan,  nan]])

Of course it works with any other numerical value as well:

>>> 42 * np.ones(shape=(3,2))
array([[ 42,  42],
       [ 42,  42],
       [ 42, 42]])

But the @u0b34a0f6ae’s accepted answer is 3x faster (CPU cycles, not brain cycles to remember numpy syntax ;):

$ python -mtimeit "import numpy as np; X = np.empty((100,100));" "X[:] = np.nan;"
100000 loops, best of 3: 8.9 usec per loop
(predict)laneh@predict:~/src/predict/predict/webapp$ master
$ python -mtimeit "import numpy as np; X = np.ones((100,100));" "X *= np.nan;"
10000 loops, best of 3: 24.9 usec per loop
Answered By: hobs

As said, numpy.empty() is the way to go. However, for objects, fill() might not do exactly what you think it does:

In[36]: a = numpy.empty(5,dtype=object)
In[37]: a.fill([])
In[38]: a
Out[38]: array([[], [], [], [], []], dtype=object)
In[39]: a[0].append(4)
In[40]: a
Out[40]: array([[4], [4], [4], [4], [4]], dtype=object)

One way around can be e.g.:

In[41]: a = numpy.empty(5,dtype=object)
In[42]: a[:]= [ [] for x in range(5)]
In[43]: a[0].append(4)
In[44]: a
Out[44]: array([[4], [], [], [], []], dtype=object)
Answered By: ntg

I compared the suggested alternatives for speed and found that, for large enough vectors/matrices to fill, all alternatives except val * ones and array(n * [val]) are equally fast.

enter image description here


Code to reproduce the plot:

import numpy
import perfplot

val = 42.0


def fill(n):
    a = numpy.empty(n)
    a.fill(val)
    return a


def colon(n):
    a = numpy.empty(n)
    a[:] = val
    return a


def full(n):
    return numpy.full(n, val)


def ones_times(n):
    return val * numpy.ones(n)


def list(n):
    return numpy.array(n * [val])


b = perfplot.bench(
    setup=lambda n: n,
    kernels=[fill, colon, full, ones_times, list],
    n_range=[2 ** k for k in range(20)],
    xlabel="len(a)",
)
b.save("out.png")
Answered By: Nico Schlömer

Yet another possibility not yet mentioned here is to use NumPy tile:

a = numpy.tile(numpy.nan, (3, 3))

Also gives

array([[ NaN,  NaN,  NaN],
       [ NaN,  NaN,  NaN],
       [ NaN,  NaN,  NaN]])

update:
I did a speed comparison, and it’s not very fast :/ It’s slower than the ones_times by a decimal.

Answered By: JHBonarius

Another alternative is numpy.broadcast_to(val,n) which returns in constant time regardless of the size and is also the most memory efficient (it returns a view of the repeated element). The caveat is that the returned value is read-only.

Below is a comparison of the performances of all the other methods that have been proposed using the same benchmark as in Nico Schlömer’s answer.

enter image description here

Answered By: Giancarlo Sportelli

Just a warning that initializing with np.empty() without subsequently editing the values can lead to (memory allocation?) problems:

arr1 = np.empty(96)
arr2 = np.empty(96)
print(arr1)
print(arr2)

# [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan  1.  1.
#   1.  1.  2.  2.  2.  2. nan nan nan nan nan nan nan nan  0.  0.  0.  0.
#   0.  0.  0.  0. nan nan nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan]
#
# [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan  1.  1.
#   1.  1.  2.  2.  2.  2. nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan]

The floats initialized in the array are used somewhere else in my script but are not associated with variables arr1 or arr2 at all. Spooky.

Answer from user @JHBonarius solved this problem:

arr = np.tile(np.nan, 96)
print(arr)

# [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
#  nan nan nan nan nan nan]
Answered By: Novorodnas
>>> width = 2
>>> height = 3

>>> r = np.full((width, height, 9), np.nan)

>>> print(r)

array([[[nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan]],

       [[nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan]]])

>>> r.shape
(2, 3, 9)
Answered By: user19409802

Pardon my tardiness, but here is the fastest solution for large arrays, iff single-precision (f4 float32) is all you need. And yes, np.nan works as well.

def full_single_prec(n):
    return numpy.full(n, val, dtype='f4')

Perfplot Comparison Of All Approaches (Tested in 3.8.13)

Answered By: Larry Panozzo
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.