Best way to initialize and fill a numpy array?
Question:
I want to initialize and fill a numpy array. What is the best way?
This works as I expect:
>>> import numpy as np
>>> np.empty(3)
array([ -1.28822975e-231, -1.73060252e-077, 2.23946712e-314])
But this doesn’t:
>>> np.empty(3).fill(np.nan)
>>>
Nothing?
>>> type(np.empty(3))
<type 'numpy.ndarray'>
It seems to me that the np.empty() call is returning the correct type of object, so I don't understand why .fill() is not working.
Assigning the result of np.empty() to a variable first works fine:
>>> a = np.empty(3)
>>> a.fill(np.nan)
>>> a
array([ nan, nan, nan])
Why do I need to assign to a variable in order to use .fill()? Am I missing a better alternative?
Answers:
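ndarray.fill modifies the array in place and returns None. Therefore, if you assign the result to a name, that name gets the value None. The interactive interpreter also does not echo None, which is why nothing is printed after np.empty(3).fill(np.nan).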
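A minimal sketch of the in-place behaviour described above. The helper `filled` is hypothetical (not a NumPy function); it just wraps the create, fill, return sequence so the result can be used in a single expression.

```python
import numpy as np

# ndarray.fill() mutates the array it is called on and returns None.
a = np.empty(3)
result = a.fill(np.nan)
print(result)             # None
print(np.isnan(a).all())  # True: `a` itself was filled in place

# Chaining binds the return value of fill(), i.e. None, not the array.
b = np.empty(3).fill(np.nan)
print(b is None)          # True


def filled(shape, value):
    """Hypothetical helper: allocate, fill in place, then return the array."""
    arr = np.empty(shape)
    arr.fill(value)
    return arr


c = filled(3, np.nan)
```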
An alternative is to use an expression which returns nan, e.g.:
a = np.empty(3) * np.nan
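The trick relies on NaN propagating through IEEE 754 multiplication: any float times NaN is NaN, including 0 and infinity, so whatever arbitrary values np.empty() happened to leave in memory do not matter. A small check of that assumption:

```python
import numpy as np

# np.empty() returns uninitialized memory; the values are arbitrary.
a = np.empty(3) * np.nan

# Even "awkward" operands such as 0.0 and inf become NaN when multiplied by NaN.
edge_cases = np.array([0.0, np.inf, -np.inf, -1.5]) * np.nan
```

Note that this only works for NaN; multiplying by any other value would keep the uninitialized contents in play.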
You could also try:
In [79]: np.full(3, np.nan)
Out[79]: array([ nan, nan, nan])
The pertinent doc:
Definition: np.full(shape, fill_value, dtype=None, order='C')
Docstring:
Return a new array of given shape and type, filled with `fill_value`.
Note that np.full is only available in NumPy 1.8 and later.
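A short sketch of np.full with the parameters from the signature above: `shape` may be an int or a tuple, and `dtype` is taken from the fill value unless given explicitly. The related np.full_like copies shape and dtype from an existing array.

```python
import numpy as np

a = np.full(3, np.nan)                       # 1-D, dtype inferred from np.nan
grid = np.full((2, 3), 7.5)                  # tuple shape
ints = np.full((2, 2), 7, dtype=np.int64)    # explicit dtype removes any ambiguity
like = np.full_like(grid, np.nan)            # same shape/dtype as `grid`

print(a.dtype)     # float64
print(grid.shape)  # (2, 3)
```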
I find this easy to remember:
numpy.array([numpy.nan]*3)
Out of curiosity, I timed it, and both @JoshAdel’s answer and @shx2’s answer are far faster than mine with large arrays.
In [34]: %timeit -n10000 numpy.array([numpy.nan]*10000)
10000 loops, best of 3: 273 µs per loop
In [35]: %timeit -n10000 numpy.empty(10000)* numpy.nan
10000 loops, best of 3: 6.5 µs per loop
In [36]: %timeit -n10000 numpy.full(10000, numpy.nan)
10000 loops, best of 3: 5.42 µs per loop
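The %timeit lines above are IPython magics. To repeat the comparison in a plain Python script, something like the following sketch with the standard `timeit` module should work; the absolute numbers will depend on the machine and NumPy version, so only the relative ordering is worth comparing.

```python
import timeit

import numpy as np

n = 10_000
candidates = {
    "list": lambda: np.array([np.nan] * n),
    "empty*nan": lambda: np.empty(n) * np.nan,
    "full": lambda: np.full(n, np.nan),
}

# All three build an equivalent array of n NaNs, so only speed differs.
results = {name: build() for name, build in candidates.items()}

# timeit.repeat returns the total seconds for each batch of `number` calls.
for name, build in candidates.items():
    best = min(timeit.repeat(build, number=1000, repeat=3))
    print(f"{name:>10}: {best / 1000 * 1e6:.2f} µs per call")
```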
Just for future reference, the multiplication by np.nan only works because of the mathematical properties of np.nan: any number multiplied by NaN is NaN, so the uninitialized contents of np.empty() are irrelevant.
For a generic value N, one would need to use np.ones() * N, mimicking the accepted answer; however, speed-wise, this is not a terribly good choice.
The best choice would be np.full(), as already pointed out. If that is not available to you, np.zeros() + N seems to be a better choice than np.ones() * N, while np.empty() + N or np.empty() * N are simply not suitable, because np.empty() leaves the memory uninitialized and the result still depends on whatever values were there. Note that np.zeros() + N will also work when N is np.nan.
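A minimal sketch of the options for a generic value N, assuming a float fill value (432.4, as in the timings). The small `shape` is chosen only for illustration.

```python
import numpy as np

shape = (4, 5)
N = 432.4

preferred = np.full(shape, N)        # best choice when np.full exists
fallback = np.zeros(shape) + N       # works on older NumPy without np.full
slower = np.ones(shape) * N          # correct, but slower in the timings
nan_case = np.zeros(shape) + np.nan  # the zeros() route also handles NaN

# The two-step approach from the question also gives N everywhere.
two_step = np.empty(shape)
two_step.fill(N)

# np.empty(shape) + N is NOT used: empty() leaves memory uninitialized,
# so the result would be (arbitrary values + N) rather than N.
```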
%timeit x = np.full((1000, 1000, 10), 432.4)
8.19 ms ± 97.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit x = np.zeros((1000, 1000, 10)) + 432.4
9.86 ms ± 55.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit x = np.ones((1000, 1000, 10)) * 432.4
17.3 ms ± 104 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit x = np.array([432.4] * (1000 * 1000 * 10)).reshape((1000, 1000, 10))
316 ms ± 37.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
If you don't mind None instead of nan (and an array of Python objects rather than floats), you can use:
a = np.empty(3, dtype=object)
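Unlike numeric arrays, an object array from np.empty is initialized with None references rather than arbitrary memory. A quick check, also showing that np.full(3, None) yields the same thing, since the dtype inferred from None is object:

```python
import numpy as np

a = np.empty(3, dtype=object)
print(a)  # [None None None]

b = np.full(3, None)  # dtype inferred as object from the fill value
```

The trade-off is that object arrays store Python references, so element-wise arithmetic no longer runs at native numeric speed.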