NumPy array initialization (fill with identical values)
Question:
I need to create a NumPy array of length n, each element of which is v.
Is there anything better than:
a = empty(n)
for i in range(n):
    a[i] = v
I know zeros and ones would work for v = 0, 1. I could use v * ones(n), but it won’t work when v is None, and it would also be much slower.
Answers:
I believe fill is the fastest way to do this.
a = np.empty(10)
a.fill(7)
You should also avoid iterating element by element as in your example. A simple a[:] = v accomplishes what your loop does, using NumPy broadcasting.
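As a minimal sketch of the two loop-free forms mentioned above (the values of `n` and `v` are made up for illustration), both approaches should leave the arrays with identical contents:

```python
import numpy as np

n, v = 5, 7.0  # hypothetical length and fill value

a = np.empty(n)   # uninitialized float64 buffer
a.fill(v)         # fills in place; the method itself returns None

b = np.empty(n)
b[:] = v          # slice assignment broadcasts the scalar to every element

same = np.array_equal(a, b)
print(a, b, same)
```

Note that `np.empty` defaults to float64, so an integer `v` is stored as a float unless a `dtype` is passed.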
You can use numpy.tile, e.g.:
v = 7
rows = 3
cols = 5
a = numpy.tile(v, (rows,cols))
a
Out[1]:
array([[7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7]])
Although tile is meant to ’tile’ an array (instead of a scalar, as in this case), it will do the job, creating pre-filled arrays of any size and dimension.
Updated for NumPy 1.7.0 (hat-tip to @Rolf Bartstra): a=np.empty(n); a.fill(5) is fastest.
In descending speed order:
%timeit a=np.empty(10000); a.fill(5)
100000 loops, best of 3: 5.85 us per loop
%timeit a=np.empty(10000); a[:]=5
100000 loops, best of 3: 7.15 us per loop
%timeit a=np.ones(10000)*5
10000 loops, best of 3: 22.9 us per loop
%timeit a=np.repeat(5,(10000))
10000 loops, best of 3: 81.7 us per loop
%timeit a=np.tile(5,[10000])
10000 loops, best of 3: 82.9 us per loop
Apparently, not only the absolute speeds but also the speed order (as reported by user1579844) are machine dependent; here’s what I found:
a=np.empty(int(1e4)); a.fill(5) is fastest.
In descending speed order (sizes are wrapped in int() because current NumPy rejects a float such as 1e4 as a shape or repeat count):
timeit a=np.empty(int(1e4)); a.fill(5)
# 100000 loops, best of 3: 10.2 us per loop
timeit a=np.empty(int(1e4)); a[:]=5
# 100000 loops, best of 3: 16.9 us per loop
timeit a=np.ones(int(1e4))*5
# 100000 loops, best of 3: 32.2 us per loop
timeit a=np.tile(5,[int(1e4)])
# 10000 loops, best of 3: 90.9 us per loop
timeit a=np.repeat(5,int(1e4))
# 10000 loops, best of 3: 98.3 us per loop
timeit a=np.array([5]*int(1e4))
# 1000 loops, best of 3: 1.69 ms per loop (slowest BY FAR!)
So, try and find out, and use what’s fastest on your platform.
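For trying this outside IPython, here is a rough sketch using the standard-library `timeit` module. The function names and the choice of 1000 calls per repeat are my own assumptions, and the printed numbers will differ from machine to machine:

```python
import timeit

import numpy as np

n = 10_000  # same array length as in the timings above


def empty_fill():
    a = np.empty(n)
    a.fill(5)
    return a


def empty_slice():
    a = np.empty(n)
    a[:] = 5
    return a


def ones_times():
    return np.ones(n) * 5


timings = {}
for func in (empty_fill, empty_slice, ones_times):
    # Best of 3 repeats of 1000 calls each, converted to seconds per call.
    best = min(timeit.repeat(func, number=1000, repeat=3)) / 1000
    timings[func.__name__] = best

# Print fastest first; the order may well differ on your platform.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds * 1e6:.2f} us per call")
```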
NumPy 1.8 introduced np.full(), which is a more direct method than empty() followed by fill() for creating an array filled with a certain value:
>>> np.full((3, 5), 7)
array([[ 7.,  7.,  7.,  7.,  7.],
       [ 7.,  7.,  7.,  7.,  7.],
       [ 7.,  7.,  7.,  7.,  7.]])
>>> np.full((3, 5), 7, dtype=int)
array([[7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7]])
This is arguably the way of creating an array filled with certain values, because it explicitly describes what is being achieved (and it can in principle be very efficient since it performs a very specific task).
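One caveat on the output shown above: it reflects the NumPy version of the time. In recent releases (to my knowledge since 1.12), np.full without an explicit dtype takes the dtype of the fill value, so an integer fill value yields an integer array. The sketch below illustrates this, and also shows that a None fill value, as asked about in the question, gives an object array. The variable names are made up for the example:

```python
import numpy as np

# dtype inferred from the fill value (behavior of current NumPy releases)
ints = np.full((3, 5), 7)

# request floats explicitly to reproduce the older default
floats = np.full((3, 5), 7, dtype=float)

# a None fill value produces an object array
nones = np.full(3, None)

print(ints.dtype, floats.dtype, nones.dtype)
```

Passing dtype explicitly keeps the result independent of the installed NumPy version.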
I had np.array(n * [value]) in mind, but apparently that is slower than all other suggestions for large enough n. The best in terms of readability and speed is
np.full(n, 3.14)
Here is a full comparison with perfplot (a pet project of mine).
The two empty alternatives are still the fastest (with NumPy 1.12.1). full catches up for large arrays.
Code to generate the plot:
import numpy as np
import perfplot

def empty_fill(n):
    a = np.empty(n)
    a.fill(3.14)
    return a

def empty_colon(n):
    a = np.empty(n)
    a[:] = 3.14
    return a

def ones_times(n):
    return 3.14 * np.ones(n)

def repeat(n):
    return np.repeat(3.14, n)

def tile(n):
    # the original called np.repeat here; np.tile matches the function name
    return np.tile(3.14, [n])

def full(n):
    return np.full(n, 3.14)

def list_to_array(n):
    return np.array(n * [3.14])

perfplot.show(
    setup=lambda n: n,  # each kernel receives the array length n
    kernels=[empty_fill, empty_colon, ones_times, repeat, tile, full, list_to_array],
    n_range=[2 ** k for k in range(27)],
    xlabel="len(a)",
    logx=True,
    logy=True,
)
Without NumPy (this gives a plain Python list, not an array):
>>> [2] * 3
[2, 2, 2]
You can also use np.broadcast_to.
To create an array of shape (dimensions) s and of value v, you can do (in your case, the array is 1-D, and s = (n,)):
a = np.broadcast_to(v, s).copy()
If a only needs to be read-only, you can use the following (which is way more efficient):
a = np.broadcast_to(v, s)
The advantage is that v can be given as a single number, but also as an array if different values are desired (as long as v.shape matches the tail of s).
Bonus: if you want to force the dtype of the created array:
a = np.broadcast_to(np.asarray(v, dtype), s).copy()
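To make the read-only versus copy distinction concrete, here is a small sketch (the values of `v`, `s`, and `row` are invented for illustration). The broadcast result is a view with zero strides, which is why it is cheap, and it only becomes writable after `.copy()`:

```python
import numpy as np

v, s = 7, (4,)  # hypothetical fill value and shape

view = np.broadcast_to(v, s)   # no data is duplicated
a = view.copy()                # independent, writable array

print(view.flags.writeable)    # False: the view is read-only
print(view.strides)            # (0,): every element aliases the same memory
print(a.flags.writeable)       # True

# v may also be an array whose shape matches the tail of the target shape
row = np.array([1, 2, 3])
grid = np.broadcast_to(row, (2, 3)).copy()
print(grid)
```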
We could also write
v=7
n=5
a=np.linspace(v,v,n)