Efficient Python array to NumPy array conversion

Question:

I receive a large array (a 12-megapixel image) as an array.array from the Python standard library.
Since I want to perform operations on this array, I would like to convert it to a NumPy array.
I tried the following:

import numpy
import array
from datetime import datetime
test = array.array('d', [0]*12000000)
t = datetime.now()
numpy.array(test)
print(datetime.now() - t)

The conversion takes between one and two seconds, about the same as a loop in Python.

Is there a more efficient way of doing this conversion?

Asked By: Simon Bergot


Answers:

np.array(test)                                       # 1.19s

np.fromiter(test, dtype=int)                         # 1.08s

np.frombuffer(test)                                  # 459ns !!!
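
The speed of np.frombuffer comes from wrapping the existing buffer instead of copying it. The sketch below illustrates what that implies, using small stand-in arrays rather than the 12-million-element one; the names view, ints, right and wrong are made up for this example. It assumes a platform where the array is writable through the buffer protocol, which is the case for array.array.

```python
import array
import numpy as np

# Small stand-in for the large array from the question.
test = array.array('d', [0.0] * 4)

# With no dtype given, np.frombuffer defaults to float64,
# which happens to match typecode 'd'.
view = np.frombuffer(test)

# The NumPy array is a view on the same memory, so writes go both ways.
test[0] = 1.5
view[1] = 2.5
print(view[0], test[1])

# For other typecodes, pass the dtype explicitly; otherwise the raw
# bytes are reinterpreted as float64.
ints = array.array('i', [1, 2, 3, 4])
right = np.frombuffer(ints, dtype=ints.typecode)
wrong = np.frombuffer(ints)
print(right.tolist())
print(wrong.dtype, wrong.size)
```

Passing the typecode as dtype works for 'i' and 'd' here; it should not be assumed to work for every array typecode.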
Answered By: eumiro

np.asarray(x) is almost always the best choice for any array-like object.

np.array and np.fromiter are slow because they perform a copy. Using np.asarray allows this copy to be elided:

>>> import array
>>> import numpy as np
>>> test = array.array('d', [0]*12000000)
# very slow - this makes multiple copies that grow each time
>>> %timeit np.fromiter(test, dtype=test.typecode)
626 ms ± 3.97 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# fast memory copy
>>> %timeit np.array(test)
63.5 ms ± 639 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# which is equivalent to doing the fast construction followed by a copy
>>> %timeit np.asarray(test).copy()
63.4 ms ± 371 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# so doing just the construction is way faster
>>> %timeit np.asarray(test)
1.73 µs ± 70.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

# marginally faster, but at the expense of verbosity and type safety if you
# get the wrong type
>>> %timeit np.frombuffer(test, dtype=test.typecode)
1.07 µs ± 27.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Answered By: Eric