Efficient creation of numpy arrays from list comprehension and in general

Question:

In my current work I use NumPy and list comprehensions a lot, and in the interest of the best possible performance I have the following questions:

What actually happens behind the scenes if I create a NumPy array as follows?

a = numpy.array( [1,2,3,4] )

My guess is that Python first creates an ordinary list containing the values, then uses the list's size to allocate a NumPy array, and finally copies the values into the new array. Is this correct, or is the interpreter clever enough to realize that the list is only an intermediate and copy the values directly?

Similarly, if I wish to create a NumPy array from a list comprehension using numpy.fromiter():

a = numpy.fromiter( [ x for x in xrange(0,4) ], int )

Will this result in an intermediate list of values being created before it is fed into fromiter()?

Asked By: NielsGM


Answers:

I believe the answer you are looking for is to use generator expressions with numpy.fromiter.

numpy.fromiter((<some_func>(x) for x in <something>),<dtype>,<size of something>)

Generator expressions are lazy: they evaluate each element only when you iterate over them.

A list comprehension builds the whole list first and then hands it to NumPy, while a generator expression yields one value at a time.

Python evaluates expressions from the inside out, as most languages do, so [<something> for <something_else> in <something_different>] creates the entire list first, and only then is it iterated over.
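
A minimal sketch of the difference, assuming a hypothetical square() helper in place of <some_func> and a small range in place of <something>. The helper logs each call so the order of evaluation can be inspected; passing count lets fromiter allocate the output once rather than growing it.

```python
import numpy as np

produced = []  # records every value passed to square(), in call order


def square(x):
    """Hypothetical stand-in for <some_func>; logs each call."""
    produced.append(x)
    return x * x


n = 5

# List comprehension: the complete list exists before NumPy sees it.
from_list = np.array([square(x) for x in range(n)])

# Generator expression: fromiter pulls values one at a time.
# count=n tells NumPy how many elements to expect up front.
from_gen = np.fromiter((square(x) for x in range(n)), dtype=np.int64, count=n)

print(from_list)
print(from_gen)
```

Both calls give the same values; only the second avoids the temporary list.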

Answered By: Snakes and Coffee

You could create your own list subclass and experiment with it to shed some light on the situation…

>>> class my_list(list):
...     def __init__(self, arg):
...         print('spam')
...         super(my_list, self).__init__(arg)
...     def __len__(self):
...         print('eggs')
...         return super(my_list, self).__len__()
... 
>>> x = my_list([0,1,2,3])
spam
>>> len(x)
eggs
4
>>> import numpy as np
>>> np.array(x)
eggs
eggs
eggs
eggs
array([0, 1, 2, 3])
>>> np.fromiter(x, int)
array([0, 1, 2, 3])
>>> np.array(my_list([0,1,2,3]))
spam
eggs
eggs
eggs
eggs
array([0, 1, 2, 3])
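
A related experiment, sketched here under the assumption that the input is a plain generator (the values() function is a hypothetical example, not something from the session above). Since a generator has no length, np.array does not iterate over it at all, whereas np.fromiter consumes it element by element.

```python
import numpy as np


def values():
    """A plain generator: it has no __len__ and can be consumed only once."""
    for v in (0, 1, 2, 3):
        yield v


# np.array does not iterate over a generator; it wraps the generator
# object itself in a zero-dimensional object array.
wrapped = np.array(values())

# np.fromiter consumes the iterator one element at a time.
consumed = np.fromiter(values(), dtype=np.int64)

print(wrapped.ndim, wrapped.dtype)
print(consumed)
```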
Answered By: wim

To the question in the title: there is now a package called numba that supports NumPy array comprehensions, constructing the array directly without an intermediate Python list. Unlike numpy.fromiter, it also supports nested comprehensions. Bear in mind, however, that numba has some restrictions and performance quirks that can catch you out if you are not familiar with it.

That said, numba can be quite fast and efficient. If the computation can be written with NumPy's vectorized operations, though, it may be better to keep it simple and do that.
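
On the nested-comprehension point: fromiter consumes a flat iterable, so one possible workaround without numba is to flatten the nested loops into a single generator and reshape afterwards. This is only a sketch; the rows and cols values and the i * j expression are illustrative assumptions.

```python
import numpy as np

rows, cols = 3, 4

# Flatten the nested loops into one generator, then restore the 2-D shape.
flat = np.fromiter(
    (i * j for i in range(rows) for j in range(cols)),
    dtype=np.int64,
    count=rows * cols,
)
table = flat.reshape(rows, cols)

print(table)
```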

>>> from timeit import timeit
>>> # using list comprehension
>>> timeit("np.array([i*i for i in range(1000)])", "import numpy as np", number=1000)
2.544344299999999
>>> # using numpy operations
>>> timeit("np.arange(1000) ** 2", "import numpy as np", number=1000)
0.05207519999999022
>>> # using numpy.fromiter
>>> timeit("np.fromiter((i*i for i in range(1000)), dtype=int, count=1000)",
...        "import numpy as np",
...        number=1000)
1.087984500000175
>>> # using numba array comprehension
>>> timeit("squares(1000)",
... """
... import numpy as np
... import numba as nb
... 
... @nb.njit
... def squares(n):
...     return np.array([i*i for i in range(n)])
... 
... 'compile the function'
... squares(10)
... """,
... number=1000)
0.03716940000003888
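
Before comparing timings like these, it is worth confirming that the variants compute the same thing. A small check for the three NumPy-only versions (numba is left out here so the snippet runs without it installed):

```python
import numpy as np

n = 1000

# The same three expressions that are timed above.
via_list = np.array([i * i for i in range(n)])
via_vector = np.arange(n) ** 2
via_fromiter = np.fromiter((i * i for i in range(n)), dtype=int, count=n)

print(np.array_equal(via_list, via_vector))
print(np.array_equal(via_vector, via_fromiter))
```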