How to convert list of numpy arrays into single numpy array?
Question:
Suppose I have ;
LIST = [[array([1, 2, 3, 4, 5]), array([1, 2, 3, 4, 5],[1,2,3,4,5])] # inner lists are numpy arrays
I try to convert;
array([[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5])
I am solving it by iteration on vstack right now but it is really slow for especially large LIST
What do you suggest for the best efficient way?
Answers:
In general you can concatenate a whole sequence of arrays along any axis:
numpy.concatenate( LIST, axis=0 )
but you do have to worry about the shape and dimensionality of each array in the list (for a 2-dimensional 3×5 output, you need to ensure that they are all 2-dimensional n-by-5 arrays already). If you want to concatenate 1-dimensional arrays as the rows of a 2-dimensional output, you need to expand their dimensionality.
As Jorge’s answer points out, there is also the function stack
, introduced in numpy 1.10:
numpy.stack( LIST, axis=0 )
This takes the complementary approach: it creates a new view of each input array and adds an extra dimension (in this case, on the left, so each n
-element 1D array becomes a 1-by-n
2D array) before concatenating. It will only work if all the input arrays have the same shape.
vstack
(or equivalently row_stack
) is often an easier-to-use solution because it will take a sequence of 1- and/or 2-dimensional arrays and expand the dimensionality automatically where necessary and only where necessary, before concatenating the whole list together. Where a new dimension is required, it is added on the left. Again, you can concatenate a whole list at once without needing to iterate:
numpy.vstack( LIST )
This flexible behavior is also exhibited by the syntactic shortcut numpy.r_[ array1, ...., arrayN ]
(note the square brackets). This is good for concatenating a few explicitly-named arrays but is no good for your situation because this syntax will not accept a sequence of arrays, like your LIST
.
There is also an analogous function column_stack
and shortcut c_[...]
, for horizontal (column-wise) stacking, as well as an almost-analogous function hstack
—although for some reason the latter is less flexible (it is stricter about input arrays’ dimensionality, and tries to concatenate 1-D arrays end-to-end instead of treating them as columns).
Finally, in the specific case of vertical stacking of 1-D arrays, the following also works:
numpy.array( LIST )
…because arrays can be constructed out of a sequence of other arrays, adding a new dimension to the beginning.
Starting in NumPy version 1.10, we have the method stack. It can stack arrays of any dimension (all equal):
# List of arrays.
L = [np.random.randn(5,4,2,5,1,2) for i in range(10)]
# Stack them using axis=0.
M = np.stack(L)
M.shape # == (10,5,4,2,5,1,2)
np.all(M == L) # == True
M = np.stack(L, axis=1)
M.shape # == (5,10,4,2,5,1,2)
np.all(M == L) # == False (Don't Panic)
# This are all true
np.all(M[:,0,:] == L[0]) # == True
all(np.all(M[:,i,:] == L[i]) for i in range(10)) # == True
Enjoy,
I checked some of the methods for speed performance and find that there is no difference!
The only difference is that using some methods you must carefully check dimension.
Timing:
|------------|----------------|-------------------|
| | shape (10000) | shape (1,10000) |
|------------|----------------|-------------------|
| np.concat | 0.18280 | 0.17960 |
|------------|----------------|-------------------|
| np.stack | 0.21501 | 0.16465 |
|------------|----------------|-------------------|
| np.vstack | 0.21501 | 0.17181 |
|------------|----------------|-------------------|
| np.array | 0.21656 | 0.16833 |
|------------|----------------|-------------------|
As you can see I tried 2 experiments – using np.random.rand(10000)
and np.random.rand(1, 10000)
And if we use 2d arrays than np.stack
and np.array
create additional dimension – result.shape is (1,10000,10000) and (10000,1,10000) so they need additional actions to avoid this.
Code:
from time import perf_counter
from tqdm import tqdm_notebook
import numpy as np
l = []
for i in tqdm_notebook(range(10000)):
new_np = np.random.rand(10000)
l.append(new_np)
start = perf_counter()
stack = np.stack(l, axis=0 )
print(f'np.stack: {perf_counter() - start:.5f}')
start = perf_counter()
vstack = np.vstack(l)
print(f'np.vstack: {perf_counter() - start:.5f}')
start = perf_counter()
wrap = np.array(l)
print(f'np.array: {perf_counter() - start:.5f}')
start = perf_counter()
l = [el.reshape(1,-1) for el in l]
conc = np.concatenate(l, axis=0 )
print(f'np.concatenate: {perf_counter() - start:.5f}')
Other solution is to use the asarray
function:
numpy.asarray( LIST )
I found a much more robust numpy function reshape
.
Problem of stack
and vstack
is, that it fails for an empty list.
>>> LIST = [np.array([1, 2, 3, 4, 5]), np.array([1, 2, 3, 4, 5]),np.array([1,2,3,4,5])]
>>> s = np.vstack(LIST)
>>> s.shape
(3, 5)
>>> s = np.vstack([])
ValueError: need at least one array to concatenate
The alternative is reshape
>>> s = np.reshape(LIST, (len(LIST),5))
>>> s.shape
(3, 5)
>>> LIST = []
>>> s = np.reshape(LIST, (len(LIST),5))
>>> s.shape
(0,5)
Drawback is, you need to know the length/shape of your inner array
Suppose I have ;
LIST = [[array([1, 2, 3, 4, 5]), array([1, 2, 3, 4, 5],[1,2,3,4,5])] # inner lists are numpy arrays
I try to convert;
array([[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5])
I am solving it by iteration on vstack right now but it is really slow for especially large LIST
What do you suggest for the best efficient way?
In general you can concatenate a whole sequence of arrays along any axis:
numpy.concatenate( LIST, axis=0 )
but you do have to worry about the shape and dimensionality of each array in the list (for a 2-dimensional 3×5 output, you need to ensure that they are all 2-dimensional n-by-5 arrays already). If you want to concatenate 1-dimensional arrays as the rows of a 2-dimensional output, you need to expand their dimensionality.
As Jorge’s answer points out, there is also the function stack
, introduced in numpy 1.10:
numpy.stack( LIST, axis=0 )
This takes the complementary approach: it creates a new view of each input array and adds an extra dimension (in this case, on the left, so each n
-element 1D array becomes a 1-by-n
2D array) before concatenating. It will only work if all the input arrays have the same shape.
vstack
(or equivalently row_stack
) is often an easier-to-use solution because it will take a sequence of 1- and/or 2-dimensional arrays and expand the dimensionality automatically where necessary and only where necessary, before concatenating the whole list together. Where a new dimension is required, it is added on the left. Again, you can concatenate a whole list at once without needing to iterate:
numpy.vstack( LIST )
This flexible behavior is also exhibited by the syntactic shortcut numpy.r_[ array1, ...., arrayN ]
(note the square brackets). This is good for concatenating a few explicitly-named arrays but is no good for your situation because this syntax will not accept a sequence of arrays, like your LIST
.
There is also an analogous function column_stack
and shortcut c_[...]
, for horizontal (column-wise) stacking, as well as an almost-analogous function hstack
—although for some reason the latter is less flexible (it is stricter about input arrays’ dimensionality, and tries to concatenate 1-D arrays end-to-end instead of treating them as columns).
Finally, in the specific case of vertical stacking of 1-D arrays, the following also works:
numpy.array( LIST )
…because arrays can be constructed out of a sequence of other arrays, adding a new dimension to the beginning.
Starting in NumPy version 1.10, we have the method stack. It can stack arrays of any dimension (all equal):
# List of arrays.
L = [np.random.randn(5,4,2,5,1,2) for i in range(10)]
# Stack them using axis=0.
M = np.stack(L)
M.shape # == (10,5,4,2,5,1,2)
np.all(M == L) # == True
M = np.stack(L, axis=1)
M.shape # == (5,10,4,2,5,1,2)
np.all(M == L) # == False (Don't Panic)
# This are all true
np.all(M[:,0,:] == L[0]) # == True
all(np.all(M[:,i,:] == L[i]) for i in range(10)) # == True
Enjoy,
I checked some of the methods for speed performance and find that there is no difference!
The only difference is that using some methods you must carefully check dimension.
Timing:
|------------|----------------|-------------------|
| | shape (10000) | shape (1,10000) |
|------------|----------------|-------------------|
| np.concat | 0.18280 | 0.17960 |
|------------|----------------|-------------------|
| np.stack | 0.21501 | 0.16465 |
|------------|----------------|-------------------|
| np.vstack | 0.21501 | 0.17181 |
|------------|----------------|-------------------|
| np.array | 0.21656 | 0.16833 |
|------------|----------------|-------------------|
As you can see I tried 2 experiments – using np.random.rand(10000)
and np.random.rand(1, 10000)
And if we use 2d arrays than np.stack
and np.array
create additional dimension – result.shape is (1,10000,10000) and (10000,1,10000) so they need additional actions to avoid this.
Code:
from time import perf_counter
from tqdm import tqdm_notebook
import numpy as np
l = []
for i in tqdm_notebook(range(10000)):
new_np = np.random.rand(10000)
l.append(new_np)
start = perf_counter()
stack = np.stack(l, axis=0 )
print(f'np.stack: {perf_counter() - start:.5f}')
start = perf_counter()
vstack = np.vstack(l)
print(f'np.vstack: {perf_counter() - start:.5f}')
start = perf_counter()
wrap = np.array(l)
print(f'np.array: {perf_counter() - start:.5f}')
start = perf_counter()
l = [el.reshape(1,-1) for el in l]
conc = np.concatenate(l, axis=0 )
print(f'np.concatenate: {perf_counter() - start:.5f}')
Other solution is to use the asarray
function:
numpy.asarray( LIST )
I found a much more robust numpy function reshape
.
Problem of stack
and vstack
is, that it fails for an empty list.
>>> LIST = [np.array([1, 2, 3, 4, 5]), np.array([1, 2, 3, 4, 5]),np.array([1,2,3,4,5])]
>>> s = np.vstack(LIST)
>>> s.shape
(3, 5)
>>> s = np.vstack([])
ValueError: need at least one array to concatenate
The alternative is reshape
>>> s = np.reshape(LIST, (len(LIST),5))
>>> s.shape
(3, 5)
>>> LIST = []
>>> s = np.reshape(LIST, (len(LIST),5))
>>> s.shape
(0,5)
Drawback is, you need to know the length/shape of your inner array