How do I stack vectors of different lengths in NumPy?
Question:
How do I stack column-wise n vectors of shape (x,), where x could be any number?
For example,
from numpy import *
a = ones((3,))
b = ones((2,))
c = vstack((a,b)) # <-- gives an error
c = vstack((a[:,newaxis],b[:,newaxis])) #<-- also gives an error
hstack works fine but concatenates along the wrong dimension.
Answers:
Short answer: you can’t. NumPy does not support jagged arrays natively.
Long answer:
>>> a = ones((3,))
>>> b = ones((2,))
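>>> c = array([a, b], dtype=object)  # recent NumPy versions require an explicit dtype=object for ragged input
>>> c
array([array([1., 1., 1.]), array([1., 1.])], dtype=object)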
gives an array that may or may not behave as you expect. For example, methods like sum or reshape do not act on the individual numbers the way they would on a regular 2-D array, so you should treat this much as you’d treat the ordinary Python list [a, b] (iterate over it to perform operations instead of using vectorized idioms).
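To illustrate the "treat it like a list" advice, here is a minimal sketch, assuming a recent NumPy where the object dtype has to be requested explicitly. The names row_sums and lengths are only illustrative.

```python
import numpy as np

a = np.ones((3,))
b = np.ones((2,))

# A 1-D object array holding two separate float arrays (not a 2-D array).
c = np.array([a, b], dtype=object)

# Iterate over the elements, exactly as you would over the list [a, b].
row_sums = [row.sum() for row in c]
lengths = [len(row) for row in c]

print(c.shape)    # one axis only: the inner arrays are opaque objects
print(row_sums)
print(lengths)
```

Each inner array is still an ordinary float array, so vectorized operations work per element of c, just not across the whole container.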
Several possible workarounds exist; the easiest is to coerce a and b to a common length, perhaps using masked arrays or NaN to signal that some indices are invalid in some rows. E.g. here’s b as a masked array:
>>> ma.array(np.resize(b, a.shape[0]), mask=[False, False, True])
masked_array(data = [1.0 1.0 --],
mask = [False False True],
fill_value = 1e+20)
This can be stacked with a as follows:
>>> ma.vstack([a, ma.array(np.resize(b, a.shape[0]), mask=[False, False, True])])
masked_array(data =
[[1.0 1.0 1.0]
[1.0 1.0 --]],
mask =
[[False False False]
[False False True]],
fill_value = 1e+20)
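The same idea can be generalized to any number of vectors. The following is only a sketch: stack_masked is a hypothetical helper, not a NumPy function, and it pads with zeros underneath the mask.

```python
import numpy as np

def stack_masked(vectors):
    """Stack 1-D vectors as rows of a masked array; the padding is masked."""
    width = max(len(v) for v in vectors)
    data = np.zeros((len(vectors), width))
    mask = np.ones((len(vectors), width), dtype=bool)  # True means "invalid"
    for i, v in enumerate(vectors):
        data[i, :len(v)] = v
        mask[i, :len(v)] = False  # unmask the positions that hold real data
    return np.ma.array(data, mask=mask)

stacked = stack_masked([np.ones(3), np.ones(2)])
print(stacked)
print(stacked.sum(axis=1))  # masked entries are ignored by the reduction
print(stacked.T)            # transpose if you want the vectors as columns
```

Reductions such as sum and mean skip the masked entries, which is the main advantage over padding with an ordinary value.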
(For some purposes, scipy.sparse may also be interesting.)
In general, putting together arrays of different lengths is ambiguous because the alignment of the data might matter. Pandas has several more advanced solutions for that, e.g. merging Series into DataFrames.
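As a rough illustration of the alignment point, here is a sketch using pandas. The index labels are made up for the example: the shorter series is assumed to start at label 1, so the missing value lands in the first row, not the last.

```python
import numpy as np
import pandas as pd

# Hypothetical labels: b only has data for rows 1 and 2.
a = pd.Series(np.ones(3), index=[0, 1, 2])
b = pd.Series(np.ones(2), index=[1, 2])

# Concatenating along axis=1 aligns on the index and fills gaps with NaN.
df = pd.concat({"a": a, "b": b}, axis=1)
print(df)
```

With default integer indexes (no index argument) the shorter series would instead be aligned at the top and padded at the bottom.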
If you just want to populate columns starting from the first element, what I usually do is build a matrix and populate its columns. Of course, you need to fill the empty spaces in the matrix with a null value (in this case np.nan):
import numpy as np
a = np.ones((3,))
b = np.ones((2,))
arraylist = [a, b]
# NaN-filled array: one row per element of the longest vector, one column per vector
outarr = np.full((max(len(ps) for ps in arraylist), len(arraylist)), np.nan)
for i, c in enumerate(arraylist):  # populate columns
    outarr[:len(c), i] = c
In [108]: outarr
Out[108]:
array([[ 1., 1.],
[ 1., 1.],
[ 1., nan]])
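Wrapped up as a function, the approach above might look like the sketch below. stack_columns is a hypothetical name; the NaN-aware reductions np.nansum and np.nanmean are used so the padding does not affect the results.

```python
import numpy as np

def stack_columns(vectors, fill=np.nan):
    """Place each 1-D vector in its own column, padding the bottom with `fill`."""
    height = max(len(v) for v in vectors)
    out = np.full((height, len(vectors)), fill, dtype=float)
    for j, v in enumerate(vectors):
        out[:len(v), j] = v
    return out

cols = stack_columns([np.ones(3), np.ones(2)])
col_sums = np.nansum(cols, axis=0)    # NaN padding is skipped
col_means = np.nanmean(cols, axis=0)
print(cols)
print(col_sums, col_means)
```

Note that NaN only exists for floating-point dtypes, so integer input is converted to float here.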
There is a new library for efficiently handling this type of array: https://github.com/scikit-hep/awkward-array
I know this is a really old post and that there may be a better way of doing this, but why not just use np.append for such an operation:
import numpy as np
a = np.ones((3,))
b = np.ones((2,))
c = np.append(a, b)
print(c)
output:
[1. 1. 1. 1. 1.]
I used the following code to combine lists of different lengths in a NumPy array and to keep the length information in a second array:
import numpy as np
# create an example list (the number can be increased):
my_list = [np.ones(i) for i in np.arange(1000)]
# measure and store the lengths and find the maximum:
dlc = np.array([len(i) for i in my_list])  # dlc = "data length code", one length per entry
max_length = max(dlc)
# now allocate an uninitialized array (entries beyond each length hold arbitrary values)
result = np.empty((len(my_list), max_length))
# populate:
for i in np.arange(len(dlc)):
    result[i, :dlc[i]] = my_list[i]
# check what the entry at index 10 looks like
print(result[10], dlc[10])
I’m sure the loop could be improved, but the code already runs quite quickly because the memory is preallocated by np.empty.
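One possible way to drop the Python loop is to build a boolean mask of the valid positions and assign all values at once. This is a sketch with a smaller example list, and it fills the padding with NaN instead of leaving it uninitialized; the names lengths and valid are only illustrative.

```python
import numpy as np

my_list = [np.ones(i) for i in range(1, 6)]  # lengths 1..5
lengths = np.array([len(v) for v in my_list])

# valid[i, j] is True where row i has a real element in column j
valid = np.arange(lengths.max()) < lengths[:, None]

result = np.full(valid.shape, np.nan)
# Boolean assignment fills the True positions in row-major order,
# which matches the order of the concatenated vectors.
result[valid] = np.concatenate(my_list)

print(result)
print(lengths)
```

The mask doubles as the length information, so valid can be kept instead of (or alongside) the separate length array.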
If you definitely want to use NumPy, you can match the shapes with np.nan and then "unpack" the nan-filled array later. Here is an example with functions.
import numpy as np
a = np.array([[3, 3, 3]]).astype(float)
b = np.array([[2, 2]]).astype(float)
# Extend each vector in the list with NaN so that all reach the same shape
def Pack_Matrices_with_NaN(List_of_matrices, Matrix_size):
    Matrix_with_nan = np.arange(Matrix_size)  # placeholder first row, dropped below
    for array in List_of_matrices:
        start_position = len(array[0])
        for x in range(start_position, Matrix_size):
            array = np.insert(array, x, np.nan, axis=1)  # append one NaN column
        Matrix_with_nan = np.vstack([Matrix_with_nan, array])
    Matrix_with_nan = Matrix_with_nan[1:]  # remove the placeholder row
    return Matrix_with_nan
arrays = [a, b]
packed_matrices = Pack_Matrices_with_NaN(arrays, 5)
print(packed_matrices)
Output:
[[ 3. 3. 3. nan nan]
[ 2. 2. nan nan nan]]
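The "unpack" step could look like the sketch below, assuming the original data never contains NaN itself (otherwise real values would be dropped too). The packed array is written out literally so the snippet stands on its own.

```python
import numpy as np

packed = np.array([[3.0, 3.0, 3.0, np.nan, np.nan],
                   [2.0, 2.0, np.nan, np.nan, np.nan]])

# Keep only the non-NaN entries of each row; the result is a list of
# arrays with different lengths again.
unpacked = [row[~np.isnan(row)] for row in packed]
print(unpacked)
```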
However, the easiest way would be to append the arrays to a list:
import numpy as np
a = np.array([3,3,3])
b = np.array([2,2])
c = []
c.append(a)
c.append(b)
print(c)
Output:
[array([3, 3, 3]), array([2, 2])]