From ND to 1D arrays
Question:
Say I have an array a:
a = np.array([[1,2,3], [4,5,6]])
array([[1, 2, 3],
       [4, 5, 6]])
I would like to convert it to a 1D array (i.e. a column vector):
b = np.reshape(a, (1,np.prod(a.shape)))
but this returns
array([[1, 2, 3, 4, 5, 6]])
which is not the same as:
array([1, 2, 3, 4, 5, 6])
I can take the first element of this array to manually convert it to a 1D array:
b = np.reshape(a, (1,np.prod(a.shape)))[0]
but this requires me to know how many dimensions the original array has (and concatenate [0]’s when working with higher dimensions).
Is there a dimensions-independent way of getting a column/row vector from an arbitrary ndarray?
Answers:
Use np.ravel (for a 1D view) or np.ndarray.flatten (for a 1D copy) or np.ndarray.flat (for a 1D iterator):
In [12]: a = np.array([[1,2,3], [4,5,6]])
In [13]: b = a.ravel()
In [14]: b
Out[14]: array([1, 2, 3, 4, 5, 6])
Note that ravel() returns a view of a when possible, so modifying b also modifies a. ravel() returns a view when the 1D elements are contiguous in memory, but would return a copy if, for example, a were made from slicing another array using a non-unit step size (e.g. a = x[::2]).
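To check which of the two cases you are in, np.shares_memory can help. This is only a minimal sketch; the array x and its shape are made up for illustration and are not from the question.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = a.ravel()                      # a is C-contiguous, so b is a view
b[0] = 100                         # writing through the view changes a as well
print(a[0, 0])                     # 100
print(np.shares_memory(a, b))      # True

x = np.arange(12).reshape(4, 3)    # illustrative array, not from the question
s = x[::2]                         # rows 0 and 2: not contiguous in memory
r = s.ravel()                      # has to be a copy
r[0] = -1                          # does not touch x
print(x[0, 0])                     # 0
print(np.shares_memory(x, r))      # False
```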
If you want a copy rather than a view, use:
In [15]: c = a.flatten()
If you just want an iterator, use np.ndarray.flat:
In [20]: d = a.flat
In [21]: d
Out[21]: <numpy.flatiter object at 0x8ec2068>
In [22]: list(d)
Out[22]: [1, 2, 3, 4, 5, 6]
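Besides iteration, the flat attribute also accepts 1D-style indexing and assignment, which can be handy when you do not need a flattened array at all. A small sketch, with index values picked only for illustration:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# Index into the array as if it were 1D, without building a new array.
fourth = a.flat[3]            # element at flat position 3
# Assign through flat; this writes into a itself.
a.flat[[0, 5]] = 0
total = sum(a.flat)           # iterate over every element in row-major order

print(fourth, total)
print(a)
```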
In [14]: b = np.reshape(a, (np.prod(a.shape),))
In [15]: b
Out[15]: array([1, 2, 3, 4, 5, 6])
or, simply:
In [16]: a.flatten()
Out[16]: array([1, 2, 3, 4, 5, 6])
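If you would rather not compute the total size yourself, reshape accepts -1 for one dimension and infers it. The same trick gives explicit 2D row or column vectors, which is what the question seems to be after. A sketch on an arbitrary 3D array (the shape is just an example):

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)   # any number of dimensions works

flat = a.reshape(-1)       # shape (24,): a true 1D array
row = a.reshape(1, -1)     # shape (1, 24): a 2D row vector
col = a.reshape(-1, 1)     # shape (24, 1): a 2D column vector

print(flat.shape, row.shape, col.shape)
```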
Although this isn’t using the np array format (too lazy to modify my code), this should do what you want. If you truly want a column vector, you will want to transpose the resulting vector. It all depends on how you are planning to use it.
def getVector(data_array, col):
    # Collect the entry at index col from every row.
    vector = []
    imax = len(data_array)
    for i in range(imax):
        vector.append(data_array[i][col])
    return vector
a = ([1,2,3], [4,5,6])
b = getVector(a,1)
print(b)
Out> [2, 5]
So if you need to transpose, you can do something like this:
def transposeArray(data_array):
    # need to test if this is a 1D array
    # can't do a len(data_array[0]) if it's 1D
    two_d = isinstance(data_array[0], list)
    if not two_d:
        # 1D input: return a column vector (one value per row)
        return [[value] for value in data_array]
    dimx = len(data_array[0])  # number of columns in the input
    dimy = len(data_array)     # number of rows in the input
    # init output transposed array: dimx rows, dimy columns
    data_array_t = [[0 for col in range(dimy)] for row in range(dimx)]
    # fill output transposed array
    for i in range(dimx):
        for j in range(dimy):
            data_array_t[i][j] = data_array[j][i]
    return data_array_t
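For comparison, the same transposition can be done with NumPy in a few lines. This is just a sketch; transpose_with_numpy is a made-up helper name, and treating 1D input as a column vector is carried over from the function above.

```python
import numpy as np

def transpose_with_numpy(data):
    """Hypothetical helper: transpose a list, treating 1D input as a column."""
    arr = np.asarray(data)
    if arr.ndim == 1:
        return arr.reshape(-1, 1)   # 1D input becomes an (n, 1) column vector
    return arr.T                    # 2D input: swap rows and columns

print(transpose_with_numpy([[1, 2, 3], [4, 5, 6]]).tolist())  # [[1, 4], [2, 5], [3, 6]]
print(transpose_with_numpy([1, 2, 3]).tolist())               # [[1], [2], [3]]
```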
For a list of arrays with different sizes, use the following:
import numpy as np
# ND array list with different size
a = [[1],[2,3,4,5],[6,7,8]]
# stack them
b = np.hstack(a)
print(b)
Output:
[1 2 3 4 5 6 7 8]
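If the pieces themselves have different numbers of dimensions, one option is to flatten each piece first and then join them. A sketch with made-up inputs (the pieces list is only an illustration):

```python
import numpy as np

# Illustrative pieces: a 2D array, a 1D array and a 0-d scalar array.
pieces = [np.ones((2, 2)), np.arange(3), np.array(7)]

# np.ravel turns every piece into 1D, so concatenate can join them.
flat = np.concatenate([np.ravel(p) for p in pieces])
print(flat.shape)   # (8,)
```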
One of the simplest ways is to use flatten(), as in this example:
import numpy as np
# train_output and sample come from my own code (train_output is presumably a pandas DataFrame)
batch_y = train_output.iloc[sample, :]
batch_y = np.array(batch_y).flatten()
My array looked like this:
0
0 6
1 6
2 5
3 4
4 3
.
.
.
After using flatten():
array([6, 6, 5, ..., 5, 3, 6])
It also solves errors of this type:
Cannot feed value of shape (100, 1) for Tensor 'input/Y:0', which has shape '(?,)'
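The mismatch in that message is between a 2D shape (100, 1) and a 1D shape (100,). A quick sketch with a stand-in array (np.zeros here is only a placeholder for the real batch) shows a few ways to get from one to the other:

```python
import numpy as np

batch_y = np.zeros((100, 1))       # stand-in for a single-column batch
print(batch_y.shape)               # (100, 1)
print(batch_y.flatten().shape)     # (100,)
print(batch_y.ravel().shape)       # (100,)
print(batch_y[:, 0].shape)         # (100,)
```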
I wanted to see benchmark results for the functions mentioned in the answers, including unutbu’s.
I also want to point out that the NumPy docs recommend arr.reshape(-1) when a view is preferable (even though ravel is a tad faster in the following results).
TL;DR: np.ravel is the most performant (by a very small amount).
Benchmark
Functions:
np.ravel: returns a view, if possible
arr.reshape(-1): returns a view, if possible
np.ndarray.flatten: returns a copy
np.ndarray.flat: returns a numpy.flatiter, which is similar to an iterable
numpy version: ‘1.18.0’
Execution times on different ndarray sizes (in seconds, as returned by timeit.timeit):
+-------------+----------+-----------+-----------+-------------+
| function | 10x10 | 100x100 | 1000x1000 | 10000x10000 |
+-------------+----------+-----------+-----------+-------------+
| ravel | 0.002073 | 0.002123 | 0.002153 | 0.002077 |
| reshape(-1) | 0.002612 | 0.002635 | 0.002674 | 0.002701 |
| flatten | 0.000810 | 0.007467 | 0.587538 | 107.321913 |
| flat | 0.000337 | 0.000255 | 0.000227 | 0.000216 |
+-------------+----------+-----------+-----------+-------------+
Conclusion
The execution time of ravel and reshape(-1) was consistent and independent of the ndarray size.
ravel is a tad faster, but reshape provides flexibility in choosing the new shape. (Maybe that’s why the NumPy docs recommend it instead, or there could be cases where reshape returns a view and ravel doesn’t.)
If you are dealing with a large ndarray, flatten can cause a performance issue because it always copies the data. I recommend avoiding it unless you need a copy of the data to do something else.
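Whether reshape(-1) and ravel really differ in when they return a view can be checked directly. The sketch below only reports what your installed NumPy does; is_view is a made-up helper and the three test arrays are my own picks, so treat the output as version-dependent rather than guaranteed.

```python
import numpy as np

def is_view(result, base):
    """Hypothetical helper: True if result shares memory with base."""
    return np.shares_memory(result, base)

x = np.arange(12).reshape(3, 4)
cases = {
    "contiguous 2D": x,
    "transposed 2D": x.T,
    "strided 1D": np.arange(10)[::2],
}
for name, arr in cases.items():
    print(name,
          "| ravel view:", is_view(arr.ravel(), arr),
          "| reshape(-1) view:", is_view(arr.reshape(-1), arr))
```

The contiguous case should give a view for both, and the transposed case a copy for both; the strided 1D case is the one worth looking at.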
Code used
import timeit
setup = '''
import numpy as np
nd = np.random.randint(10, size=(10, 10))
'''
timeit.timeit('nd = np.reshape(nd, -1)', setup=setup, number=1000)
timeit.timeit('nd = np.ravel(nd)', setup=setup, number=1000)
timeit.timeit('nd = nd.flatten()', setup=setup, number=1000)
timeit.timeit('nd.flat', setup=setup, number=1000)
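If you want to repeat the measurement for several sizes, something like the following could work. It is a sketch, not the script that produced the table above: time_flatteners is a made-up helper, the sizes and the number of repetitions are arbitrary, and the array is never reassigned so that every call sees the original 2D input.

```python
import timeit
import numpy as np

def time_flatteners(shape, number=1000):
    """Hypothetical helper: total seconds for `number` calls of each method."""
    nd = np.random.randint(10, size=shape)
    candidates = {
        "ravel": lambda: nd.ravel(),
        "reshape(-1)": lambda: nd.reshape(-1),
        "flatten": lambda: nd.flatten(),
        "flat": lambda: nd.flat,
    }
    # nd is never reassigned, so every call works on the original 2D array.
    return {name: timeit.timeit(fn, number=number)
            for name, fn in candidates.items()}

for shape in [(10, 10), (100, 100), (1000, 1000)]:
    print(shape, time_flatteners(shape, number=100))
```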
Timings for the suggested solutions (note that, by these numbers, ravel() is the fastest and np.reshape() the slowest):
%timeit img1ary = np.reshape(img2ary,(np.product(img2ary.shape),1))
9.3 µs ± 69.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit img1ary = img2ary.ravel()
157 ns ± 1.32 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit img1ary = img2ary.flatten()
961 ns ± 5.77 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)