Confusion in array operation in numpy

Question:

I generally use MATLAB and Octave, and I have recently switched to Python's numpy.
In numpy, when I define an array like this

>>> a = np.array([[2,3],[4,5]])

it works as expected, and the size of the array is

>>> a.shape
(2, 2)

which is also the same as in MATLAB. But when I extract the entire first column and check its size

>>> b = a[:,0]
>>> b.shape
(2,)

I get the size (2,). What does this mean? I expected the size to be (2, 1). Perhaps I have misunderstood a basic concept. Can anyone clarify this for me?

Asked By: user1481317


Answers:

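Typing help(np.shape) gives some insight into what is going on here. For starters, you can get a 2D result by wrapping the column in a list, which gives a (1, 2) row, and then transposing it to get the (2, 1) shape you expect:

b = np.array([a[:,0]]).T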
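For comparison, here is a short sketch of the shapes produced by a few ways of pulling out the first column. The list index a[:, [0]] and the length-1 slice a[:, 0:1] are standard numpy indexing alternatives that the answer itself does not mention.

```python
import numpy as np

a = np.array([[2, 3], [4, 5]])

col_1d = a[:, 0]                  # integer index: column axis dropped -> (2,)
row_2d = np.array([a[:, 0]])      # wrapping in a list adds a leading axis -> (1, 2) row
col_t = np.array([a[:, 0]]).T     # transposing the row gives a (2, 1) column
col_list = a[:, [0]]              # a list index keeps the axis -> (2, 1)
col_slice = a[:, 0:1]             # a length-1 slice also keeps it -> (2, 1)

for name, arr in [("a[:, 0]", col_1d),
                  ("np.array([a[:, 0]])", row_2d),
                  ("np.array([a[:, 0]]).T", col_t),
                  ("a[:, [0]]", col_list),
                  ("a[:, 0:1]", col_slice)]:
    print(name, arr.shape)
```

The slice form is usually the most direct of these, because it never drops the axis in the first place.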

Basically, numpy defines things a little differently than MATLAB. In numpy, a vector has only one dimension, and a multidimensional array can be thought of as a vector of vectors, so it can have more. In your first example, your array is a vector of two vectors, i.e.:

a = np.array([vec1, vec2])  # here vec1 = [2, 3] and vec2 = [4, 5]
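To make the "vector of vectors" picture concrete, here is a small sketch. The names vec1 and vec2 are just illustrative placeholders, filled in with the rows from the question.

```python
import numpy as np

vec1 = [2, 3]                # first row (illustrative name)
vec2 = [4, 5]                # second row (illustrative name)
a = np.array([vec1, vec2])   # stack the two vectors as rows

print(a.shape)               # (2, 2): two vectors of length 2
print(a[0])                  # [2 3]: indexing the outer axis returns vec1
print(a[0].shape)            # (2,): each inner vector is 1D
```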

So a has two dimensions, and in your example both dimensions have 2 elements, so your array is 2 by 2. When you index one of those dimensions with a single integer, as in a[:,0], numpy removes that dimension from the result. In other words, you are taking a vector out of your array, and that vector has only one dimension, which also has 2 elements, but that’s it. Your vector's shape is therefore (2,): the second slot is empty because the vector has no second dimension.
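The rule at work is that each axis indexed by a single integer is removed from the result, while an axis indexed by a slice is kept, even if that slice has length 1. A quick way to see this is to check ndim:

```python
import numpy as np

a = np.array([[2, 3], [4, 5]])

print(a.ndim)          # 2: nothing indexed away
print(a[:, 0].ndim)    # 1: the integer 0 removes the column axis
print(a[0, 0].ndim)    # 0: two integers remove both axes (a scalar)
print(a[:, :1].ndim)   # 2: a slice keeps its axis, even with length 1
```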

You could think of it in terms of spaces too. Your first array is in the space R^(2x2) and your second vector is in the space R^(2). This means that the array is defined on a different (and bigger) space than the vector.

That was a lot to basically say that you took a slice out of your array, and unlike MATLAB, numpy does not represent vectors (1-dimensional) in the same way as it does arrays (2 or more dimensions).

Answered By: Engineero

A 1D numpy array* is literally 1D: it has no size in any second dimension, whereas in MATLAB a ‘1D’ array is actually 2D, with a size of 1 in one of its two dimensions (a 1×n row or an n×1 column).
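One way to see the difference is np.atleast_2d, a standard numpy function not mentioned in the answer. It promotes a 1D array by adding a leading axis of size 1, so you get a MATLAB-style row, and a transpose then gives a column:

```python
import numpy as np

v = np.array([2, 4])
print(v.shape, v.ndim)   # (2,) 1: truly one-dimensional

row = np.atleast_2d(v)   # adds a *leading* axis of length 1
print(row.shape)         # (1, 2): like a MATLAB row vector
print(row.T.shape)       # (2, 1): like a MATLAB column vector
```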

If you want your array to have size 1 in its second dimension you can use its .reshape() method:

import numpy as np

a = np.zeros(5)
print(a.shape)
# (5,)

# explicitly reshape to (5, 1)
print(a.reshape(5, 1).shape)
# (5, 1)

# or use -1 in the first dimension, so that its size in that dimension is 
# inferred from its total length
print(a.reshape(-1, 1).shape)
# (5, 1)
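The reverse conversion is often needed too, for example when a (5, 1) column has to be passed to code that expects a flat vector. Here is a short sketch using .reshape(-1) and .ravel(), both standard ndarray methods (.ravel() is not mentioned in the answer). It also shows that the -1 trick only works when the total size divides evenly:

```python
import numpy as np

col = np.zeros(5).reshape(-1, 1)
print(col.shape)               # (5, 1)

# Flatten back to 1D; -1 again infers the length
print(col.reshape(-1).shape)   # (5,)
print(col.ravel().shape)       # (5,)

# The total size must match: 5 elements cannot fill rows of length 2
try:
    col.reshape(-1, 2)
except ValueError as err:
    print("ValueError:", err)
```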

Edit

As Akavall pointed out, I should also mention np.newaxis as another method for adding a new axis to an array. Although I personally find it a bit less intuitive, one advantage of np.newaxis over .reshape() is that it allows you to add multiple new axes in an arbitrary order without explicitly specifying the shape of the output array, which is not possible with the .reshape(-1, ...) trick:

a = np.zeros((3, 4, 5))
print(a[np.newaxis, :, np.newaxis, ..., np.newaxis].shape)
# (1, 3, 1, 4, 5, 1)

np.newaxis is just an alias of None, so you could do the same thing a bit more compactly using a[None, :, None, ..., None].
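For the question's case, the simplest use of np.newaxis is to add a trailing axis to the extracted column. This sketch checks that the result matches .reshape(-1, 1), and that np.newaxis really is None. It also shows that you can drop the axis and add a new one in a single indexing expression:

```python
import numpy as np

a = np.array([[2, 3], [4, 5]])
b = a[:, 0]                                    # shape (2,)

col = b[:, np.newaxis]                         # insert a new trailing axis
print(col.shape)                               # (2, 1)
print(np.newaxis is None)                      # True
print(np.array_equal(col, b[:, None]))         # True
print(np.array_equal(col, b.reshape(-1, 1)))   # True

# Index the column and re-add the axis in one step
print(a[:, 0, np.newaxis].shape)               # (2, 1)
```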


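* An np.matrix, on the other hand, is always 2D, and will give you the indexing behavior you are familiar with from MATLAB (note, however, that the numpy documentation no longer recommends np.matrix and says the class may be removed in the future):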

a = np.matrix([[2, 3], [4, 5]])
print(a[:, 0].shape)
# (2, 1)
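Because np.matrix is discouraged in current numpy, you may prefer to stay with plain arrays. Here is a sketch using np.expand_dims, a standard numpy function that is not part of the original answer, to restore the dropped axis. Matrix products with the resulting column then work as they would with MATLAB column vectors:

```python
import numpy as np

a = np.array([[2, 3], [4, 5]])

b = a[:, 0]                        # (2,): column axis dropped
col = np.expand_dims(b, axis=1)    # re-insert the axis at position 1
print(col.shape)                   # (2, 1)

print((a @ col).shape)             # (2, 1): matrix times column vector
print((col.T @ col).item())        # 20, i.e. 2*2 + 4*4
```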

For more info on the differences between arrays and matrices, see here.

Answered By: ali_m