# How do I access the ith column of a NumPy multidimensional array?

## Question:

Given:

``````test = numpy.array([[1, 2], [3, 4], [5, 6]])
``````

`test[i]` gives the ith row (e.g. `[1, 2]`). How do I access the ith column? (e.g. `[1, 3, 5]`). Also, would this be an expensive operation?

To access column 0:

``````>>> test[:, 0]
array([1, 3, 5])
``````

To access row 0:

``````>>> test[0, :]
array([1, 2])
``````

This is covered in Section 1.4 (Indexing) of the NumPy reference. It is quick, at least in my experience: basic slicing like this returns a view of the existing data rather than a copy. It’s certainly much quicker than accessing each element in a loop.
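
As a rough illustration of the point above, here is a minimal sketch comparing the slice with an explicit Python loop. The index `i` and the name `col_loop` are made up for this example, and `np.shares_memory` is used only to show that the slice does not copy any data.

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])
i = 0  # column index chosen for illustration

# Column access via slicing: no Python-level loop needed.
col = test[:, i]

# Equivalent element-by-element loop, shown only for comparison.
col_loop = np.array([row[i] for row in test])

print(col)                            # [1 3 5]
print(np.array_equal(col, col_loop))  # True
print(np.shares_memory(col, test))    # True: the slice is a view, no data copied
```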

And if you want to access more than one column at a time you could do:

``````>>> test = np.arange(9).reshape((3,3))
>>> test
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> test[:,[0,2]]
array([[0, 2],
       [3, 5],
       [6, 8]])
``````
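
One thing worth knowing about the list-based selection: indexing with a list of integers (advanced indexing) returns a copy, whereas a plain slice returns a view. The sketch below assumes the same 3x3 array; the names `fancy` and `sliced` are only for illustration.

```python
import numpy as np

test = np.arange(9).reshape((3, 3))

fancy = test[:, [0, 2]]  # integer-list (advanced) indexing
sliced = test[:, ::2]    # basic slicing with a step of 2, also columns 0 and 2

print(np.array_equal(fancy, sliced))   # True: same values
print(np.shares_memory(fancy, test))   # False: advanced indexing makes a copy
print(np.shares_memory(sliced, test))  # True: basic slicing returns a view
```

So if the columns you want happen to be evenly spaced, a stepped slice avoids the copy.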
``````>>> test[:,0]
array([1, 3, 5])
``````

This command gives you a 1-D array of shape `(3,)` (a "row vector"). If you just want to loop over it, that is fine, but if you want to `hstack` it with some other array of dimension 3xN, you will get:

``````ValueError: all the input arrays must have same number of dimensions
``````

while

``````>>> test[:,[0]]
array([[1],
       [3],
       [5]])
``````

gives you a column vector (shape `(3, 1)`), so that you can use `concatenate` or `hstack` with it.

e.g.

``````>>> np.hstack((test, test[:,[0]]))
array([[1, 2, 1],
       [3, 4, 3],
       [5, 6, 5]])
``````
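
For reference, here is a small sketch of a few ways to get a 2-D column that `hstack` will accept. The variable names are invented for this example; `np.newaxis` and `np.column_stack` are standard NumPy, but treat the choice between them as a matter of taste rather than a recommendation.

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

col_1d = test[:, 0]                      # shape (3,)
col_list = test[:, [0]]                  # shape (3, 1), a copy
col_slice = test[:, 0:1]                 # shape (3, 1), a view
col_newaxis = test[:, 0][:, np.newaxis]  # shape (3, 1), a view

# The 1-D column cannot be hstack-ed onto a 2-D array.
try:
    np.hstack((test, col_1d))
except ValueError as exc:
    print("hstack failed:", exc)

stacked = np.hstack((test, col_slice))
print(stacked)

# np.column_stack treats 1-D inputs as columns, so no reshaping is needed.
stacked_cs = np.column_stack((test, col_1d))
print(np.array_equal(stacked, stacked_cs))  # True
```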
``````>>> test
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

>>> ncol = test.shape[1]
>>> ncol
5
``````

Then you can select the 2nd – 4th column this way:

``````>>> test[0:, 1:(ncol - 1)]
array([[1, 2, 3],
       [6, 7, 8]])
``````
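
A minimal sketch of the same selection, assuming the 2x5 array shown above was built with `np.arange(10).reshape((2, 5))`. A negative stop index gives the same result without looking up the column count first.

```python
import numpy as np

test = np.arange(10).reshape((2, 5))
ncol = test.shape[1]  # number of columns (5 here)

explicit = test[:, 1:ncol - 1]  # columns 1, 2, 3 via the column count
negative = test[:, 1:-1]        # same columns via a negative stop index

print(negative)
print(np.array_equal(explicit, negative))  # True
```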

You could also transpose and return a row:

``````>>> test.T[0]
array([1, 3, 5])
``````

To get several independent columns, just:

``````>>> test[:,[0,2]]
``````

You will get columns 0 and 2.

Although the question has been answered, let me mention some nuances.

Let’s say you are interested in the column at index 1 (the second column) of the array

``````arr = numpy.array([[1, 2],
                   [3, 4],
                   [5, 6]])
``````

As you already know from other answers, to get it in the form of "row vector" (array of shape `(3,)`), you use slicing:

``````arr_col1_view = arr[:, 1]         # creates a view of the column at index 1 of arr
arr_col1_copy = arr[:, 1].copy()  # creates a copy of the column at index 1 of arr
``````

To check if an array is a view or a copy of another array you can do the following:

``````arr_col1_view.base is arr  # True
arr_col1_copy.base is arr  # False
``````

see ndarray.base.
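
To make the view/copy distinction concrete, here is a short sketch (the names `view` and `copy` are just for this example): writing through the copy leaves `arr` untouched, while writing through the view changes it.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]])
view = arr[:, 1]         # shares memory with arr
copy = arr[:, 1].copy()  # owns its own data

copy[0] = -1
print(arr[0, 1])  # 2: the copy is independent of arr

view[0] = 99
print(arr[0, 1])  # 99: the write went through to arr

print(view.base is arr)  # True
print(copy.base is arr)  # False
```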

Besides the obvious difference between the two (modifying `arr_col1_view` will affect the `arr`), the number of byte-steps for traversing each of them is different:

``````arr_col1_view.strides[0]  # 8 bytes (with a 4-byte integer dtype; 16 with int64)
arr_col1_copy.strides[0]  # 4 bytes (with a 4-byte integer dtype; 8 with int64)
``````
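
Where those numbers come from, as a small sketch: in a row-major array, stepping down a column skips a whole row, so the stride is the number of columns times the item size. The dtype is pinned to `int32` here as an assumption so that the values are the same on every platform.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.int32)

print(arr.itemsize)  # 4 bytes per element
print(arr.strides)   # (8, 4): 8 bytes to the next row, 4 to the next column

col_view = arr[:, 1]
col_copy = arr[:, 1].copy()

print(col_view.strides)  # (8,): each step skips a full row of 2 elements
print(col_copy.strides)  # (4,): the copy is packed contiguously
```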

Why is this important? Imagine that you have a very big array `A` instead of the `arr`:

``````A = np.random.randint(2, size=(10000, 10000), dtype='int32')
A_col1_view = A[:, 1]
A_col1_copy = A[:, 1].copy()
``````

and you want to compute the sum of all the elements of that column (index 1), i.e. `A_col1_view.sum()` or `A_col1_copy.sum()`. Using the copied version is much faster:

``````%timeit A_col1_view.sum()  # ~248 µs
%timeit A_col1_copy.sum()  # ~12.8 µs
``````

This is due to the different stride lengths mentioned before:

``````A_col1_view.strides[0]  # 40000 bytes
A_col1_copy.strides[0]  # 4 bytes
``````

Although it might seem that using column copies is always better, that is not the case, because making a copy also takes time and uses more memory (in this case it took me approx. 200 µs to create `A_col1_copy`). However, if we needed the copy in the first place, or we need to do many different operations on a specific column of the array and are OK with sacrificing memory for speed, then making a copy is the way to go.
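
If you want to measure this trade-off outside IPython, a rough sketch with the standard `timeit` module could look like the following. The array is smaller than the one above (an assumption, to keep memory use modest), and the timings depend entirely on your machine, so no expected numbers are given.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(2000, 2000), dtype=np.int32)

col_view = A[:, 1]         # strided view into A
col_copy = A[:, 1].copy()  # contiguous copy

n = 1000
t_view = timeit.timeit(col_view.sum, number=n)
t_copy = timeit.timeit(col_copy.sum, number=n)
t_make = timeit.timeit(lambda: A[:, 1].copy(), number=n)

print(f"sum over view: {t_view / n * 1e6:.2f} µs per call")
print(f"sum over copy: {t_copy / n * 1e6:.2f} µs per call")
print(f"making a copy: {t_make / n * 1e6:.2f} µs per call")
```

Comparing the third number with the difference between the first two gives a feel for how many operations on the column it takes before the copy pays for itself.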

In the case we are interested in working mostly with columns, it could be a good idea to create our array in column-major (‘F’) order instead of the row-major (‘C’) order (which is the default), and then do the slicing as before to get a column without copying it:

``````A = np.asfortranarray(A)   # or np.array(A, order='F')
A_col1_view = A[:, 1]
A_col1_view.strides[0]  # 4 bytes

%timeit A_col1_view.sum()  # ~12.6 µs vs ~248 µs
``````

Now, performing the sum operation (or any other) on a column-view is as fast as performing it on a column copy.
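
A small sketch of what the memory order changes, using a made-up 1000x500 array and the `flags` attribute to check contiguity. Note the flip side: in column-major order it is the rows that become strided.

```python
import numpy as np

A_c = np.zeros((1000, 500), dtype=np.int32)  # row-major ('C') by default
A_f = np.asfortranarray(A_c)                 # column-major ('F') copy

print(A_c.strides)  # (2000, 4)
print(A_f.strides)  # (4, 4000)

print(A_c[:, 1].flags['C_CONTIGUOUS'])  # False: column of a C-ordered array
print(A_f[:, 1].flags['C_CONTIGUOUS'])  # True: column of an F-ordered array
print(A_f[1, :].flags['C_CONTIGUOUS'])  # False: rows are now the strided ones
```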

Finally let me note that transposing an array and using row-slicing is the same as using the column-slicing on the original array, because transposing is done by just swapping the shape and the strides of the original array.

``````A[:, 1].strides[0]    # 40000 bytes (for the original C-ordered A)
A.T[1, :].strides[0]  # 40000 bytes (for the original C-ordered A)
``````
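
The shape-and-strides swap can be checked directly. This is a minimal sketch on a small made-up array, with the dtype fixed to `int32` so the byte counts are predictable.

```python
import numpy as np

A = np.zeros((3, 5), dtype=np.int32)

print(A.shape, A.strides)      # (3, 5) (20, 4)
print(A.T.shape, A.T.strides)  # (5, 3) (4, 20)

print(np.shares_memory(A, A.T))              # True: the transpose is a view
print(A[:, 1].strides == A.T[1, :].strides)  # True: same traversal either way
```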

The array here is two-dimensional, so you can select whichever columns you want with a slice on the second axis:

``````test = numpy.array([[1, 2], [3, 4], [5, 6]])
test[:, a:b]  # replace a and b with integer indices; selects columns a through b-1
``````
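
A concrete version of the slice above, with values for `a` and `b` picked only for illustration. The range is half-open, and a stop index past the last column is clipped rather than raising an error.

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])
a, b = 0, 1  # hypothetical bounds: start at column 0, stop before column 1

one_col = test[:, a:b]
print(one_col.shape)  # (3, 1): the result stays two-dimensional

all_cols = test[:, 0:10]  # stop index beyond the last column is clipped
print(all_cols.shape)     # (3, 2)
```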