# Extracting specific columns in numpy array

## Question:

This is an easy question, but say I have an MxN matrix. All I want to do is extract specific columns and store them in another NumPy array, but I get invalid syntax errors.

Here is the code:

```
extractedData = data[[:,1],[:,9]].
```

It seems like the above line should suffice, but I guess not. I looked around but couldn’t find anything syntax-wise regarding this specific scenario.

## Answers:

I assume you wanted columns `1` and `9`?

To select multiple columns at once, use

```
X = data[:, [1, 9]]
```

To select one at a time, use

```
x, y = data[:, 1], data[:, 9]
```
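As a quick check, here is a minimal sketch of both forms on a made-up 3x10 array (the `data` values are purely illustrative):

```python
import numpy as np

# Hypothetical 3x10 matrix: row i holds the values 10*i .. 10*i + 9
data = np.arange(30).reshape(3, 10)

X = data[:, [1, 9]]            # both columns at once, shape (3, 2)
x, y = data[:, 1], data[:, 9]  # one column at a time, each of shape (3,)

print(X.shape)  # (3, 2)
print(x)        # [ 1 11 21]
print(y)        # [ 9 19 29]
```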

With names (for a structured array, index with the list of field names directly, without a leading slice):

```
data[['Column Name1', 'Column Name2']]
```

You can get the names from `data.dtype.names`

…
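Selection by name only applies when `data` is a NumPy structured array, which is an assumption here. A minimal sketch with made-up field names and values:

```python
import numpy as np

# Hypothetical structured array with three named fields
data = np.array(
    [(1, 2.0, 10), (3, 4.0, 20)],
    dtype=[("a", "i4"), ("b", "f8"), ("c", "i4")],
)

print(data.dtype.names)     # ('a', 'b', 'c')

subset = data[["a", "c"]]   # select fields with a list of names
print(subset.dtype.names)   # ('a', 'c')
print(subset["c"])          # [10 20]
```

A structured array like this one is 1-D, so there is no column axis to slice; the list of field names is the whole index.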

Assuming you want to get columns 1 and 9 with that code snippet, it should be:

```
extractedData = data[:,[1,9]]
```

If you want to extract only some columns:

```
idx_IN_columns = [1, 9]
extractedData = data[:,idx_IN_columns]
```

If you want to exclude specific columns:

```
idx_OUT_columns = [1, 9]
# range replaces Python 2's xrange; data.shape[1] is the number of columns
idx_IN_columns = [i for i in range(data.shape[1]) if i not in idx_OUT_columns]
extractedData = data[:,idx_IN_columns]
```
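As an alternative to building the index list by hand, excluding columns can also be sketched with `np.delete` or a boolean mask. This is a different technique from the list comprehension above, shown on a made-up 3x4 array:

```python
import numpy as np

data = np.arange(12).reshape(3, 4)   # hypothetical 3x4 matrix
idx_OUT_columns = [1, 3]

# np.delete returns a new array without the given columns
kept = np.delete(data, idx_OUT_columns, axis=1)

# Equivalent boolean-mask version: True marks the columns to keep
mask = np.ones(data.shape[1], dtype=bool)
mask[idx_OUT_columns] = False
kept_mask = data[:, mask]

print(kept.tolist())                    # [[0, 2], [4, 6], [8, 10]]
print(np.array_equal(kept, kept_mask))  # True
```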

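You can also stack the individual columns: `extractedData = np.column_stack((data[:, 1], data[:, 9]))`. Note that the form `extractedData = data([:,1],[:,9])` is not valid Python, because a slice such as `:` is only allowed inside square brackets.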

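If `data` is a pandas DataFrame rather than a NumPy array, you can use the following (the older `.ix` indexer was removed in pandas 1.0, so use `.loc`):

```
extracted_data = data.loc[:, ['Column1', 'Column2']]
```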

One thing I would like to point out: if you extract a single column with an integer index (e.g. `data[:, 1]`), the result **is not an Mx1 matrix** as you might expect, but a 1-D array of shape `(M,)` containing the elements of that column.

To turn it into an Mx1 array, call `reshape(M, 1)` (or `reshape(-1, 1)`) on the result, or index with a one-element list such as `data[:, [1]]`.
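The shape difference can be checked directly. A minimal sketch on a made-up 4x3 array (so M = 4 here):

```python
import numpy as np

data = np.arange(12).reshape(4, 3)   # hypothetical 4x3 matrix

col = data[:, 1]                     # integer index drops the column axis
as_column = col.reshape(-1, 1)       # -1 lets NumPy infer M
with_list = data[:, [1]]             # one-element list keeps the axis
with_slice = data[:, 1:2]            # so does a length-1 slice

print(col.shape)         # (4,)
print(as_column.shape)   # (4, 1)
print(with_list.shape)   # (4, 1)
print(with_slice.shape)  # (4, 1)
```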

One more thing to pay attention to when selecting columns from an N-D array using a list like this:

```
data[:,:,[1,9]]
```

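If you also remove a dimension with an integer index (by selecting only one row, for example), **the axes of the resulting array are reordered**. NumPy treats the integer and the list as advanced indices, and because a slice separates them, the dimension they produce is placed first in the result. So: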

```
print(data.shape)  # gives (10, 20, 30)
selection = data[1, :, [1, 9]]
print(selection.shape)  # gives (2, 20) instead of (20, 2)!
```
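To see the reordering and two possible ways around it, here is a sketch on a made-up array of the same shape. Indexing in two steps or transposing the result are just two options, not the only ones:

```python
import numpy as np

# Hypothetical 3-D array with the shape used above
data = np.arange(10 * 20 * 30).reshape(10, 20, 30)

kept_dims = data[:, :, [1, 9]]   # no dimension removed
permuted = data[1, :, [1, 9]]    # integer and list separated by a slice
two_step = data[1][:, [1, 9]]    # pick the row first, then the columns
transposed = permuted.T          # or transpose the permuted result

print(kept_dims.shape)   # (10, 20, 2)
print(permuted.shape)    # (2, 20)
print(two_step.shape)    # (20, 2)
print(np.array_equal(two_step, transposed))  # True
```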

Just:

```
>>> m = np.matrix(np.random.random((5, 5)))
>>> m
matrix([[0.91074101, 0.65999332, 0.69774588, 0.007355  , 0.33025395],
        [0.11078742, 0.67463754, 0.43158254, 0.95367876, 0.85926405],
        [0.98665185, 0.86431513, 0.12153138, 0.73006437, 0.13404811],
        [0.24602225, 0.66139215, 0.08400288, 0.56769924, 0.47974697],
        [0.25345299, 0.76385882, 0.11002419, 0.2509888 , 0.06312359]])
>>> m[:,[1, 2]]
matrix([[0.65999332, 0.69774588],
        [0.67463754, 0.43158254],
        [0.86431513, 0.12153138],
        [0.66139215, 0.08400288],
        [0.76385882, 0.11002419]])
```

The columns need not be in order:

```
>>> m[:,[2, 1, 3]]
matrix([[0.69774588, 0.65999332, 0.007355  ],
        [0.43158254, 0.67463754, 0.95367876],
        [0.12153138, 0.86431513, 0.73006437],
        [0.08400288, 0.66139215, 0.56769924],
        [0.11002419, 0.76385882, 0.2509888 ]])
```

I think some of the older solutions here no longer work with current library versions. If `data` is a pandas DataFrame, one way to do it with the newer `to_numpy` method is:

```
extracted_data = data[['Column Name1','Column Name2']].to_numpy()
```

which gives you the desired outcome.

You can find the documentation here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy

I could not edit the chosen answer, so I’m adding an answer to clarify that indexing with an integer (or a slice) returns a view, not a copy, while indexing with a list returns a copy:

```
>>> x = np.zeros(shape=[2, 3])
>>> y = x[:, [0, 1]]
>>> z1, z2 = x[:, 0], x[:, 1]
>>> y[0, 0] = 1
>>> print(y)
[[1. 0.]
 [0. 0.]]
>>> print(x)
[[0. 0. 0.]
 [0. 0. 0.]]
>>> z1[0] = 2
>>> print(z1)
[2. 0.]
>>> print(x)
[[2. 0. 0.]
 [0. 0. 0.]]
```
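The same distinction can be checked without modifying anything, using `np.shares_memory`. A minimal sketch (the variable names are just for illustration):

```python
import numpy as np

x = np.zeros((2, 3))

from_list = x[:, [0, 1]]   # list index: advanced indexing, returns a copy
from_int = x[:, 0]         # integer plus slice: basic indexing, returns a view
from_slice = x[:, 0:2]     # slices only: basic indexing, returns a view

print(np.shares_memory(x, from_list))   # False
print(np.shares_memory(x, from_int))    # True
print(np.shares_memory(x, from_slice))  # True
```

If you need contiguous columns and want to avoid a copy, a slice such as `x[:, 0:2]` is the form that keeps a view.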