assign index dependent value to each index in numpy array

Question:

I want to center multi-dimensional data in an n x m matrix (<class 'numpy.matrixlib.defmatrix.matrix'>), let's say X. I defined a new array, ones(645), let's call it centVector, to hold the mean of every row of matrix X. Now I want to iterate over every row of X, compute its mean and assign that value to the corresponding index of centVector. Isn't this possible in a single line in scipy/numpy? I am not used to this language and was thinking of something like:

centVector = ones(645)
for key, val in X:
    centVector[key] = centVector[key] * (val.sum/val.size)

Afterwards I just need to subtract the mean from every row:

X = X - centVector

How can I simplify this?
EDIT: Besides, the above code is not actually working: for a key-value loop I would need something like enumerate(X). And I am not sure whether X - centVector returns the proper result.
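For reference, a minimal working sketch of that loop, assuming X is an n x m np.matrix that is iterated row by row (each row is then itself a 1 x m matrix with a mean method):

import numpy as np

X = np.matrix(np.arange(25).reshape((5, 5)))  # hypothetical example data

centVector = np.ones(X.shape[0])              # one slot per row
for i, row in enumerate(X):                   # enumerate yields the row index
    centVector[i] = row.mean()                # mean of the i-th row

# a flat centVector would broadcast across columns, so shape it into a column
X_centered = X - centVector.reshape((-1, 1))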

Asked By: Milla Well


Answers:

First, some example data:

>>> import numpy as np
>>> X = np.matrix(np.arange(25).reshape((5,5)))
>>> print(X)
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]

numpy conveniently has a mean function. By default, however, it gives you the mean over all the values in the array. Since you want the mean of each row, you need to specify the axis of the operation:

>>> np.mean(X, axis=1)
matrix([[  2.],
        [  7.],
        [ 12.],
        [ 17.],
        [ 22.]])

Note that axis=1 tells mean to operate along axis 1, i.e. along the columns, producing one value per row (axis=0 would instead give you one mean per column). Now you can subtract this mean from your X, as you did originally.
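For example, sticking with the matrix data above (row_means is just a local name introduced here for the column of per-row means), the subtraction broadcasts across each row:

>>> row_means = np.mean(X, axis=1)
>>> X - row_means
matrix([[-2., -1.,  0.,  1.,  2.],
        [-2., -1.,  0.,  1.,  2.],
        [-2., -1.,  0.,  1.,  2.],
        [-2., -1.,  0.,  1.,  2.],
        [-2., -1.,  0.,  1.,  2.]])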

Unsolicited advice

Usually, it’s best to avoid the matrix class (see docs). If you remove the np.matrix call from the example data, then you get a normal numpy array.

Unfortunately, in this particular case, using an array slightly complicates things because np.mean will return a 1D array:

>>> X = np.arange(25).reshape((5,5))
>>> r_means = np.mean(X, axis=1)
>>> print(r_means)
[  2.   7.  12.  17.  22.]

If you try to subtract this from X, r_means gets broadcast as a row vector instead of a column vector:

>>> X - r_means
array([[ -2.,  -6., -10., -14., -18.],
       [  3.,  -1.,  -5.,  -9., -13.],
       [  8.,   4.,   0.,  -4.,  -8.],
       [ 13.,   9.,   5.,   1.,  -3.],
       [ 18.,  14.,  10.,   6.,   2.]])

So, you’ll have to reshape the 1D array into an N x 1 column vector:

>>> X - r_means.reshape((-1, 1))
array([[-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.]])

The -1 passed to reshape tells numpy to figure out this dimension based on the original array shape and the rest of the dimensions of the new array. Alternatively, you could have reshaped the array using r_means[:, np.newaxis].
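As an aside, newer numpy versions also accept a keepdims argument to mean, which keeps the reduced axis as a length-one dimension so that no reshaping is needed afterwards:

>>> X - np.mean(X, axis=1, keepdims=True)
array([[-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.],
       [-2., -1.,  0.,  1.,  2.]])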

Answered By: Tony S Yu