replace zeroes in numpy array with the median value

Question:

I have a numpy array like this:

foo_array = [38,26,14,55,31,0,15,8,0,0,0,18,40,27,3,19,0,49,29,21,5,38,29,17,16]

I want to replace all the zeros with the median value of the whole array, where the zero values are not included in the calculation of the median.

So far I have this going on:

import numpy as np

foo_array = [38,26,14,55,31,0,15,8,0,0,0,18,40,27,3,19,0,49,29,21,5,38,29,17,16]
foo = np.array(foo_array)
foo = np.sort(foo)
print("foo sorted:", foo)
#foo sorted: [ 0  0  0  0  0  3  5  8 14 15 16 17 18 19 21 26 27 29 29 31 38 38 40 49 55]
nonzero_values = foo[0::] > 0
nz_values = foo[nonzero_values]
print("nonzero_values?:", nz_values)
#nonzero_values?: [ 3  5  8 14 15 16 17 18 19 21 26 27 29 29 31 38 38 40 49 55]
size = np.size(nz_values)
middle = size // 2  # floor division, as Python 2's "/" did for ints
print("median is:", nz_values[middle])
#median is: 26
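As a sketch of what that indexing actually returns: there are 20 non-zero values (an even count), so there are two middle elements, 21 and 26, and nz_values[middle] picks the upper one. The usual median averages the two. The names upper_middle and manual_median below are only illustrative.

```python
import numpy as np

foo = np.array([38, 26, 14, 55, 31, 0, 15, 8, 0, 0, 0, 18, 40,
                27, 3, 19, 0, 49, 29, 21, 5, 38, 29, 17, 16])

# Sorted non-zero values: 20 of them, an even count
nz_values = np.sort(foo[foo > 0])
size = nz_values.size
middle = size // 2

# Indexing at size // 2 picks the upper of the two middle values
upper_middle = nz_values[middle]

# For an even count the median is the mean of the two middle values
if size % 2 == 0:
    manual_median = (nz_values[middle - 1] + nz_values[middle]) / 2
else:
    manual_median = nz_values[middle]

print(upper_middle, manual_median, np.median(nz_values))  # 26 23.5 23.5
```

np.median handles both the even and odd cases, which is why it agrees with manual_median here.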

Is there a clever way to achieve this with numpy syntax?

Thank you

Asked By: slashdottir

||

Answers:

foo2 = foo.copy()  # foo[:] would only be a view, so filling it would also change foo
foo2[foo2 == 0] = nz_values[middle]
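On a NumPy array, foo[:] is a view of the same data rather than a copy (unlike slicing a Python list), so it matters which one the fill is applied to. A minimal sketch, using a hypothetical fill_value in place of nz_values[middle]:

```python
import numpy as np

foo = np.array([38, 26, 14, 55, 31, 0, 15, 8, 0, 0, 0, 18, 40,
                27, 3, 19, 0, 49, 29, 21, 5, 38, 29, 17, 16])
fill_value = 26  # hypothetical name: the value the question's code computes

# foo.copy() allocates new memory: filling zeros in foo2 leaves foo unchanged
foo2 = foo.copy()
foo2[foo2 == 0] = fill_value
print(np.count_nonzero(foo == 0), np.count_nonzero(foo2 == 0))  # 5 0

# foo[:] shares foo's buffer: writing through the view changes foo too
view = foo[:]
view[view == 0] = fill_value
print(np.count_nonzero(foo == 0))  # 0
```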

Instead of foo2, you could just update foo in place if you don't need the original. NumPy's indexing syntax can also fold a few lines of your code into one. For example, instead of

nonzero_values = foo[0::] > 0
nz_values = foo[nonzero_values]

You can just do

nz_values = foo[foo > 0]

This is boolean (mask) indexing, one form of what the NumPy documentation calls advanced or "fancy" indexing; you can find out more about it there.
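A small sketch of the boolean mask that foo > 0 produces and how indexing with it selects elements (the name mask is just illustrative):

```python
import numpy as np

foo = np.array([38, 26, 14, 55, 31, 0, 15, 8, 0, 0, 0, 18, 40,
                27, 3, 19, 0, 49, 29, 21, 5, 38, 29, 17, 16])

# The comparison is element-wise and yields a boolean array of the same shape
mask = foo > 0
print(mask.dtype, mask.shape)  # bool (25,)

# Indexing with a boolean array keeps only the positions where it is True,
# in their original order
nz_values = foo[mask]
print(nz_values.size)  # 20

# The inverted mask selects the zeros; the same form works on the left of "="
print(foo[~mask])  # [0 0 0 0 0]
```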

Answered By: Alex Szatmary

This solution takes advantage of numpy.median:

import numpy as np
foo_array = [38,26,14,55,31,0,15,8,0,0,0,18,40,27,3,19,0,49,29,21,5,38,29,17,16]
foo = np.array(foo_array)
# Compute the median of the non-zero elements
m = np.median(foo[foo > 0])
# Assign the median to the zero elements 
foo[foo == 0] = m

Just a note of caution: the median of the non-zero values in your array is 23.5, but as written this inserts 23, because foo has an integer dtype and the assigned value is truncated to fit it.
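Two sketches of ways to keep the fractional median, assuming a float result is acceptable: build the array as float from the start, or use np.where, which returns a new array and upcasts the integer values to float64. The names ints and filled are hypothetical.

```python
import numpy as np

foo_array = [38, 26, 14, 55, 31, 0, 15, 8, 0, 0, 0, 18, 40,
             27, 3, 19, 0, 49, 29, 21, 5, 38, 29, 17, 16]

# Option 1: a float array, so the in-place assignment keeps 23.5
foo = np.array(foo_array, dtype=float)
m = np.median(foo[foo > 0])
foo[foo == 0] = m

# Option 2: np.where builds a new array and leaves the int array untouched;
# combining int values with a float64 median gives a float64 result
ints = np.array(foo_array)
filled = np.where(ints == 0, np.median(ints[ints > 0]), ints)

print(m, filled.dtype)  # 23.5 float64
```

Option 2 is handy when you still need the original integer array afterwards.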

Answered By: bbayles