Numpy matrix binarization using only one expression
Question:
I am looking for a way to binarize a NumPy N-d array against a threshold using only one expression. I have something like this:
np.random.seed(0)
np.set_printoptions(precision=3)
a = np.random.rand(4, 4)
threshold, upper, lower = 0.5, 1, 0
a is now:
array([[ 0.02 , 0.833, 0.778, 0.87 ],
[ 0.979, 0.799, 0.461, 0.781],
[ 0.118, 0.64 , 0.143, 0.945],
[ 0.522, 0.415, 0.265, 0.774]])
Now I can run these two statements:
a[a>threshold] = upper
a[a<=threshold] = lower
and achieve what I want:
array([[ 0., 1., 1., 1.],
[ 1., 1., 0., 1.],
[ 0., 1., 0., 1.],
[ 1., 0., 0., 1.]])
But is there a way to do this with just one expression?
Answers:
We may consider np.where:
np.where(a>threshold, upper, lower)
Out[6]:
array([[0, 1, 1, 1],
[1, 1, 0, 1],
[0, 1, 0, 1],
[1, 0, 0, 1]])
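As a small sketch of how np.where behaves here: it builds a new array and leaves the input untouched, and the result dtype follows the fill values rather than the input. The sample values and the names binarized / binarized_float below are made up for illustration and are not from the question.

```python
import numpy as np

# Hypothetical sample values chosen for illustration (not from the question).
a = np.array([[0.02, 0.83], [0.50, 0.46]])
threshold, upper, lower = 0.5, 1, 0

# Elements strictly above the threshold get `upper`, all others get `lower`.
binarized = np.where(a > threshold, upper, lower)
print(binarized)  # a new integer array; `a` itself is unchanged

# Float fill values give a float result, like the in-place version in the question.
binarized_float = np.where(a > threshold, float(upper), float(lower))
print(binarized_float.dtype)
```

If you need the result back in a, assign it: a = np.where(a > threshold, upper, lower).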
You can write the comparison directly. It returns a boolean array, which can be used in further calculations much like a 1-byte unsigned integer ("uint8") array:
print(a > 0.5)
Output:
[[False True True True]
[ True True False True]
[False True False True]
[ True False False True]]
With custom upper/lower values you can still do it in one line, for example:
upper = 10
lower = 3
threshold = 0.5
print(lower + (a > threshold) * (upper - lower))
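To see why the arithmetic form works: the boolean mask acts as 0/1, so each element becomes either lower + 0 or lower + (upper - lower) = upper. A minimal check against np.where, using made-up sample values and hypothetical names (mask, arith, via_where):

```python
import numpy as np

# Hypothetical 1-d sample; any array shape works the same way.
a = np.array([0.1, 0.5, 0.9])
threshold, upper, lower = 0.5, 10, 3

mask = a > threshold                      # boolean array, True counts as 1
arith = lower + mask * (upper - lower)    # lower where False, upper where True
via_where = np.where(mask, upper, lower)  # same selection written with np.where

print(arith)
print(np.array_equal(arith, via_where))
```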
NumPy applies comparison and arithmetic operators element-wise to arrays of any dimension, so a comparison against a scalar is evaluated for every element at once. So you can just do:
>>> a = (a > 0.5).astype(np.int_)
For example:
>>> np.random.seed(0)
>>> np.set_printoptions(precision=3)
>>> a = np.random.rand(4, 4)
>>> a
array([[ 0.549, 0.715, 0.603, 0.545],
[ 0.424, 0.646, 0.438, 0.892],
[ 0.964, 0.383, 0.792, 0.529],
[ 0.568, 0.926, 0.071, 0.087]])
>>> a = (a > 0.5).astype(np.int_) # Where the numpy magic happens.
>>> a
array([[1, 1, 1, 1],
[0, 1, 0, 1],
[1, 0, 1, 1],
[1, 1, 0, 0]])
What's going on here is that the boolean comparison is applied to every element of the 4×4 matrix; the loop runs inside NumPy rather than in Python.
Each element greater than 0.5 yields True; every other element yields False.
Then, by calling the .astype method and passing np.int_ as the argument, you're telling NumPy to convert the boolean values to their integer representation (True becomes 1, False becomes 0), in effect binarizing the matrix based on your comparison value.
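The same two steps carry over unchanged to arrays with more than two dimensions, which is what the question asks about. A short sketch on a 3-d array, with values and names (mask, binary) picked only for illustration:

```python
import numpy as np

# Hypothetical 3-d input: 0.0, 0.125, ..., 0.875 arranged as 2 x 2 x 2.
a = np.arange(8).reshape(2, 2, 2) / 8.0

mask = a > 0.5                 # step 1: element-wise comparison, dtype bool
binary = mask.astype(np.int_)  # step 2: True -> 1, False -> 0

print(mask.dtype, binary.dtype)
print(binary)
```

The shape is preserved; only the dtype changes between the two steps.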
A shorter method is to simply multiply the boolean matrix from the condition by 1 or 1.0, depending on the type you want.
>>> a = np.random.rand(4,4)
>>> a
array([[ 0.63227032, 0.18262573, 0.21241511, 0.95181594],
[ 0.79215808, 0.63868395, 0.41706148, 0.9153959 ],
[ 0.41812268, 0.70905987, 0.54946947, 0.51690887],
[ 0.83693151, 0.10929998, 0.19219377, 0.82919761]])
>>> (a>0.5)*1
array([[1, 0, 0, 1],
[1, 1, 0, 1],
[0, 1, 1, 1],
[1, 0, 0, 1]])
>>> (a>0.5)*1.0
array([[ 1., 0., 0., 1.],
[ 1., 1., 0., 1.],
[ 0., 1., 1., 1.],
[ 1., 0., 0., 1.]])