TensorFlow softmax does not ignore the masking value

Question:

I am reviving this GitHub issue because I believe it is valid and needs to be explained. tf.keras has a Masking layer whose docs read:

For each timestep in the input tensor (dimension #1 in the tensor), if
all values in the input tensor at that timestep are equal to
mask_value, then the timestep will be masked (skipped) in all
downstream layers (as long as they support masking).

If any downstream layer does not support masking yet receives such an
input mask, an exception will be raised.


import numpy as np
import tensorflow as tf

# Create a zero-padded input and set two valid entries.
inputs = np.zeros([1,5])
inputs[0,1] = 0.5
inputs[0,2] = 0.1
inputs = tf.Variable(inputs)
masked_inputs = tf.keras.layers.Masking(mask_value=0.0)(inputs)
with_masking = tf.keras.layers.Softmax()(masked_inputs)
without_masking = tf.keras.layers.Softmax()(inputs)

The two results are virtually identical:

with_masking
<tf.Tensor: shape=(1, 5), dtype=float32, numpy=
array([[0.1737954 , 0.28654018, 0.19207363, 0.1737954 , 0.1737954 ]],
      dtype=float32)>
without_masking
<tf.Tensor: shape=(1, 5), dtype=float64, numpy=array([[0.1737954 , 0.28654017, 0.19207362, 0.1737954 , 0.1737954 ]])>

Expected behavior

I expected the softmax to be taken over the valid entries only, similar to

# Softmax over just the two valid entries, with no padding
inputs = np.zeros([1,2])
inputs[0,0] = 0.5
inputs[0,1] = 0.1
inputs = tf.Variable(inputs)
without_masking = tf.keras.layers.Softmax()(inputs)

without_masking
<tf.Tensor: shape=(1, 2), dtype=float64, numpy=array([[0.59868766, 0.40131234]])>

but padded with zeros at the masked positions:

with_masking
<tf.Tensor: shape=(1, 5), dtype=float32, numpy=
array([[0 , 0.59868766, 0.40131234, 0, 0 ]],
      dtype=float32)>

To make softmax ignore the zeros, could we replace them with massively negative numbers?

Related: tensorflow – softmax ignore negative labels (just like caffe)

from tensorflow import __version__
__version__
'2.3.1'
Asked By: bw4sz


Answers:

I think this is already explained well in the GitHub issue you have linked. The underlying problem is that, regardless of whether an array is masked, softmax() still operates on the 0.0 values and returns a non-zero value for them, as mathematically expected: exp(0) = 1, not 0.

The only way to get a zero output from softmax() is to pass a very large negative value, so that its exponential underflows to zero. If you set the masked values to the most negative value representable in float64, the Softmax() output at those positions will be zero.

To get machine limit on float64 you need tf.float64.min which is equal to -1.7976931348623157e+308. More info about machine limits on this post.
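
To make the underflow argument concrete, here is a minimal NumPy sketch of the same idea. The helper name masked_softmax is hypothetical (it is not part of TensorFlow or NumPy), and it assumes the mask is True at valid positions.

```python
import numpy as np

def masked_softmax(x, mask):
    """Softmax over the last axis that gives masked positions zero weight.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=np.float64)
    # Replace masked entries with the most negative finite float64.
    filled = np.where(mask, x, np.finfo(np.float64).min)
    # Subtract the row maximum for numerical stability.
    shifted = filled - filled.max(axis=-1, keepdims=True)
    # exp() of a hugely negative number underflows to exactly 0.0.
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)

inputs = np.array([[0.0, 0.5, 0.1, 0.0, 0.0]])
probs = masked_softmax(inputs, inputs != 0)
print(probs)
```

The valid entries get roughly 0.5987 and 0.4013, and the masked entries get 0. Note that in this sketch a row with every position masked comes out uniform rather than all zeros, because subtracting the row maximum turns every entry into 0.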

Below is an implementation for reference. It shows tf.boolean_mask, which only returns the unmasked elements, and the correct approach: using tf.where to replace the masked values before passing the result to softmax().

import numpy as np
import tensorflow as tf

inputs = np.zeros([1,5])
inputs[0,1] = 0.5
inputs[0,2] = 0.1
inputs = tf.Variable(inputs)

#Returns only the elements that are not masked (2,)
with_boolmask = tf.boolean_mask(inputs, inputs!=0)
with_boolmask = tf.keras.layers.Softmax()(with_boolmask)

#Correct way to do it!
masked_inp = tf.where(inputs!=0, inputs, tf.float64.min) #<----
with_where = tf.keras.layers.Softmax()(masked_inp)

print('BOOLEAN MASK (NOT EXPECTED)')
print(with_boolmask)

print('')
print('MASKED INPUT - ')
print(masked_inp)
print('')
print('SOFTMAX OUTPUT')
print(with_where)

Output:

BOOLEAN MASK (NOT EXPECTED)
tf.Tensor([0.59868765 0.40131232], shape=(2,), dtype=float32)

MASKED INPUT - 
tf.Tensor(
[[-1.79769313e+308  5.00000000e-001  1.00000000e-001 -1.79769313e+308
  -1.79769313e+308]], shape=(1, 5), dtype=float64)

SOFTMAX OUTPUT
tf.Tensor([[0.         0.59868765 0.40131232 0.         0.        ]], shape=(1, 5), dtype=float32)
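
The difference between the two approaches is easier to see on a batch with more than one row. The sketch below is illustrative only: the batch values are made up, and -1e9 is an arbitrary large negative constant that I am assuming is negative enough for float32, used here instead of tf.float64.min.

```python
import tensorflow as tf

batch = tf.constant([[0.0, 0.5, 0.1],
                     [0.7, 0.0, 0.0]])
keep = tf.not_equal(batch, 0.0)

# boolean_mask drops the masked entries and flattens what remains into a
# 1-D tensor, so the row boundaries are lost.
flat = tf.boolean_mask(batch, keep)
print(flat.shape)

# tf.where keeps the (2, 3) shape, so softmax still runs row by row.
# -1e9 is an assumed "large enough" negative fill value.
fill = tf.constant(-1e9, dtype=batch.dtype)
filled = tf.where(keep, batch, fill)
probs = tf.nn.softmax(filled, axis=-1)
print(probs)
```

With tf.where, each row sums to 1 and the masked positions receive zero probability, while the boolean_mask result can no longer be split back into rows without extra bookkeeping.
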
Answered By: Akshay Sehgal

Feel free to correct me if I'm wrong, but I think there is an easier way: the Softmax layer accepts a mask argument directly.

import tensorflow as tf

# Padded input: zeros everywhere except two valid entries.
inputs = tf.constant([0., 0.5, 0.1, 0., 0.])
mask = tf.not_equal(inputs, 0.)
with_masking = tf.keras.layers.Softmax()(inputs, mask=mask)
without_masking = tf.keras.layers.Softmax()(inputs)

print(with_masking)
print(without_masking)

The output is:

tf.Tensor([0.         0.59868765 0.40131232 0.         0.        ], shape=(5,), dtype=float32)
tf.Tensor([0.1737954  0.28654018 0.19207363 0.1737954  0.1737954 ], shape=(5,), dtype=float32)
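
One caveat with deriving the mask from inputs != 0 is that a genuine 0.0 among the valid entries would be masked as well. A possible variation, assuming the number of valid entries per row is known and assuming a TensorFlow version whose Softmax layer accepts the mask argument (as in the snippet above), is to build the mask from lengths with tf.sequence_mask. The logits and lengths below are made-up values for illustration.

```python
import tensorflow as tf

# Two padded rows; the second row has a genuine 0.0 at position 1.
logits = tf.constant([[0.5, 0.1, 0.0, 0.0],
                      [0.3, 0.0, 0.2, 0.0]])
# Assumed to be known: the number of valid entries in each row.
lengths = tf.constant([2, 3])

# True for valid positions, False for padding.
mask = tf.sequence_mask(lengths, maxlen=logits.shape[-1])

probs = tf.keras.layers.Softmax(axis=-1)(logits, mask=mask)
print(probs)
```

Here the 0.0 at position 1 of the second row still receives a non-zero probability, because the mask comes from the lengths rather than from the values.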
Answered By: Jianpeng Hou