Why is StringLookup producing an extra label?

Question:

From TF documentation:
"one_hot": Encodes each individual element in the input into an array the same size as the vocabulary.

import tensorflow as tf

alphabet = set("abcdefghijklmnopqrstuvwxyz")
one_hot_encoder = tf.keras.layers.StringLookup(vocabulary=list(alphabet), output_mode='one_hot')
print(len(alphabet))  # 26
print(one_hot_encoder("a").shape)  # (27,)

As far as I understand, it should encode to a tensor of shape (26,). Why does it encode to one of shape (27,)? Is the extra label there to represent "no class"?

Asked By: Oskar Zdrojewski


Answers:

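Position 0 is reserved for the out-of-vocabulary (OOV) token, because num_oov_indices defaults to 1. That extra slot is why the output has 27 entries instead of 26. If you don't want it, you can set num_oov_indices to zero; the layer will then raise an error for any input that is not in the vocabulary: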

one_hot_encoder = tf.keras.layers.StringLookup(vocabulary=list(alphabet), num_oov_indices=0, output_mode='one_hot')
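To see where the extra slot goes, here is a minimal sketch comparing the two settings. It assumes TensorFlow 2.x with the TensorFlow backend; the names vocab, with_oov, no_oov, and hot_indices are made up for this illustration, and the vocabulary is built from a list instead of a set so that the index positions are deterministic.

```python
import string

import tensorflow as tf

# Ordered vocabulary: "a" is first, "z" is last (a set would have no fixed order).
vocab = list(string.ascii_lowercase)

# Default num_oov_indices=1: index 0 is the OOV slot, the vocabulary starts at 1.
with_oov = tf.keras.layers.StringLookup(vocabulary=vocab, output_mode="one_hot")

# num_oov_indices=0: no OOV slot, so the vocabulary starts at index 0.
no_oov = tf.keras.layers.StringLookup(
    vocabulary=vocab, num_oov_indices=0, output_mode="one_hot"
)


def hot_indices(one_hot):
    """Return the position of the 1 in each one-hot row as a Python list."""
    return tf.argmax(one_hot, axis=-1).numpy().tolist()


# "?" is not in the vocabulary, so it falls into the OOV slot.
encoded_with_oov = with_oov(tf.constant(["a", "z", "?"]))
encoded_no_oov = no_oov(tf.constant(["a", "z"]))

print(encoded_with_oov.shape[-1], hot_indices(encoded_with_oov))  # 27 [1, 26, 0]
print(encoded_no_oov.shape[-1], hot_indices(encoded_no_oov))      # 26 [0, 25]
```

With the default setting, every vocabulary entry is shifted up by one to make room for the OOV slot at index 0. With num_oov_indices=0 the width matches the vocabulary size, at the cost of no longer accepting unknown strings.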
Answered By: AndrzejO