Computing cosine similarity between two tensors in Keras
Question:
I have been following a tutorial that shows how to make a word2vec model.
The tutorial uses this piece of code:
similarity = merge([target, context], mode='cos', dot_axes=0)
(No other information was given, but I suppose this comes from keras.layers.)
I've researched the merge method a bit, but I couldn't find much about it. From what I understand, it has been replaced by a set of layer classes such as layers.Add(), layers.Concatenate(), and so on.
What should I use? There's layers.Dot(), which has an axes parameter (which seems correct), but no mode parameter.
What can I use in this case?
Answers:
There are a few things that are unclear from the Keras documentation that I think are crucial to understanding:
For each function in the Keras documentation for Merge, there is a lowercase and an uppercase version defined, i.e. add() and Add().
On GitHub, farizrahman4u outlines the differences:
- Merge is a layer.
- Merge takes layers as input.
- Merge is usually used with Sequential models.
- merge is a function.
- merge takes tensors as input.
- merge is a wrapper around Merge.
- merge is used in the Functional API.
Using Merge:
left = Sequential()
left.add(...)
left.add(...)
right = Sequential()
right.add(...)
right.add(...)
model = Sequential()
model.add(Merge([left, right]))
model.add(...)
Using merge:
a = Input((10,))
b = Dense(10)(a)
c = Dense(10)(a)
d = merge([b, c])
model = Model(a, d)
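For reference, the functional-API example above no longer runs in current Keras, where merge() has been removed. A minimal sketch of the equivalent graph using the modern layer classes under tf.keras (the sizes are the same illustrative ones as above):

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense, Add
from tensorflow.keras.models import Model

# Same graph as the functional-API example above, with the
# modern Add layer in place of the removed merge() function.
a = Input(shape=(10,))
b = Dense(10)(a)
c = Dense(10)(a)
d = Add()([b, c])  # old merge([b, c]) defaulted to mode='sum'
model = Model(a, d)

print(model.predict(np.ones((1, 10))).shape)  # (1, 10)
```

For other merge modes, swap Add for the corresponding class (Multiply, Concatenate, Dot, ...).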
To answer your question: since Merge has been deprecated, we have to define and build a layer for the cosine similarity ourselves. In general this will involve using those lowercase backend functions, which we wrap within a Lambda to create a layer that we can use within a model.
I found a solution here:
from keras import backend as K
from keras.layers import Lambda

def cosine_distance(vests):
    x, y = vests
    x = K.l2_normalize(x, axis=-1)
    y = K.l2_normalize(y, axis=-1)
    return -K.mean(x * y, axis=-1, keepdims=True)

def cos_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0], 1)

distance = Lambda(cosine_distance, output_shape=cos_dist_output_shape)([processed_a, processed_b])
Depending on your data, you may want to remove the L2 normalization. What is important to note about the solution is that it is built using the Keras backend API, e.g. K.mean() – I think this is necessary when defining a custom layer, or even a loss function.
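As a sanity check, the Lambda-based layer above can be exercised on concrete inputs. A minimal sketch, assuming tf.keras and 4-dimensional inputs (processed_a and processed_b in the snippet above are placeholders from the original answer):

```python
import numpy as np
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Lambda
from tensorflow.keras.models import Model

def cosine_distance(vests):
    x, y = vests
    x = K.l2_normalize(x, axis=-1)
    y = K.l2_normalize(y, axis=-1)
    return -K.mean(x * y, axis=-1, keepdims=True)

def cos_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0], 1)

in_a = Input(shape=(4,))
in_b = Input(shape=(4,))
distance = Lambda(cosine_distance, output_shape=cos_dist_output_shape)([in_a, in_b])
model = Model([in_a, in_b], distance)

a = np.array([[1.0, 1.0, 1.0, -1.0]])
b = np.array([[4.0, 4.0, 4.0, 5.0]])
# Note: K.mean divides the sum by the dimension (4 here), so the
# output is -cosine/4, not -cosine itself; use K.sum for the exact
# negated cosine.
print(model.predict([a, b]))  # approx [[-0.1024]]
```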
Hope I was clear, this was my first SO answer!
The Dot layer in Keras now supports built-in cosine similarity, using the normalize=True parameter.
From the Keras Docs:
keras.layers.Dot(axes, normalize=True)
normalize: Whether to L2-normalize samples along the dot product axis before taking the dot product. If set to True, then the output of the dot product is the cosine proximity between the two samples.
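Wired into a small functional model, this looks as follows (a minimal sketch; the input names and the 4-dimensional shape are illustrative, not from the tutorial):

```python
import numpy as np
import tensorflow as tf

# Two 4-dimensional input vectors; names are illustrative.
target = tf.keras.layers.Input(shape=(4,))
context = tf.keras.layers.Input(shape=(4,))

# normalize=True L2-normalizes both inputs along the dot axis first,
# so the output is the cosine similarity of the two vectors.
similarity = tf.keras.layers.Dot(axes=1, normalize=True)([target, context])
model = tf.keras.Model([target, context], similarity)

a = np.array([[1.0, 1.0, 1.0, -1.0]])
b = np.array([[4.0, 4.0, 4.0, 5.0]])
print(model.predict([a, b]))  # approx [[0.4096]]
```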
If you alter the last code block of the tutorial as follows, you can see that the (average) loss is decreasing nicely with the Dot solution suggested by SantoshGuptaz7 (comment in the question above):
display_after_epoch = 10000
display_after_epoch_2 = 10 * display_after_epoch
loss_sum = 0

for cnt in range(epochs):
    idx = np.random.randint(0, len(labels) - 1)
    arr_1[0,] = word_target[idx]
    arr_2[0,] = word_context[idx]
    arr_3[0,] = labels[idx]
    loss = model.train_on_batch([arr_1, arr_2], arr_3)
    loss_sum += loss
    if cnt % display_after_epoch == 0 and cnt != 0:
        print("\nIteration {}, loss={}".format(cnt, loss_sum / cnt))
        loss_sum = 0
    if cnt % display_after_epoch_2 == 0:
        sim_cb.run_sim()
Maybe this will help you (I spent a lot of time making sure that these are the same thing):
import tensorflow as tf

with tf.device('/CPU:0'):
    print(tf.keras.losses.CosineSimilarity()([1.0, 1.0, 1.0, -1.0], [4.0, 4.0, 4.0, 5.0]))
    print(tf.keras.layers.dot([tf.Variable([[1.0, 1.0, 1.0, -1.0]]), tf.Variable([[4.0, 4.0, 4.0, 5.0]])], axes=1, normalize=True))
Output (Pay attention to the sign):
tf.Tensor(-0.40964404, shape=(), dtype=float32)
tf.Tensor([[0.40964404]], shape=(1, 1), dtype=float32)
This should do the trick:
cos = tf.keras.layers.dot([tensor_a, tensor_b], axes=1, normalize=True)
cos_similarity_loss = (-1.0) * cos
Also, keep this in mind:
tf.keras.layers.dot returns:
- 1 if two vectors have angle 0 (minimum angular distance).
- 0 if two vectors have angle 90 (half angular distance).
- -1 if two vectors have angle 180 (maximum angular distance).
tf.keras.losses.CosineSimilarity() returns:
- -1 if two vectors have angle 0 (minimum angular distance).
- 0 if two vectors have angle 90 (half angular distance).
- 1 if two vectors have angle 180 (maximum angular distance).
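The sign difference can be checked by hand. A short sketch computing the cosine of the same two vectors with plain NumPy, which reproduces the magnitude printed by both TensorFlow calls:

```python
import numpy as np

a = np.array([1.0, 1.0, 1.0, -1.0])
b = np.array([4.0, 4.0, 4.0, 5.0])

# Cosine similarity: dot(a, b) / (|a| * |b|)
cos_sim = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_sim)   # approx 0.40964404, matching layers.dot(..., normalize=True)
print(-cos_sim)  # approx -0.40964404, matching losses.CosineSimilarity()
```

CosineSimilarity negates the value so that minimizing the loss maximizes the similarity.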