How to sum up values according to indices in a different vector using Keras / TensorFlow?
Question:
I’m new here, and I have a question regarding indexing of tensors in Keras / TensorFlow.
I have a vector of length N which contains indices of words in a vocabulary (indices may repeat). This vector represents a sentence, for example (40, 25, 99, 26, 34, 99, 100, 100, ...).
I also have another vector, or actually a matrix (since it’s a batch of examples), of the same length N, where each word in the original vector is assigned a weight W_i. I want to sum up the weights for each word across the whole sentence, so that I get a map from word index to the sum of that word’s weights in the sentence, and I want to do it in a vectorized way.
For example, assuming the sentence is (1, 2, 3, 4, 5, 3) and the weights are (0, 1, 0.5, 0.1, 0.6, 0.5), I want the result to be the mapping:
1->0
2->1
3->1
4->0.1
5->0.6
How can I achieve something like that without the need to iterate through each element? I was thinking something along the direction of a sparse tensor (since the possible vocabulary is very large), but I don’t know how to implement it efficiently.
Can anyone help?
I basically want to implement a pointer-generator network, and this part is needed when calculating the probability of copying an input word rather than generating one.
Answers:
You need tf.bincount() (tf.math.bincount in TF 2.x), which counts the number of occurrences of each value in an integer array. When a weights argument is given, it instead sums the weights at the positions of each value, which is exactly the mapping you describe. Note that with a 2-D input it flattens the batch, so the sums run across all rows. An example:
import tensorflow as tf
import numpy as np

indices_tf = tf.placeholder(shape=(None, None), dtype=tf.int32)
weights_tf = tf.placeholder(shape=(None, None), dtype=tf.float32)

# Bins are indexed from 0; entry i of the result is the sum of the
# weights at all positions where the index equals i.
result = tf.bincount(indices_tf, weights=weights_tf)

indices_data = np.array([1, 2, 3, 4, 5, 3])
weights_data = np.array([0, 1, 0.5, 0.1, 0.6, 0.5])

with tf.Session() as sess:
    print(sess.run(result, feed_dict={indices_tf: [indices_data],
                                      weights_tf: [weights_data]}))
    print(sess.run(result, feed_dict={indices_tf: [indices_data] * 2,
                                      weights_tf: [weights_data] * 2}))

# Output:
# [0.  0.  1.  1.  0.1 0.6]
# [0.  0.  2.  2.  0.2 1.2]
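For reference, NumPy's np.bincount has the same weighted semantics, which makes it easy to sanity-check the result outside a TensorFlow session:

```python
import numpy as np

indices = np.array([1, 2, 3, 4, 5, 3])
weights = np.array([0, 1, 0.5, 0.1, 0.6, 0.5])

# Entry i is the sum of weights[j] over all j with indices[j] == i;
# the output length is max(indices) + 1.
sums = np.bincount(indices, weights=weights)
print(sums)  # -> [0. 0. 1. 1. 0.1 0.6]
```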
Maybe open3d.ml.tf.ops.reduce_subarrays_sum is your answer.
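Since the vocabulary can be very large, another option (a sketch, not from the original answers) is tf.math.unsorted_segment_sum, which lets you fix the output length to the vocabulary size rather than max(index) + 1; here vocab_size = 7 is a hypothetical value:

```python
import tensorflow as tf

vocab_size = 7  # hypothetical vocabulary size
indices = tf.constant([1, 2, 3, 4, 5, 3], dtype=tf.int32)
weights = tf.constant([0, 1, 0.5, 0.1, 0.6, 0.5], dtype=tf.float32)

# Treat each vocabulary index as a segment id and sum the weights
# falling into each segment; the output always has length vocab_size,
# with zeros for words that do not appear in the sentence.
sums = tf.math.unsorted_segment_sum(weights, indices,
                                    num_segments=vocab_size)
print(sums.numpy())  # -> [0. 0. 1. 1. 0.1 0.6 0.]
```

This is also the shape you typically want in a pointer-generator network, where the copy distribution must line up with the fixed-size generation distribution over the vocabulary.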