How to sum values according to indices in a different vector using Keras / TensorFlow?

Question:

I’m new here, and I have a question about indexing tensors in Keras / TensorFlow:

I have a vector of length N that contains the indices of words in a vocabulary (indices may repeat). This vector represents a sentence, for example (40, 25, 99, 26, 34, 99, 100, 100...).
I also have another vector (actually a matrix, since it’s a batch of examples) of the same length N, in which each word of the original vector is assigned a weight W_i. I want to sum the weights of each distinct word across the whole sentence, so that I get a map from word index to the sum of the weights for that word in the sentence, and I want to do it in a vectorized way.
For example, assuming the sentence is (1, 2, 3, 4, 5, 3), and the weights are (0, 1, 0.5, 0.1, 0.6, 0.5), I want the result to be some mapping:

1->0
2->1
3->1
4->0.1
5->0.6

How can I achieve something like that without iterating through each element? I was thinking of something along the lines of a sparse tensor (since the possible vocabulary is very large), but I don’t know how to implement it efficiently.
Can anyone help?
I basically want to implement a pointer-generator network, and this part is required when calculating the probability of copying an input word rather than generating one.

Asked By: OmriP


Answers:

You need tf.bincount(). By default it counts the number of occurrences of each value in an integer array; when a weights tensor is passed, it sums the weights for each value instead. An example:

import tensorflow as tf
import numpy as np

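# TensorFlow 1.x graph-mode API (under TF 2.x, placeholder and Session
# live in tf.compat.v1 and require eager execution to be disabled).
indices_tf = tf.placeholder(shape=(None, None), dtype=tf.int32)
weights_tf = tf.placeholder(shape=(None, None), dtype=tf.float32)

# Position i of the result holds the summed weight for word index i,
# starting at index 0 (so index 0 is present even if no word uses it).
result = tf.bincount(indices_tf, weights_tf)

indices_data = np.array([1, 2, 3, 4, 5, 3])
weights_data = np.array([0, 1, 0.5, 0.1, 0.6, 0.5])

with tf.Session() as sess:
    # A batch containing one sentence.
    print(sess.run(result, feed_dict={indices_tf: [indices_data], weights_tf: [weights_data]}))
    # A batch containing the same sentence twice: the sums are accumulated
    # over the whole batch, not per sentence.
    print(sess.run(result, feed_dict={indices_tf: [indices_data] * 2, weights_tf: [weights_data] * 2}))

# Output:
# [0.  0.  1.  1.  0.1 0.6]
# [0.  0.  2.  2.  0.2 1.2]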
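
The second output line shows that the sums are pooled over the whole batch. If one result per sentence is needed, as is usual for a pointer-generator copy distribution, the following is a minimal sketch under some assumptions: TensorFlow 2.x with eager execution, a known vocabulary size, and two illustrative helper names (sum_weights_per_sentence and sum_weights_compact) that are hypothetical and do not come from any library. The first helper uses tf.scatter_nd, which accumulates updates written to the same coordinate, to build a dense [batch, vocab_size] result. The second uses tf.unique with tf.math.unsorted_segment_sum to return only the words that actually occur in a single sentence, which may suit a very large vocabulary better.

```python
import tensorflow as tf


def sum_weights_per_sentence(word_ids, weights, vocab_size):
    """Sum weights per word id, separately for each sentence in the batch.

    word_ids: integer tensor of shape [batch, N]
    weights:  float tensor of shape [batch, N]
    Returns a dense float tensor of shape [batch, vocab_size].
    """
    word_ids = tf.cast(word_ids, tf.int32)
    batch_size = tf.shape(word_ids)[0]
    seq_len = tf.shape(word_ids)[1]
    # Row number of every element, so that (row, word_id) is a 2-D coordinate.
    row_ids = tf.tile(tf.expand_dims(tf.range(batch_size), 1), [1, seq_len])
    coords = tf.stack([row_ids, word_ids], axis=-1)  # shape [batch, N, 2]
    # tf.scatter_nd sums the updates that land on the same coordinate.
    return tf.scatter_nd(coords, weights, tf.stack([batch_size, vocab_size]))


def sum_weights_compact(word_ids, weights):
    """Single sentence: return (unique word ids, summed weight per unique id).

    word_ids and weights are 1-D tensors of the same length.
    """
    # positions[i] is the slot of word_ids[i] inside unique_ids.
    unique_ids, positions = tf.unique(word_ids)
    sums = tf.math.unsorted_segment_sum(weights, positions, tf.size(unique_ids))
    return unique_ids, sums


# Example data: the sentence from the question plus a second, made-up sentence.
ids = tf.constant([[1, 2, 3, 4, 5, 3],
                   [0, 0, 2, 2, 2, 5]])
w = tf.constant([[0.0, 1.0, 0.5, 0.1, 0.6, 0.5],
                 [1.0, 2.0, 0.5, 0.5, 0.5, 3.0]])

print(sum_weights_per_sentence(ids, w, vocab_size=6).numpy())

unique_ids, sums = sum_weights_compact(ids[0], w[0])
print(dict(zip(unique_ids.numpy().tolist(), sums.numpy().tolist())))
```

For the sentence from the question, the compact version should give the requested mapping (1 to 0, 2 to 1, 3 to 1, 4 to roughly 0.1, 5 to roughly 0.6), up to float32 rounding. The dense version keeps each sentence in its own row instead of adding the rows together.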
Answered By: giser_yugang

Maybe open3d.ml.tf.ops.reduce_subarrays_sum is what you are looking for.

Answered By: dylan