tensorflow distribute integers according to probabilities

Question:

I would like to distribute an integer, for example 20, into four parts following a probability for each part, p=[0.02, 0.5, 0.3, 0.18].

The corresponding python code is:

import numpy as np
from collections import Counter

frequency = np.random.choice([1, 2, 3, 4], 20, p=[0.02, 0.5, 0.3, 0.18])
np.fromiter(Counter(frequency).values(), dtype=np.float32)

# Out[86]:
# array([8., 8., 4.], dtype=float32)
# (only three values appear here because part 1, with p=0.02, was never drawn in this run)

However, I have over 1e8 parts, and the number to distribute is not 20 but around 1e10, so plain Python is really slow. For example:

import numpy as np
from collections import Counter

frequency = np.random.choice([i for i in range(10**7)], 16**10, p=[0.0000001 for i in range(10**7)])
r = np.fromiter(Counter(frequency).values(), dtype=np.float32)

Now it simply yields a MemoryError.

I think tensorflow-gpu should be able to overcome this issue, since the output is only of size 10**7.
Does anyone know how to do this?

Asked By: ZHANG Juenjie


Answers:

There are a few issues here to think of.

Running the code on a GPU will never work on its own: GPUs are built for fast computation rather than storage, and typically have less memory available than the CPU. Moreover, this code can produce a MemoryError on the CPU too, as it did on my machine, so we first try to overcome that.

Overcoming the MemoryError on CPU:

The line producing the MemoryError is line 1 itself:

    In [1]: frequency = np.random.choice([i for i in range(10**7)], 16**10,
       ...:     p=[0.0000001 for i in range(10**7)])
    ---------------------------------------------------------------------------
    MemoryError                               Traceback (most recent call last)
The reason is that the output of line 1 is not of size 10**7 but of size 16**10. Since this is what causes the MemoryError, the goal should be to never create an array of that size.

To do this, we reduce the size of the sample by a factor and loop factor times, so that each block is small enough to store. On my machine, a factor of 1000000 does the trick. Once a block is sampled, we use Counter to turn it into a dictionary of frequencies. The advantage is that the dictionary of frequencies, when converted to a list or numpy array, never exceeds the size of 10**7, which does not give a memory error.

As some elements may not appear in every sampled block, instead of converting each block's Counter into a list directly, we add its counts to the totals accumulated over the previous iterations. Note that the accumulator must itself be a Counter: Counter.update() adds counts, whereas a plain dict.update() would overwrite them.
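To see why the choice of accumulator matters, here is a small standard-library-only sketch (random.choices stands in for np.random.choice) contrasting the two update behaviours:

```python
import random
from collections import Counter

random.seed(0)

# Two sampled blocks of 10 draws each from parts 1..4
block_a = random.choices([1, 2, 3, 4], weights=[0.02, 0.5, 0.3, 0.18], k=10)
block_b = random.choices([1, 2, 3, 4], weights=[0.02, 0.5, 0.3, 0.18], k=10)

# Counter.update() accumulates: the totals over both blocks sum to 20
totals = Counter(block_a)
totals.update(block_b)
print(sum(totals.values()))  # 20

# A plain dict.update() overwrites the counts for keys seen in both
# blocks, so block_a's counts for those parts are silently lost
overwritten = dict(Counter(block_a))
overwritten.update(Counter(block_b))
print(sum(overwritten.values()))  # typically not 20
```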

Once the whole loop is done, we convert the accumulated counts to a numpy array. I have added a progress bar to track the progress, since the computation may take a long time. Also, you don't need to pass the parameter p to np.random.choice() in your specific case, as the distribution is uniform anyway.

import numpy as np
import tensorflow as tf

from click import progressbar
from collections import Counter

def large_uniform_sample_frequencies(factor=1000000, total_elements=10**7, sample_size=16**10):
    # Accumulate counts across blocks; Counter.update() adds
    # counts instead of overwriting them as dict.update() would
    counter_dict = Counter()

    # Wrap the loop in a progress bar to track the iterations
    with progressbar(range(factor)) as bar:
        for iteration in bar:
            # Generate a random sample of size (16 ** 10) / factor;
            # p is omitted because the distribution is uniform
            frequency = np.random.choice(total_elements, sample_size // factor)

            # Add this block's frequencies to the running totals
            counter_dict.update(frequency)

    return np.fromiter(counter_dict.values(), dtype=np.float32)
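At a reduced scale, the block-and-accumulate pattern above can be sanity-checked with the standard library alone (random.choices stands in for np.random.choice; the function name and sizes here are illustrative): the accumulated counts always sum to the full sample size.

```python
import random
from collections import Counter

def blockwise_frequencies(total_elements, sample_size, factor):
    # Accumulate frequencies block by block so that no single
    # sample of size `sample_size` is ever held in memory
    counts = Counter()
    for _ in range(factor):
        block = random.choices(range(total_elements), k=sample_size // factor)
        counts.update(block)
    return counts

counts = blockwise_frequencies(total_elements=100, sample_size=10_000, factor=10)
print(sum(counts.values()))  # 10000
```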

Using tensorflow-gpu:

As you have mentioned tensorflow-gpu, I assume you want either to get rid of the MemoryError using tensorflow-gpu, or to run this in conjunction with tensorflow-gpu while using a GPU.

To solve the MemoryError, you may try the tf.multinomial() function to the same effect as np.random.choice(), as shown here, but it is unlikely to help: the problem is storing data of a certain size, not performing some alternate computation.
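As an aside, if the individual draws are not needed, numpy can produce the frequency vector directly: np.random.multinomial() draws the counts for all parts in one call, so the output is only as large as the number of parts. This is a sketch assuming the counts alone are what you need; the sizes below are scaled down for illustration.

```python
import numpy as np

# Draw the counts for 20 items over four parts directly,
# without materialising the 20 individual samples
freqs = np.random.multinomial(20, [0.02, 0.5, 0.3, 0.18])
print(freqs.sum())  # 20

# The same call scales to many parts: the output size equals
# the number of parts, not the number of draws
many = np.random.multinomial(10**6, [1.0 / 10**4] * 10**4)
print(many.shape)  # (10000,)
```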

If you want to run this as part of training some model, for instance, you can use Distributed TensorFlow to place this part of the computation graph on the CPU as a PS task, reusing the function defined above. Here is the final code for that:

# Mention the devices for PS and worker tasks
ps_dev = '/cpu:0'
worker_dev = '/gpu:0'

# Toggle True to place the computation on the CPU
# and False to place it on the least-loaded GPU
is_ps_task = True

# Set the device for a PS task, falling back to the worker device
if is_ps_task:
    device_setter = tf.train.replica_device_setter(worker_device=worker_dev,
        ps_device=ps_dev,
        ps_tasks=1)
else:
    device_setter = worker_dev

# Allocate the computation to the chosen device
with tf.device(device_setter):
    freqs = large_uniform_sample_frequencies()
Answered By: Vedang Waradpande