Generating a list of random numbers, summing to 1

Question:

  • This question is not a duplicate of Getting N random numbers whose sum is M because:
    1. Most answers there are about theory, not a specific coding solution in Python that answers this question
    2. The accepted answer here is 5 years older than the one answer in the duplicate that answers this question.
    3. The duplicate accepted answer does not answer this question

How would I make a list of N (say 100) random numbers, so that their sum is 1?

I can make a list of random numbers with

r = [ran.random() for i in range(1,100)]

How would I modify this so that the list sums to 1 (this is for a probability simulation).

Asked By: Tom Kealy


Answers:

You could easily do it with:

r.append(1 - sum(r))
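
One caveat worth checking before relying on this: the appended value is whatever is needed to bring the total to 1, so it is not drawn like the others. A minimal sketch, assuming r holds the 99 uniform draws produced by the question's range(1,100) and that the random module is imported directly (the fixed seed is only there to make the run repeatable):

```python
import random

random.seed(0)  # fixed seed only so the sketch is repeatable

# 99 uniform draws, as range(1, 100) in the question produces
r = [random.random() for _ in range(99)]

# The approach above: append whatever is needed to reach a total of 1
r.append(1 - sum(r))

print(len(r))   # 100 values in total
print(sum(r))   # very close to 1, up to floating-point error
print(r[-1])    # negative, because 99 uniform draws sum to about 49.5 on average
```

The list does sum to 1, but the last entry is negative, which matters if the values are meant to be probabilities.
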
Answered By: Paul Evans

The best way to do this is to simply make a list of as many numbers as you wish, then divide them all by the sum. They are totally random this way.

r = [ran.random() for i in range(100)]  # 100 values; ran is assumed to be "import random as ran"
s = sum(r)
r = [ i/s for i in r ]

or, as suggested by @TomKealy, keep the sum and creation in one loop:

rs = []
s = 0
for i in range(100):
    r = ran.random()
    s += r
    rs.append(r)
rs = [r/s for r in rs]  # normalize so the list sums to 1

For the fastest performance, use numpy:

import numpy as np
a = np.random.random(100)
a /= a.sum()

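You can also draw the random numbers from any distribution you want before normalizing. For a probability distribution the draws must be non-negative, though; the normal distribution below can produce negative entries, so the result sums to 1 but is not a valid set of probabilities: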

a = np.random.normal(size=100)
a /= a.sum()
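
To make the effect of the chosen distribution concrete, here is a small sketch that checks whether a normalized sample can serve as a probability vector (all entries non-negative, total equal to 1). The helper name is_probability_vector is made up for illustration, and the sketch assumes NumPy's Generator API (np.random.default_rng) rather than the legacy np.random functions used above.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed only so the sketch is repeatable

def is_probability_vector(a, tol=1e-9):
    """Return True if every entry is non-negative and the entries sum to 1."""
    a = np.asarray(a, dtype=float)
    return bool(np.all(a >= 0) and abs(a.sum() - 1.0) <= tol)

# Uniform draws are non-negative, so normalizing gives valid probabilities
uniform = rng.random(100)
uniform /= uniform.sum()

# Normal draws have mixed signs, so some normalized entries are negative
normal = rng.normal(size=100)
normal /= normal.sum()

print(is_probability_vector(uniform))
print(is_probability_vector(normal))
```

Any distribution supported on non-negative values (uniform, exponential, gamma and so on) passes this check after normalization.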

--- Timing ---

In [52]: %%timeit
    ...: r = [ran.random() for i in range(1,100)]
    ...: s = sum(r)
    ...: r = [ i/s for i in r ]
   ....: 
1000 loops, best of 3: 231 µs per loop

In [53]: %%timeit
   ....: rs = []
   ....: s = 0
   ....: for i in range(100):
   ....:     r = ran.random()
   ....:     s += r
   ....:     rs.append(r)
   ....: 
10000 loops, best of 3: 39.9 µs per loop

In [54]: %%timeit
   ....: a = np.random.random(100)
   ....: a /= a.sum()
   ....: 
10000 loops, best of 3: 21.8 µs per loop
Answered By: askewchan

Generate 100 random numbers; the range doesn't matter.
Sum the numbers generated, then divide each one by the total.

Answered By: guessing

Dividing each number by the total may not give you the distribution you want. For example, with two numbers, the pair x,y = random.random(), random.random() picks a point uniformly on the square 0<=x<1, 0<=y<1. Dividing by the sum “projects” that point (x,y) onto the line x+y=1 along the line from (x,y) to the origin. Points near (0.5,0.5) will be much more likely than points near (0.1,0.9).

For two variables, then, x = random.random(), y=1-x gives a uniform distribution along the geometrical line segment.
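
The two-variable bias is easy to see empirically. The following is a rough simulation sketch, not part of the original argument: it counts how often the first coordinate lands near 0.5 versus near 0.1 for the divide-by-sum method and for the direct x, 1-x method. The band width of 0.05 and the trial count are arbitrary choices.

```python
import random

random.seed(1)  # fixed seed only so the sketch is repeatable
TRIALS = 100_000

def near(value, target, half_width=0.05):
    """True if value lies within half_width of target."""
    return abs(value - target) <= half_width

scaled_center = scaled_edge = 0   # x / (x + y), the divide-by-sum method
direct_center = direct_edge = 0   # x and 1 - x, the direct method

for _ in range(TRIALS):
    x, y = random.random(), random.random()
    scaled = x / (x + y)          # first coordinate after dividing by the sum
    direct = random.random()      # first coordinate of the pair (x, 1 - x)
    scaled_center += near(scaled, 0.5)
    scaled_edge += near(scaled, 0.1)
    direct_center += near(direct, 0.5)
    direct_edge += near(direct, 0.1)

print("divide by sum:", scaled_center, "near 0.5 vs", scaled_edge, "near 0.1")
print("x, 1 - x:     ", direct_center, "near 0.5 vs", direct_edge, "near 0.1")
```

With the divide-by-sum method the count near 0.5 should come out clearly larger than the count near 0.1, while the direct method gives roughly equal counts.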

With 3 variables, you are picking a random point in a cube and projecting it radially, through the origin, onto a triangle in the plane x+y+z=1. Points near the center of the triangle will be more likely than points near the vertices. If you need an unbiased choice of points in that triangle, scaling is no good.

The problem gets complicated in n dimensions, but you can get a low-precision (but high accuracy, for all you laboratory science fans!) estimate by picking uniformly from the set of all n-tuples of non-negative integers adding up to N, and then dividing each of them by N.

I recently came up with an algorithm to do that for modest-sized n, N. It should work for n=100 and N = 1,000,000 to give you 6-digit randoms. See my answer at:

Create constrained random numbers?
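
For readers who want something runnable here, the following sketch uses the textbook "stars and bars" construction to pick uniformly from the n-tuples of non-negative integers adding up to N. It is not necessarily the algorithm from the linked answer, and the function names random_composition and random_fractions are made up for illustration.

```python
import random

def random_composition(n, total):
    """Uniformly random n-tuple of non-negative integers that sums to total.

    Stars and bars: lay out total stars and n - 1 bars in total + n - 1
    slots; every choice of bar positions corresponds to exactly one tuple.
    """
    bars = sorted(random.sample(range(total + n - 1), n - 1))
    parts = []
    prev = -1
    for b in bars:
        parts.append(b - prev - 1)          # stars between consecutive bars
        prev = b
    parts.append(total + n - 2 - prev)      # stars after the last bar
    return parts

def random_fractions(n, total=1_000_000):
    """Divide each integer by total to get n fractions that sum to 1."""
    return [k / total for k in random_composition(n, total)]

fractions = random_fractions(100)
print(len(fractions), sum(fractions))
```

With total = 1,000,000 each fraction is a multiple of 1e-6, which matches the 6-digit precision mentioned above.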

Answered By: Mike Housky

Create a list consisting of 0 and 1, then add 99 random numbers. Sort the list. Successive differences will be the lengths of intervals that add up to 1.

I’m not fluent in Python, so forgive me if there’s a more Pythonic way of doing this. I hope the intent is clear though:

import random

values = [0.0, 1.0]
for i in range(99):
    values.append(random.random())
values.sort()
results = []
for i in range(1,101):
    results.append(values[i] - values[i-1])
print results

Here’s an updated implementation in Python 3:

import random

def sum_to_one(n):
    values = [0.0, 1.0] + [random.random() for _ in range(n - 1)]
    values.sort()
    return [values[i+1] - values[i] for i in range(n)]

print(sum_to_one(100))
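
The same idea can be written without an explicit loop. This is a sketch assuming NumPy is available; the function name sum_to_one_np and the optional rng parameter are illustrative additions, not part of the answer above.

```python
import numpy as np

def sum_to_one_np(n, rng=None):
    """Split [0, 1] at n - 1 uniform cut points and return the n gap lengths."""
    rng = np.random.default_rng() if rng is None else rng
    cuts = np.sort(rng.random(n - 1))              # sorted interior cut points
    edges = np.concatenate(([0.0], cuts, [1.0]))   # add both end points
    return np.diff(edges)                          # successive differences

values = sum_to_one_np(100)
print(values.sum())
```
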
Answered By: pjs

The simplest solution is indeed to take N random values and divide by the sum.

A more generic solution is to use the Dirichlet distribution
which is available in numpy.

By changing the parameters of the distribution you can change the "randomness" of individual numbers

>>> import numpy as np
>>> print(np.random.dirichlet(np.ones(10),size=1))
[[ 0.01779975  0.14165316  0.01029262  0.168136    0.03061161  0.09046587
   0.19987289  0.13398581  0.03119906  0.17598322]]

>>> print(np.random.dirichlet(np.ones(10)/1000.,size=1))
[[  2.63435230e-115   4.31961290e-209   1.41369771e-212   1.42417285e-188
    0.00000000e+000   5.79841280e-143   0.00000000e+000   9.85329725e-005
    9.99901467e-001   8.37460207e-246]]

>>> print(np.random.dirichlet(np.ones(10)*1000.,size=1))
[[ 0.09967689  0.10151585  0.10077575  0.09875282  0.09935606  0.10093678
   0.09517132  0.09891358  0.10206595  0.10283501]]

Depending on the main (concentration) parameter, the Dirichlet distribution will give vectors where all the values are close to 1./N (where N is the length of the vector), vectors where most of the values are ~0 and a single value is close to 1, or something in between those possibilities.

EDIT (5 years after the original answer): Another useful fact about the Dirichlet distribution is that you naturally get it, if you generate a Gamma-distributed set of random variables and then divide them by their sum.
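
The Gamma connection can be sketched in a few lines. This assumes NumPy's Generator API; the function name dirichlet_via_gamma is made up for illustration, and in practice rng.dirichlet does the same job directly.

```python
import numpy as np

def dirichlet_via_gamma(alpha, rng):
    """Draw one Dirichlet(alpha) vector by normalizing independent Gamma draws."""
    alpha = np.asarray(alpha, dtype=float)
    g = rng.gamma(shape=alpha, scale=1.0)  # one Gamma(alpha_i, 1) draw per entry
    return g / g.sum()                     # dividing by the sum puts it on the simplex

rng = np.random.default_rng(0)  # fixed seed only so the sketch is repeatable
sample = dirichlet_via_gamma(np.ones(10), rng)
print(sample, sample.sum())
```

For very small alpha values the Gamma draws can underflow to zero, so the built-in dirichlet sampler is the safer choice there.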

Answered By: sega_sai

In the spirit of “divide each element in list by sum of list”, this function will create a list of random numbers of length = PARTS, sum = TOTAL, with each element rounded to PLACES decimal places (or left unrounded if PLACES is None):

import random
import time

PARTS       = 5
TOTAL       = 10
PLACES      = 3

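def random_sum_split(parts, total, places):
    # log() here and info in tick() are assumed to be provided by the host platform
    a = []
    for n in range(parts):
        a.append(random.random())
    b = sum(a)
    c = [x/b for x in a]    # normalize so the values sum to 1
    d = sum(c)
    e = [x*total for x in c]    # scale so the values sum to total, even when places is None
    if places is not None:
        e = [round(x, places) for x in e]
    f = e[-(parts-1):]
    g = total - sum(f)    # the first element absorbs any rounding error
    if places is not None:
        g = round(g, places)
    f.insert(0, g)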

    log(a)
    log(b)
    log(c)
    log(d)
    log(e)
    log(f)
    log(g)

    return f   

def tick():

    if info.tick == 1:

        start = time.time()

        alpha = random_sum_split(PARTS, TOTAL, PLACES)

        log('********************')
        log('***** RESULTS ******')
        log('alpha: %s' % alpha)
        log('total: %.7f' % sum(alpha))
        log('parts: %s' % PARTS)
        log('places: %s' % PLACES)

        end = time.time()  

        log('elapsed: %.7f' % (end-start))

result:

Waiting...
Saved successfully.
[2014-06-13 00:01:00] [0.33561018369775897, 0.4904215932650632, 0.20264927800402832, 0.118862130636748, 0.03107818050878819]
[2014-06-13 00:01:00] 1.17862136611
[2014-06-13 00:01:00] [0.28474809073311597, 0.41609766067850096, 0.17193755673414868, 0.10084844382959707, 0.02636824802463724]
[2014-06-13 00:01:00] 1.0
[2014-06-13 00:01:00] [2.847, 4.161, 1.719, 1.008, 0.264]
[2014-06-13 00:01:00] [2.848, 4.161, 1.719, 1.008, 0.264]
[2014-06-13 00:01:00] 2.848
[2014-06-13 00:01:00] ********************
[2014-06-13 00:01:00] ***** RESULTS ******
[2014-06-13 00:01:00] alpha: [2.848, 4.161, 1.719, 1.008, 0.264]
[2014-06-13 00:01:00] total: 10.0000000
[2014-06-13 00:01:00] parts: 5
[2014-06-13 00:01:00] places: 3
[2014-06-13 00:01:00] elapsed: 0.0054131
Answered By: litepresence

In the spirit of pjs’s method:

a = [0, total] + [random.random()*total for i in range(parts-1)]
a.sort()
b = [(a[i] - a[i-1]) for i in range(1, (parts+1))]

If you want them rounded to decimal places:

if places is None:
    return b
else:    
    b.pop()
    c = [round(x, places) for x in b]  
    c.append(round(total-sum(c), places))
    return c
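
The two fragments above only run inside a function. A self-contained version might look like the sketch below; the function name random_sum_split_sorted and the default places=None are assumptions made for illustration.

```python
import random

def random_sum_split_sorted(parts, total, places=None):
    """Split total into `parts` random pieces using sorted cut points."""
    a = [0, total] + [random.random() * total for _ in range(parts - 1)]
    a.sort()
    # Successive differences between sorted cut points sum to total
    b = [a[i] - a[i - 1] for i in range(1, parts + 1)]
    if places is None:
        return b
    # Round all but the last piece, then let the last piece absorb rounding error
    b.pop()
    c = [round(x, places) for x in b]
    c.append(round(total - sum(c), places))
    return c

pieces = random_sum_split_sorted(5, 10, 3)
print(pieces, sum(pieces))
```

Because the last piece absorbs the rounding error, it can differ from its unrounded value by a few units in the last decimal place.
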
Answered By: litepresence

In addition to @pjs’s solution we can define a function with two parameters as well.

import numpy as np

def sum_to_x(n, x):
    values = [0.0, x] + list(np.random.uniform(low=0.0,high=x,size=n-1))
    values.sort()
    return [values[i+1] - values[i] for i in range(n)]

sum_to_x(10, 0.6)
Out: 
[0.079058655684546,
 0.04168649034779022,
 0.09897491411670578,
 0.065152293196646,
 0.000544800901222664,
 0.12329662037166766,
 0.09562168167787738,
 0.01641359261155284,
 0.058273232428072474,
 0.020977718663918954]  
Answered By: Caner Erden

In case you want to have a minimum threshold for the randomly chosen numbers (i.e., each generated number should be at least min_thresh):

rand_prop = 1 - num_of_values * min_thresh
random_numbers = (np.random.dirichlet(np.ones(num_of_values),size=1)[0] * rand_prop) + min_thresh

Just make sure that num_of_values (the number of values to be generated) is small enough for the required numbers to exist (num_of_values <= 1/min_thresh).

So basically, we reserve some portion of 1 for the minimum threshold, then we create random numbers in the remaining portion. We add min_thresh to all numbers to get a sum of 1.
For example, let's say you want to generate 3 numbers with min_thresh=0.2. We set aside a portion to fill with random numbers [1 - (0.2 x 3) = 0.4]. We fill that portion and add 0.2 to all values, which accounts for the remaining 0.6.

This is the standard scaling and shifting used in random number generation theory. Credit goes to my friend Jeel Vaishnav (I am not sure if he has an SO profile) and @sega_sai.
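
Wrapped up as a function with the feasibility check made explicit, the idea might look like the following sketch. The function name random_with_min and the optional rng parameter are illustrative, and the sketch assumes NumPy's Generator API.

```python
import numpy as np

def random_with_min(num_of_values, min_thresh, rng=None):
    """Random values that sum to 1, each at least min_thresh."""
    if num_of_values * min_thresh > 1:
        raise ValueError("num_of_values * min_thresh must not exceed 1")
    rng = np.random.default_rng() if rng is None else rng
    # Portion of the total left over after reserving min_thresh per value
    rand_prop = 1 - num_of_values * min_thresh
    # Spread the leftover portion randomly, then shift every value up
    return rng.dirichlet(np.ones(num_of_values)) * rand_prop + min_thresh

values = random_with_min(3, 0.2)
print(values, values.sum())
```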

Answered By: Parthesh Soni

An alternative solution is to use random.choice and divide by the sum:

import random 
n = 5
rand_num = [random.choice(range(1,100)) for r in range(n)] # create random integers (starting at 1 so the sum is never 0)
total = sum(rand_num) # compute the sum once
rand_num = [i/total for i in rand_num] # normalize them
Answered By: Sam S

Inspired by @sega_sai's answer, with an up-to-date and recommended numpy implementation [March 2022]

import numpy as np
from numpy.random import default_rng

rng = default_rng()
rng.dirichlet(np.ones(10),size=1)
>>> array([[0.01279836, 0.16891858, 0.01136867, 0.17577222, 0.27944229,
        0.06244618, 0.19878224, 0.02481954, 0.01478089, 0.05087103]])

Answered By: Antiez