Split whole number into float numbers

Question:

Goal: split 100 into 5 random numbers, each with 2 decimal places.

So far, I can simulate any number of divisions.

However, these are only integers and are "balanced", in that they are the same or close in value to each other, so the output is always the same.

Code:

def split(x, n):
    if x < n:
        print(-1)
    elif x % n == 0:
        for i in range(n):
            print(x // n, end=" ")
    else:
        zp = n - (x % n)
        pp = x // n
        for i in range(n):
            if i >= zp:
                print(pp + 1, end=" ")
            else:
                print(pp, end=" ")

split(100, 5)
>>> 20 20 20 20 20

Desired Output:

  • List of numbers,
  • Floating point numbers (2 dp),
  • Non-balanced.

Example Desired Output:

[10.50, 22.98, 13.23, 40.33, 12.96]

Asked By: DanielBell99

||

Answers:

You can use Python’s built-in random module to generate a random number, subtract it from your total, and repeat with what is left to generate more random numbers.

As @Kenny Ostrom suggested, you can multiply by 100 up front and divide by 100 at the end to get the desired number of decimal places. I made it more generic by adding a keyword argument, decimal, in case you need a different precision.

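import random

def split(x, n, decimal=2):
    # Reject requests that cannot be satisfied.
    if x < n: return -1
    if decimal < 0: return -1

    # Work in integer units of 10**-decimal (hundredths by default).
    x *= 10**decimal

    numbers = []
    left = x
    for _ in range(n):
        # Draw from what is left; randint raises ValueError once left reaches 0.
        number = random.randint(1, left)
        numbers.append(number)
        left -= number

    # Scale back down; any undrawn remainder in `left` is not included.
    return [value / 10**decimal for value in numbers]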

Some example outputs

split(100,5) # [1.8, 67.64, 0.51, 21.88, 3.67]
split(100,5) # [29.29, 16.39, 51.54, 1.91, 0.71]
split(100,5) # [95.24, 0.79, 0.82, 2.56, 0.02]
split(100,5) # [10.18, 45.09, 30.87, 0.3, 0.68]

Answered By: Gabriel d'Agosto

If you generate values uniformly, using the full range for the first one, the remaining range for the second, what is left after that for the third, and so on, each draw has a 50% chance of exceeding half of the range still available. This leads to a systematic bias: on average, earlier values tend to be larger than later ones. There’s a simple algorithm that avoids this and gives identical distributions for all positions:

  1. Generate a list containing 0 and the maximum value, and then append n-1 values which are all uniformly distributed between 0 and the maximum;
  2. Sort the list; and
  3. Calculate the differences between adjacent pairs of values in the sorted list.

By definition the results will add up to the maximum, because they are the interval lengths between 0 and the maximum. Big gaps or small gaps are equally likely to fall between any of the pairs, guaranteeing identical distributions (to the extent that the PRNG is actually uniform).
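To make the bias of the draw-and-subtract approach concrete, here is a rough sketch that averages each position over many runs. The names sequential_draws and mean_by_position are made up for this illustration, and the draw uses randint(0, remaining) rather than randint(1, remaining), an assumption made only so the simulation can never hit an empty range.

```python
import random

def sequential_draws(total, n):
    """Draw n integers one after another, each uniform on [0, remaining]."""
    remaining = total
    draws = []
    for _ in range(n):
        # Lower bound 0 is an assumption of this sketch: it keeps the range non-empty.
        value = random.randint(0, remaining)
        draws.append(value)
        remaining -= value
    return draws

def mean_by_position(total, n, trials):
    """Average the value found at each position across many independent runs."""
    sums = [0] * n
    for _ in range(trials):
        for i, value in enumerate(sequential_draws(total, n)):
            sums[i] += value
    return [s / trials for s in sums]

if __name__ == "__main__":
    # 10000 hundredths = 100.00; the averages should shrink from left to right.
    print(mean_by_position(10000, 5, 5000))
```

Each position should average roughly half of the one before it, which is the order dependence that the sorted-cut-point method removes.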

I’ve adopted the idea of scaling up by 100 and back to get two decimals. Here it is in code:

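import random

def split(x, n, decimal=2):
    assert decimal >= 0, "Number of decimals must be non-negative"

    # Work in integer units of 10**-decimal so results have at most `decimal` places.
    scale_factor = 10**decimal
    x *= scale_factor
    assert n <= x, "Quantity exceeds range"

    # Cut points: both endpoints plus n - 1 random values (duplicates are possible).
    numbers = [0, x]
    for _ in range(n - 1):
        numbers.append(random.randint(1, x))
    numbers.sort()

    # Adjacent differences are the interval lengths, which add up to x.
    return [(numbers[i] - numbers[i-1]) / scale_factor for i in range(1, n+1)]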

which produces outcomes such as

[10.82, 12.97, 17.92, 39.46, 18.83]
[25.99, 21.35, 29.12, 8.13, 15.41]
[5.51, 4.28, 69.59, 9.62, 11.0]
[21.39, 20.96, 11.25, 15.07, 31.33]
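
One way to check that a result really adds up to the total is to compare in integer units rather than summing floats, which can be off by a rounding error. The helper below is only a sketch; sums_exactly is a hypothetical name, and it assumes every value has at most `decimal` places.

```python
def sums_exactly(parts, total, decimal=2):
    """Return True if parts add up to total when compared in units of 10**-decimal."""
    scale = 10 ** decimal
    # round() recovers the intended integer despite binary floating-point noise.
    return sum(round(p * scale) for p in parts) == round(total * scale)

if __name__ == "__main__":
    # First sample output listed above.
    print(sums_exactly([10.82, 12.97, 17.92, 39.46, 18.83], 100))
    # First sample output from the draw-and-subtract answer, which adds up to 95.5.
    print(sums_exactly([1.8, 67.64, 0.51, 21.88, 3.67], 100))
```

The first call returns True and the second returns False.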

If you want to ensure that there are no zeroes in the result, you need to ensure that there are no duplicates in the randomly generated set of values. This can be accomplished by sampling the range (0,x), exclusive of the endpoints, without replacement. The following implementation does this quite simply, and perhaps more readably than the loop-based implementation above:

import random

def split(x, n, decimal=2):
    assert decimal >= 0, "Number of decimals must be non-negative"

    scale_factor = 10**decimal
    x *= scale_factor
    assert n <= x, "Quantity exceeds range"

    # Sample n - 1 distinct interior cut points so that no interval has length 0.
    numbers = [0, x] + random.sample(range(1, x), n - 1)
    numbers.sort()
    return [(numbers[i] - numbers[i-1]) / scale_factor for i in range(1, n+1)]

This version has no problem dealing with values of n which are large relative to x:

split(100, 20)    # [6.98,4.53,23.66,2.84,2.53,2.81,12.86,0.39,3.05,11.19,3.21,2.56,1.4,1.67,3.13,1.76,7.21,0.23,1.52,6.47]
split(1, 5, 1)    # [0.1,0.2,0.1,0.3,0.3]
split(1, 10, 1)   # [0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1]

FOLLOWUP

I tried generating 10,000 replications of split(100,5) for my solution and the currently accepted answer, so that I could show side-by-side histograms of the distributions of outcomes for the first through fifth values generated. The results for my solution are:

[Figure: side-by-side histograms showing identical distributions of generated results, independent of order]

As you can see, the distribution of outcomes is independent of the order of generation.
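
For readers without the histograms, a rough numeric stand-in is to compare the average of each position across many replications. This is a sketch, not the code used for the plots: split_units and mean_by_position are hypothetical names, and it works directly in integer hundredths to keep the arithmetic exact.

```python
import random

def split_units(total_units, n):
    """Split an integer total into n positive integer parts via sorted cut points."""
    cuts = [0, total_units] + random.sample(range(1, total_units), n - 1)
    cuts.sort()
    return [cuts[i] - cuts[i - 1] for i in range(1, n + 1)]

def mean_by_position(total_units, n, trials):
    """Average the part found at each position across many independent splits."""
    sums = [0] * n
    for _ in range(trials):
        for i, part in enumerate(split_units(total_units, n)):
            sums[i] += part
    return [s / trials for s in sums]

if __name__ == "__main__":
    # 10000 hundredths = 100.00, split 5 ways; each mean should be near 2000.
    print(mean_by_position(10000, 5, 5000))
```

Matching means do not prove that the full distributions are identical, but a position-dependent drift would show up here straight away.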

I was unable to generate 10k replications using the code provided in the other answer; it fails within a few dozen iterations with the error message ValueError: empty range for randrange() (1, 1, 0). That approach can use up the entire range before all the values have been drawn, so the algorithm runs out of room to finish generating them.
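
The failure can be reproduced with a small sketch that repeats the same drawing scheme on integers and counts how often it raises. sequential_split and count_failures are hypothetical names for this illustration; the exact count depends on the seed and the number of trials.

```python
import random

def sequential_split(total, n):
    """Draw-and-subtract scheme on integers, mirroring randint(1, left)."""
    left = total
    parts = []
    for _ in range(n):
        number = random.randint(1, left)  # raises ValueError once left == 0
        parts.append(number)
        left -= number
    return parts

def count_failures(total, n, trials):
    """Count how many of `trials` independent runs end in ValueError."""
    failures = 0
    for _ in range(trials):
        try:
            sequential_split(total, n)
        except ValueError:
            failures += 1
    return failures

if __name__ == "__main__":
    # 10000 hundredths = 100.00, as in split(100, 5).
    print(count_failures(10000, 5, 5000))
```

A run fails whenever one of the first four draws happens to take everything that is left, since the next call then asks randint for a value between 1 and 0.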

Answered By: pjs