Distribution of elements according to percentage frequency

Question

Is there a function in pandas, NumPy, or plain Python that generates a frequency distribution from percentage values, similar to what EnumeratedDistribution offers in Java?

Input:

values = [0, 1, 2]

percentage = [0.5, 0.30, 0.20]

total = 10

Output:

[0, 0, 0, 0, 0, 1, 1, 1, 2, 2]

Out of 10 elements in total, 50% are 0, 30% are 1, and 20% are 2.

Asked By: Amitabh Kumar

Answer 1

You can use NumPy's repeat() function to repeat each element of values a specified number of times (percentage * total):

import numpy as np

values = [0, 1, 2]
percentage = [0.5, 0.30, 0.20]
total = 11

# Round each share to a whole count; plain int avoids the overflow np.int8 would hit above 127
repeats = np.around(np.array(percentage) * total).astype(int)  # [6, 3, 2]

np.repeat(values, repeats)

Output:

array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2])

I used np.around() to round the repeat counts in case they are not whole numbers (e.g. if total is 11, then 11*0.5 -> 6, 11*0.3 -> 3 and 11*0.2 -> 2).

Answered By: Andreas K.

Answer 2

Without NumPy, using only a list comprehension:

values = [0, 1, 2]
percentage = [0.5, 0.30, 0.20]
total = 10

output = sum([[e]*int(total*p) for e,p in zip(values, percentage)], [])
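Note that int() truncates rather than rounds, so a product that lands just below a whole number because of floating-point error loses one element. A minimal sketch of a variant that rounds instead and flattens with itertools.chain follows; the function name expand is hypothetical and not part of the original answer:

```python
from itertools import chain


def expand(values, percentage, total):
    """Repeat each value round(total * p) times and flatten the result."""
    # round() instead of int(): e.g. 100 * 0.29 evaluates to 28.999999999999996,
    # which int() would truncate to 28.
    counts = [round(total * p) for p in percentage]
    # chain.from_iterable flattens the per-value lists in a single pass.
    return list(chain.from_iterable([v] * c for v, c in zip(values, counts)))


print(expand([0, 1, 2], [0.5, 0.30, 0.20], 10))
# [0, 0, 0, 0, 0, 1, 1, 1, 2, 2]
```

Rounding each count independently still does not guarantee that the result has exactly total elements.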

Answered By: FBruzzesi

Answer 3

@Andreas K's solution is great, but the size of the result does not always equal the original total. E.g. [27.3, 36.4, 27.3] sums to 91, but after rounding it becomes [27, 36, 27], which sums to 90.
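The mismatch can be reproduced in a few lines. This is only an illustration of the arithmetic above; the variable names shares and rounded are made up for the example:

```python
import numpy as np

# Target counts before rounding; they add up to 91.
shares = np.array([27.3, 36.4, 27.3])

# Element-wise rounding, as in the np.around() approach.
rounded = np.around(shares).astype(int)

print(rounded)        # [27 36 27]
print(rounded.sum())  # 90, one short of the intended 91
```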

I prefer this way of rounding, slightly adapted from this post: https://stackoverflow.com/a/74044227/3789481

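import numpy as np

def round_retain_sum(x: np.ndarray) -> np.ndarray:
    N = np.round(np.sum(x)).astype(int)  # target total after rounding
    y = x.astype(int)                    # truncate each entry
    M = np.sum(y)
    K = N - M                            # number of elements still missing
    z = y - x                            # negative fractional parts
    if K != 0:
        # add 1 to the K entries with the largest fractional parts
        idx = np.argpartition(z, K)[:K]
        y[idx] += 1
    return y

values = [0, 1, 2]
percentage = [0.5, 0.30, 0.20]
total = 11
repeats = round_retain_sum(np.array(percentage) * total)
np.repeat(values, repeats)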
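This approach is essentially the largest-remainder method: truncate every count, then hand the missing elements to the entries with the largest fractional parts. A self-contained sketch of the same idea, computing the shortfall directly from total, might look like the following. The function name largest_remainder_counts is hypothetical, and the sketch assumes the percentages are non-negative and sum to 1:

```python
import numpy as np


def largest_remainder_counts(percentage, total):
    """Integer counts that follow `percentage` and sum exactly to `total`.

    Assumes non-negative percentages that sum to 1.
    """
    exact = np.asarray(percentage, dtype=float) * total
    counts = np.floor(exact).astype(int)
    shortfall = total - counts.sum()
    # Sort so that the entries with the largest fractional parts come first,
    # then give one extra element to each of the first `shortfall` entries.
    order = np.argsort(counts - exact, kind="stable")
    counts[order[:shortfall]] += 1
    return counts


values = [0, 1, 2]
counts = largest_remainder_counts([0.5, 0.30, 0.20], 11)
result = np.repeat(values, counts)
print(counts)       # [6 3 2]
print(len(result))  # 11
```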

Answered By: Tấn Nguyên
