Subdivide integers using a given ratio without producing floats
Question:
I need to subdivide a given number of items (let’s say 10) using a given ratio [0.55, 0.45]. The result here should be either 6:4 or 5:5. The usual approach of rounding [0.55*10, 0.45*10] would result in [6, 5] (11, not 10).
Another example: divide 7 using the ratio [0.36, 0.44, 0.07, 0.07, 0.03, 0.03], which ideally should yield something like [3, 3, 1, 0, 0, 0] or [3, 3, 0, 1, 0, 0].
What would be a good approach to this problem?
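To make the problem concrete, here is a small sketch (not part of the original question) that rounds each share of the second example independently. The variable names are made up for illustration.

```python
total = 7
ratios = [0.36, 0.44, 0.07, 0.07, 0.03, 0.03]

# Round each share on its own, without looking at the other shares.
naive = [round(r * total) for r in ratios]

print(naive)       # [3, 3, 0, 0, 0, 0]
print(sum(naive))  # 6, one item short of 7
```

Because every share is rounded in isolation, nothing forces the parts to add up to the total.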
Answers:
Here’s my try on the matter 🙂 The hardest part is reversing the sort operation and matching it with the results. If you don’t need to keep the original order of the ratios, you can delete part of the last function.
def scale_ratio(ratios: list) -> list:
    sum_ = sum(ratios)
    return [x/sum_ for x in ratios]

def ratio_breakdown_recursive(x: int, ratios: list) -> list:
    top_ratio = ratios[0]
    part = round(x*top_ratio)
    if x <= part:
        return [x]
    x -= part
    return [part] + ratio_breakdown_recursive(x, scale_ratio(ratios[1:]))

def ratio_breakdown(x: int, ratios: list) -> list:
    sorted_ratio = sorted(ratios, reverse=True)
    assert(round(sum(ratios)) == 1)
    sorted_result = ratio_breakdown_recursive(x, sorted_ratio)
    assert(sum(sorted_result) == x)
    # Now, we have to reverse the sorting and add missing zeros
    sorted_result += [0]*(len(ratios)-len(sorted_result))
    numbered_ratios = [(r, i) for i, r in enumerate(ratios)]
    sorted_numbered_ratios = sorted(numbered_ratios, reverse=True)
    combined = zip(sorted_numbered_ratios, sorted_result)
    combined_unsorted = sorted(combined, key=lambda x: x[0][1])
    unsorted_results = [x[1] for x in combined_unsorted]
    return unsorted_results
Results:
ratio_breakdown(7, [0.36, 0.44, 0.07, 0.07, 0.03, 0.03])
[3, 3, 1, 0, 0, 0]
ratio_breakdown(10, [0.55, 0.45])
[6, 4]
ratio_breakdown(16, [0.16, 0.47, 0.13, 0.24])
[2, 8, 2, 4]
EDIT: That’s Python3.
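The "reverse the sorting" step can also be done by remembering where each ratio came from. The sketch below is an alternative, not the answer's own code; `descending_order` and `restore_order` are hypothetical helper names, and `sorted_result` is hard-coded to what `ratio_breakdown_recursive` returns for this input.

```python
def descending_order(ratios):
    """Indices of ratios from largest to smallest (ties keep input order)."""
    return sorted(range(len(ratios)), key=lambda i: ratios[i], reverse=True)

def restore_order(sorted_values, order):
    """Put sorted_values[k] back at its original position order[k]."""
    restored = [0] * len(order)  # positions never reached stay 0
    for value, original_index in zip(sorted_values, order):
        restored[original_index] = value
    return restored

ratios = [0.16, 0.47, 0.13, 0.24]
order = descending_order(ratios)            # [1, 3, 0, 2]
sorted_result = [8, 4, 2, 2]                # parts for 16 items, largest ratio first
print(restore_order(sorted_result, order))  # [2, 8, 2, 4]
```

Since the result list starts out as zeros, a shorter `sorted_result` needs no padding. Note that this sketch keeps equal ratios in their input order, whereas sorting `(ratio, index)` tuples in reverse puts the larger index first, so the two approaches can differ in which of two equal ratios receives an item.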
I would suggest using a second array. I’m a beginner in Python.
Here is the code with your example (I’m sure you can easily adapt it):
a = [0.36, 0.44, 0.07, 0.07, 0.03, 0.03]
numberItem = 7
remainder = numberItem
b = [0,0,0,0,0,0]
for i in range(0,6):
    b[i] = round(a[i]*numberItem)
    if (b[i] > remainder) or (b[i] == 0):
        b[i] = remainder
        remainder = 0
    else:
        remainder = remainder - b[i]
    print(b[i])
With these bounds, you cannot end up with more items than stated.
It’s better if the ratio array is sorted from largest to smallest.
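As a rough check of the bound described above, the same loop can be wrapped in a function. This is only a sketch: `capped_split` is a hypothetical name, and the second input is an extra example that does not appear in the thread.

```python
def capped_split(total, ratios):
    """Same loop as above: round each share, but never hand out more than is left."""
    remainder = total
    parts = [0] * len(ratios)
    for i, ratio in enumerate(ratios):
        parts[i] = round(ratio * total)
        if parts[i] > remainder or parts[i] == 0:
            # Share is too big or rounds to zero: give it whatever is left.
            parts[i] = remainder
            remainder = 0
        else:
            remainder -= parts[i]
    return parts

print(capped_split(7, [0.36, 0.44, 0.07, 0.07, 0.03, 0.03]))  # [3, 3, 1, 0, 0, 0]
print(capped_split(10, [0.34, 0.33, 0.33]))                   # [3, 3, 3]
```

The total is never exceeded, but as the second call suggests, the parts can add up to less than the total (9 of 10 here), so a final step that hands out the leftover may still be needed.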
Here is a non-recursive NumPy implementation of @maciek’s algorithm:
import numpy as np

def split_integer_into_parts(x: int, ratios: list) -> np.ndarray:
    ratios = np.array(ratios, dtype=float)
    assert x >= 0
    assert (ratios >= 0).all()
    assert ratios.sum() > 0
    # sort ratios
    sort_idx = np.argsort(-ratios)
    ratios = ratios[sort_idx]
    # compute fractions of the remainders
    ratios_cumsum = np.cumsum(ratios[::-1])[::-1]
    fracs = np.divide(ratios, ratios_cumsum, out=np.ones_like(ratios), where=(ratios_cumsum != 0))
    # split integer into parts
    remainder = x
    parts = np.zeros_like(fracs, dtype=int)
    for i, frac in enumerate(fracs):
        parts[i] = round(remainder * frac)
        remainder -= parts[i]
    assert parts.sum() == x
    # unsort parts
    parts = parts[np.argsort(sort_idx)]
    return parts
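The "fractions of the remainders" step is what replaces the recursion: dividing each ratio by the sum of itself and everything after it should give the same number as rescaling the remaining ratios at every step. A small sketch to check this on one input (the names `tail` and `rescaled` are illustrative):

```python
import numpy as np

ratios = np.array([0.47, 0.24, 0.16, 0.13])  # already sorted, largest first

# Suffix sums: tail[i] = ratios[i] + ratios[i+1] + ... + ratios[-1]
tail = np.cumsum(ratios[::-1])[::-1]
fracs = ratios / tail

# The same fractions, obtained by rescaling the remaining ratios step by step
rescaled = []
rest = ratios.copy()
while rest.size:
    rest = rest / rest.sum()   # what scale_ratio does in the recursive version
    rescaled.append(rest[0])   # share of the current remainder
    rest = rest[1:]

print(np.round(fracs, 4))
print(np.allclose(fracs, rescaled))  # True
```

The last fraction is always 1, so the final part absorbs whatever remainder is left.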
Accepted Answer written in R
I was looking for this question/solution, but I’m working in R, so I have re-written @Maciek’s answer above in R for anyone that passes this way. I kept it as close as I could to the original answer in Python.
library(dplyr)

scale_ratio <- function(ratios){
  sum_ <- sum(ratios)
  return(ratios/sum_)
}

ratio_breakdown_recursive <- function(x, ratios){
  top_ratio <- ratios[1]
  part <- round(x*top_ratio)
  if (x <= part){
    return(x)
  }
  x <- (x - part)
  c(part, ratio_breakdown_recursive(x, scale_ratio(ratios[2:length(ratios)])))
}

ratio_breakdown <- function(x, ratios){
  x <- x[1]
  sorted_ratio = sort(ratios, decreasing = TRUE)
  stopifnot(round(sum(ratios)) == 1)
  sorted_result = ratio_breakdown_recursive(x, sorted_ratio)
  stopifnot(sum(sorted_result) == x)
  # Now, we have to reverse the sorting and add missing zeros
  sorted_result <- append(sorted_result, rep(0, length(ratios) - length(sorted_result)))
  numbered_ratios <- data.frame(ratios, seq_along(ratios))
  sorted_numbered_ratios <- arrange(numbered_ratios, desc(ratios))
  combined <- cbind(sorted_numbered_ratios, sorted_result)
  combined_unsorted <- arrange(combined, seq_along.ratios.)
  unsorted_results <- combined_unsorted[,3]
  return(unsorted_results)
}
> ratio_breakdown(7, c(0.36, 0.44,0.07,0.07,0.03,0.03))
[1] 3 3 0 1 0 0
> ratio_breakdown(10, c(0.55,0.45))
[1] 6 4
> ratio_breakdown(16, c(0.16,0.47,0.13,0.24))
[1] 2 8 2 4
I realize the first example returns its parts in a different order than in @Maciek’s answer ([3, 3, 0, 1, 0, 0] rather than [3, 3, 1, 0, 0, 0]), but for my purposes this worked. If someone comes along and can improve my R code, be my guest; I’m happy to learn.
I would like to suggest another solution. The idea is to first calculate an approximate integer subdivision by taking the floor of the float subdivision. The difference between the desired sum and the sum of the approximation tells us how many values we need to increment to get the correct result. We sort the values by the difference between the floats and the ints: the higher the difference, the closer the value is to its ceiling. Finally, we increment as many of the values closest to their ceiling as we need to get the correct sum.
Note: I implemented this using NumPy, but converting it to standard Python should be straightforward.
import numpy as np

def split_integer_into_parts(val: int, ratio: np.ndarray) -> np.ndarray:
    ratio = np.asarray(ratio)
    assert val >= 0
    assert np.all(ratio >= 0)
    assert np.round(np.sum(ratio), 5) == 1
    # get float and approx int subdivision
    result_f = ratio * val
    result_i = np.floor(result_f)
    # get difference between float and int
    diff_floor = result_f - result_i
    result_i = result_i.astype(int)  # convert to int
    # difference from approx (#values to increment)
    diff_val = val - np.sum(result_i)
    if diff_val == 0:
        return result_i
    # reverse sort to get highest differences (closest to ceil)
    idx = np.argsort(diff_floor)[::-1]
    # increment as many of the closest as diff from approx
    result_i[idx[:diff_val]] += 1
    assert np.sum(result_i) == val
    return result_i
Results:
split_integer_into_parts(6, [0.4, 0.4, 0.2])
[2 3 1]
split_integer_into_parts(10, [0.45, 0.55])
[4 6]
split_integer_into_parts(16, [0.47, 0.24, 0.16, 0.13])
[7 4 3 2]
split_integer_into_parts(7, [0.36, 0.44, 0.07, 0.07, 0.03, 0.03])
[3 3 0 1 0 0]
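For readers without NumPy, the same floor-then-increment idea could look roughly like this in standard Python. It is a sketch under the same assumptions (non-negative ratios that sum to 1), and `split_integer_largest_remainder` is a made-up name.

```python
import math

def split_integer_largest_remainder(total, ratios):
    """Floor every share, then give the leftover items to the largest fractional parts."""
    assert total >= 0
    assert all(r >= 0 for r in ratios)
    assert math.isclose(sum(ratios), 1.0, abs_tol=1e-5)

    exact = [r * total for r in ratios]
    parts = [math.floor(v) for v in exact]
    shortfall = total - sum(parts)  # how many parts still need +1

    # Indices ordered by fractional part, largest first (ties keep input order).
    order = sorted(range(len(ratios)),
                   key=lambda i: exact[i] - parts[i],
                   reverse=True)
    for i in order[:shortfall]:
        parts[i] += 1
    return parts

print(split_integer_largest_remainder(16, [0.47, 0.24, 0.16, 0.13]))  # [7, 4, 3, 2]
print(split_integer_largest_remainder(7, [0.36, 0.44, 0.07, 0.07, 0.03, 0.03]))  # [3, 3, 1, 0, 0, 0]
```

Ties between equal fractional parts go to the earlier element here, so a tied input can come out in a different order than with the NumPy version (for example [3, 3, 1, 0, 0, 0] instead of [3 3 0 1 0 0]); the sum is the same either way.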