Splitting a list into chunks of balanced weight
Question:
I need an algorithm to split a list of values into chunks such that the sum of the values in every chunk is (approximately) equal (it's some variation of the knapsack problem, I suppose).
So, for example [1, 2, 1, 4, 10, 3, 8] => [[8, 2], [10], [1, 3, 1, 4]]
Chunks of equal lengths are preferred, but it’s not a constraint.
Python is the preferred language, but others are welcome as well.
Edit: the number of chunks is defined.
Answers:
Greedy:
1. Order the available items in descending order.
2. Create N empty groups.
3. Start adding the items one at a time into the group that currently has the smallest sum.
I think in most real-life situations this should be enough.
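These three steps can be sketched in Python as follows (the function name greedy_partition is mine, not from the question):

```python
def greedy_partition(values, n):
    """Greedy partition: largest items first, each into the lightest group."""
    groups = [[] for _ in range(n)]
    sums = [0] * n
    for v in sorted(values, reverse=True):  # step 1: descending order
        i = sums.index(min(sums))           # step 3: group with the smallest sum
        groups[i].append(v)
        sums[i] += v
    return groups

print(greedy_partition([1, 2, 1, 4, 10, 3, 8], 3))
# chunk sums come out as 10, 10, 9 for this input
```

On the question's example this lands within 1 of a perfectly even split, which matches the claim that the greedy heuristic is usually good enough in practice.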
You may want to use Artificial Intelligence search tools for this problem.
First, define your problem:
States = {(c1, c2, ..., ck) | c1, ..., ck are subgroups of your problem, and union(c1, ..., ck) = S}
successors((c1, ..., ck)) = {switch one element from one sublist to another}
utility(c1, ..., ck) = max{sum(c1), sum(c2), ...} - min{sum(c1), sum(c2), ...}
Now you can use steepest-ascent hill climbing with random restarts.
This algorithm is anytime, meaning you can start searching and, when time's up, stop it, and you will get the best result so far. The result gets better as run time increases.
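A rough Python sketch of this local search, minimizing the utility above with the "move one element between chunks" successor and random restarts (all function names here are mine; a real anytime version would also check a wall-clock budget inside the loop):

```python
import random

def utility(chunks):
    """Spread between heaviest and lightest chunk; lower is better."""
    sums = [sum(c) for c in chunks]
    return max(sums) - min(sums)

def best_neighbor(chunks):
    """Steepest step: try moving every element into every other chunk;
    return the best improving neighbor, or None at a local optimum."""
    best, best_u = None, utility(chunks)
    for i in range(len(chunks)):
        for k in range(len(chunks[i])):
            for j in range(len(chunks)):
                if i == j:
                    continue
                cand = [list(c) for c in chunks]   # copy the state
                cand[j].append(cand[i].pop(k))     # move one element
                u = utility(cand)
                if u < best_u:
                    best, best_u = cand, u
    return best

def hill_climb(values, n, restarts=10):
    best = None
    for _ in range(restarts):
        chunks = [[] for _ in range(n)]
        for v in values:
            random.choice(chunks).append(v)        # random initial state
        nxt = best_neighbor(chunks)
        while nxt is not None:                     # climb to a local optimum
            chunks, nxt = nxt, best_neighbor(nxt)
        if best is None or utility(chunks) < utility(best):
            best = chunks
    return best
```

Each restart is much more expensive than the greedy pass, so this only pays off when chunk balance matters more than speed.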
Based on @Alin Purcaru's answer and @amit's remarks, I wrote this code (Python 3.1). As far as I tested, it has linear performance in both the number of items and the number of chunks, so overall it's O(N * M). I avoid sorting the list every time by keeping the current sum of values for every chunk in a dict (which can be less practical with a greater number of chunks):
import time, random

def split_chunks(l, n):
    """
    Splits list l into n chunks with approximately equal sums of values
    see http://stackoverflow.com/questions/6855394/splitting-list-in-chunks-of-balanced-weight
    """
    result = [[] for i in range(n)]
    sums = {i: 0 for i in range(n)}
    c = 0
    for e in l:
        for i in sums:
            if c == sums[i]:
                result[i].append(e)
                break
        sums[i] += e
        c = min(sums.values())
    return result
if __name__ == '__main__':
    MIN_VALUE = 0
    MAX_VALUE = 20000000
    ITEMS = 50000
    CHUNKS = 256
    l = [random.randint(MIN_VALUE, MAX_VALUE) for i in range(ITEMS)]
    t = time.time()
    r = split_chunks(l, CHUNKS)
    print(ITEMS, CHUNKS, time.time() - t)
Just because, you know, we can, here is the same code in PHP 5.3 (2–3 times slower than Python 3.1):
function split_chunks($l, $n){
    $result = array_fill(0, $n, array());
    $sums = array_fill(0, $n, 0);
    $c = 0;
    foreach ($l as $e){
        foreach ($sums as $i => $sum){
            if ($c == $sum){
                $result[$i][] = $e;
                break;
            }
        }
        $sums[$i] += $e;
        $c = min($sums);
    }
    return $result;
}
define('MIN_VALUE', 0);
define('MAX_VALUE', 20000000);
define('ITEMS', 50000);
define('CHUNKS', 128);

$l = array();
for ($i = 0; $i < ITEMS; $i++){
    $l[] = rand(MIN_VALUE, MAX_VALUE);
}
$t = microtime(true);
$r = split_chunks($l, CHUNKS);
$t = microtime(true) - $t;
print(ITEMS . ' ' . CHUNKS . ' ' . $t . ' ');
This will be faster and a little cleaner (based on the ideas above!):
def split_chunks2(l, n):
    result = [[] for i in range(n)]
    sums = [0] * n
    i = 0
    for e in l:
        result[i].append(e)
        sums[i] += e
        i = sums.index(min(sums))
    return result
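One thing worth noting about this variant: since it processes the input in its given order (no descending sort, to keep it O(N * M)), the balance can be rougher than the sorted greedy approach; sorting the input first restores it. A quick check on the question's example, copying split_chunks2 as defined above:

```python
def split_chunks2(l, n):
    result = [[] for i in range(n)]
    sums = [0] * n
    i = 0
    for e in l:
        result[i].append(e)
        sums[i] += e
        i = sums.index(min(sums))
    return result

print(split_chunks2([1, 2, 1, 4, 10, 3, 8], 3))
# → [[1, 4, 8], [2, 3], [1, 10]]  (sums 13, 5, 11)

# sorting descending first gives the balanced 10/10/9 split:
print(split_chunks2(sorted([1, 2, 1, 4, 10, 3, 8], reverse=True), 3))
```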
Scala version of foxtrotmikew's answer:
def workload_balancer(element_list: Seq[(Long, Any)], partitions: Int): Seq[Seq[(Long, Any)]] = {
  val result = scala.collection.mutable.Seq.fill(partitions)(null: Seq[(Long, Any)])
  val weights = scala.collection.mutable.Seq.fill(partitions)(0L)
  var i = 0
  for (e <- element_list) {
    result(i) = if (result(i) == null) Seq(e) else result(i) ++: Seq(e)
    weights(i) = weights(i) + e._1
    i = weights.indexOf(weights.min)
  }
  result.toSeq
}
element_list should be (weight: Long, object: Any); then you can order and split objects into different workloads (result). It helped me a lot, thanks!