How to get all possible combinations of dividing a list of length n into m sublists
Question:
In my specific case I would like to get all possible combinations of dividing a list of length 20 into 4 sublists. The sublists may have any length from 1 to 17. As a result I would like to have a list of lists of sublists with all possible combinations.
List to start with:
list_init = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t]
List of lists of lists with all combinations:
list_combs = [
[[a], [b], [c], [d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t]],
[[a,d], [b], [c], [e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t]],
[[a,d,e], [b], [c], [f, g, h, i, j, k, l, m, n, o, p, q, r, s, t]],
.
.
.
]
Answers:
You can do that recursively by combining a first part of length 1,2,3,… with the partitions in 3 of the rest (recursing to 2 and 1):
from itertools import combinations
def partCombo(L,N=4):
if N==1: yield [L]; return
for size in range(1,len(L)-N+2):
for combo in combinations(range(len(L)),size): # index combinations
part = list(L[i] for i in combo) # first part
remaining = list(L)
for i in reversed(combo): del remaining[i] # unused items
yield from ([part]+rest for rest in partCombo(remaining,N-1))
output:
aList = list("abcdefg")
for part in partCombo(aList):
print(part)
[['a'], ['b'], ['c'], ['d', 'e', 'f', 'g']]
[['a'], ['b'], ['d'], ['c', 'e', 'f', 'g']]
[['a'], ['b'], ['e'], ['c', 'd', 'f', 'g']]
[['a'], ['b'], ['f'], ['c', 'd', 'e', 'g']]
[['a'], ['b'], ['g'], ['c', 'd', 'e', 'f']]
[['a'], ['b'], ['c', 'd'], ['e', 'f', 'g']]
[['a'], ['b'], ['c', 'e'], ['d', 'f', 'g']]
[['a'], ['b'], ['c', 'f'], ['d', 'e', 'g']]
[['a'], ['b'], ['c', 'g'], ['d', 'e', 'f']]
[['a'], ['b'], ['d', 'e'], ['c', 'f', 'g']]
[['a'], ['b'], ['d', 'f'], ['c', 'e', 'g']]
[['a'], ['b'], ['d', 'g'], ['c', 'e', 'f']]
[['a'], ['b'], ['e', 'f'], ['c', 'd', 'g']]
[['a'], ['b'], ['e', 'g'], ['c', 'd', 'f']]
[['a'], ['b'], ['f', 'g'], ['c', 'd', 'e']]
[['a'], ['b'], ['c', 'd', 'e'], ['f', 'g']]
... and many more ... (total 8400)
For a list of 20 items, there will be 1,085,570,781,624 combinations.
from math import factorial
def countCombos(L,N=4):
if N==1: return 1
result = 0
for size in range(1,L-N+2):
c = factorial(L)//factorial(size)//factorial(L-size)
c *= countCombos(L-size,N-1)
result += c
return result
sum(1 for _ in partCombo("abcdefg")) # 8400
sum(1 for _ in partCombo("abcdefghij")) # 818520
countCombos(7) # 8400
countCombos(10) # 818520
countCombos(15) # 1016542800
countCombos(20) # 1085570781624
With that many combinations (of 20 items), it will be impossible to output a list of of all the combinations (it would never fit in memory). This is why the function is written as a generator. Still, it would take a long time (weeks) to go through all the combinations produced by the generator function on a list of 20 items.
This is equivalent to deciding which list each element should go into (for example, by picking a number from 0 to 3, 20 times), building each list of lists, and rejecting the ones which contain empty lists.
So, we can write a generator like:
from itertools import product
def partitions_of(elements, list_count):
element_count = len(elements)
for list_ids in product(range(list_count), repeat=element_count):
if len(set(list_ids)) != list_count:
continue # this combination would have an empty list.
result = [[] for _ in range(list_count)]
for list_id, element in zip(list_ids, elements):
result[list_id].append(element)
yield result
Testing it:
>>> for p in partitions_of('abcd', 2):
... print(p)
...
[['a', 'b', 'c'], ['d']]
[['a', 'b', 'd'], ['c']]
[['a', 'b'], ['c', 'd']]
[['a', 'c', 'd'], ['b']]
[['a', 'c'], ['b', 'd']]
[['a', 'd'], ['b', 'c']]
[['a'], ['b', 'c', 'd']]
[['b', 'c', 'd'], ['a']]
[['b', 'c'], ['a', 'd']]
[['b', 'd'], ['a', 'c']]
[['b'], ['a', 'c', 'd']]
[['c', 'd'], ['a', 'b']]
[['c'], ['a', 'b', 'd']]
[['d'], ['a', 'b', 'c']]
As long as the number of input elements is large compared to the number of lists, relatively few candidates should get rejected. In the worst case, where there are as many lists as elements, such that we effectively permute the list and put each element in its own list, this will take O(N^N) time to compute O(N!) results, i.e., still exponentially slower (as implied by Stirling’s approximation) than it needs to be.
In my specific case I would like to get all possible combinations of dividing a list of length 20 into 4 sublists. The sublists may have any length from 1 to 17. As a result I would like to have a list of lists of sublists with all possible combinations.
List to start with:
list_init = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t]
List of lists of lists with all combinations:
list_combs = [
[[a], [b], [c], [d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t]],
[[a,d], [b], [c], [e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t]],
[[a,d,e], [b], [c], [f, g, h, i, j, k, l, m, n, o, p, q, r, s, t]],
.
.
.
]
You can do that recursively by combining a first part of length 1,2,3,… with the partitions in 3 of the rest (recursing to 2 and 1):
from itertools import combinations
def partCombo(L,N=4):
if N==1: yield [L]; return
for size in range(1,len(L)-N+2):
for combo in combinations(range(len(L)),size): # index combinations
part = list(L[i] for i in combo) # first part
remaining = list(L)
for i in reversed(combo): del remaining[i] # unused items
yield from ([part]+rest for rest in partCombo(remaining,N-1))
output:
aList = list("abcdefg")
for part in partCombo(aList):
print(part)
[['a'], ['b'], ['c'], ['d', 'e', 'f', 'g']]
[['a'], ['b'], ['d'], ['c', 'e', 'f', 'g']]
[['a'], ['b'], ['e'], ['c', 'd', 'f', 'g']]
[['a'], ['b'], ['f'], ['c', 'd', 'e', 'g']]
[['a'], ['b'], ['g'], ['c', 'd', 'e', 'f']]
[['a'], ['b'], ['c', 'd'], ['e', 'f', 'g']]
[['a'], ['b'], ['c', 'e'], ['d', 'f', 'g']]
[['a'], ['b'], ['c', 'f'], ['d', 'e', 'g']]
[['a'], ['b'], ['c', 'g'], ['d', 'e', 'f']]
[['a'], ['b'], ['d', 'e'], ['c', 'f', 'g']]
[['a'], ['b'], ['d', 'f'], ['c', 'e', 'g']]
[['a'], ['b'], ['d', 'g'], ['c', 'e', 'f']]
[['a'], ['b'], ['e', 'f'], ['c', 'd', 'g']]
[['a'], ['b'], ['e', 'g'], ['c', 'd', 'f']]
[['a'], ['b'], ['f', 'g'], ['c', 'd', 'e']]
[['a'], ['b'], ['c', 'd', 'e'], ['f', 'g']]
... and many more ... (total 8400)
For a list of 20 items, there will be 1,085,570,781,624 combinations.
from math import factorial
def countCombos(L,N=4):
if N==1: return 1
result = 0
for size in range(1,L-N+2):
c = factorial(L)//factorial(size)//factorial(L-size)
c *= countCombos(L-size,N-1)
result += c
return result
sum(1 for _ in partCombo("abcdefg")) # 8400
sum(1 for _ in partCombo("abcdefghij")) # 818520
countCombos(7) # 8400
countCombos(10) # 818520
countCombos(15) # 1016542800
countCombos(20) # 1085570781624
With that many combinations (of 20 items), it will be impossible to output a list of of all the combinations (it would never fit in memory). This is why the function is written as a generator. Still, it would take a long time (weeks) to go through all the combinations produced by the generator function on a list of 20 items.
This is equivalent to deciding which list each element should go into (for example, by picking a number from 0 to 3, 20 times), building each list of lists, and rejecting the ones which contain empty lists.
So, we can write a generator like:
from itertools import product
def partitions_of(elements, list_count):
element_count = len(elements)
for list_ids in product(range(list_count), repeat=element_count):
if len(set(list_ids)) != list_count:
continue # this combination would have an empty list.
result = [[] for _ in range(list_count)]
for list_id, element in zip(list_ids, elements):
result[list_id].append(element)
yield result
Testing it:
>>> for p in partitions_of('abcd', 2):
... print(p)
...
[['a', 'b', 'c'], ['d']]
[['a', 'b', 'd'], ['c']]
[['a', 'b'], ['c', 'd']]
[['a', 'c', 'd'], ['b']]
[['a', 'c'], ['b', 'd']]
[['a', 'd'], ['b', 'c']]
[['a'], ['b', 'c', 'd']]
[['b', 'c', 'd'], ['a']]
[['b', 'c'], ['a', 'd']]
[['b', 'd'], ['a', 'c']]
[['b'], ['a', 'c', 'd']]
[['c', 'd'], ['a', 'b']]
[['c'], ['a', 'b', 'd']]
[['d'], ['a', 'b', 'c']]
As long as the number of input elements is large compared to the number of lists, relatively few candidates should get rejected. In the worst case, where there are as many lists as elements, such that we effectively permute the list and put each element in its own list, this will take O(N^N) time to compute O(N!) results, i.e., still exponentially slower (as implied by Stirling’s approximation) than it needs to be.