Iteration from an arbitrary state of multiple iterators, with: i1 < i2 < i3 … < in
Question:
I’m looping with multiple iterators, each of which can take values between 0 and 193, and each iterator must also be greater than the previous one (i1 < i2 < i3 … < in).
For this example, I have 5 iterators; the total number of states to be iterated over is very large (2,174,032,288). Therefore I am processing these states in batches and saving the start and final ‘states’ from each batch, so I can continue from where it left off.
My following code works. My question is what is the best way of generalising these nested if statements so that it works for any number of ordered iterators, not just 5. Also, is there a better approach to achieve this?
rnIndex = [0, 1, 2, 3, 4] # the starting state of indices to iterate from
batchSize = 1000000 # iterate through 1 million index states per batch
batchNumber = 0 # the batch number to start from - 1
batchNumberMax = 10 # run up to and including this batch number
rnLimit = 194 # iterate up to (but not including) for each index
rnComplete = False
while not rnComplete and batchNumber < batchNumberMax:
    batchNumber += 1
    print('\nStart index (included): ' + str(rnIndex))
    rnBatch = []
    for i in range(batchSize):
        if i == batchSize - 1:
            print('Final index (included): ' + str(rnIndex))
        rnBatch.append(rnIndex) # add each rnIndex to rnBatch
        rnIndex[-1] += 1
        if rnIndex[-1] == rnLimit:
            rnIndex[-2] += 1
            rnIndex[-1] = rnIndex[-2] + 1
            if rnIndex[-2] == rnLimit - 1:
                rnIndex[-3] += 1
                rnIndex[-2] = rnIndex[-3] + 1
                rnIndex[-1] = rnIndex[-2] + 1
                if rnIndex[-3] == rnLimit - 2:
                    rnIndex[-4] += 1
                    rnIndex[-3] = rnIndex[-4] + 1
                    rnIndex[-2] = rnIndex[-3] + 1
                    rnIndex[-1] = rnIndex[-2] + 1
                    if rnIndex[-4] == rnLimit - 3:
                        rnIndex[-5] += 1
                        rnIndex[-4] = rnIndex[-5] + 1
                        rnIndex[-3] = rnIndex[-4] + 1
                        rnIndex[-2] = rnIndex[-3] + 1
                        rnIndex[-1] = rnIndex[-2] + 1
                        if rnIndex[-5] == rnLimit - 4:
                            rnComplete = True
                            break
    print('len(rnBatch) = '+str(len(rnBatch))) # check the length of rnBatch
print(rnIndex) # the rnIndex state to resume from
out:
Start index (included): [0, 1, 2, 3, 4]
Final index (included): [0, 1, 94, 99, 133]
len(rnBatch) = 1000000
...
Start index (included): [0, 8, 24, 122, 173]
Final index (included): [0, 9, 23, 54, 90]
len(rnBatch) = 1000000
Start index (included): [0, 9, 23, 54, 91]
Final index (included): [0, 10, 22, 182, 188]
len(rnBatch) = 1000000
[0, 10, 22, 182, 189]
Process finished with exit code 0
Answers:
You can use a generator function that yields all possible combinations of indices under the constraints you described, namely that
each index can take values between 0 and 193, and each index must be greater than the previous one (i1 < i2 < i3 … < in).
Here is what it looks like:
from itertools import combinations
def generate_indices(limit, n):
    # generate all combinations of n numbers from range(limit);
    # combinations() over a sorted range already yields each tuple in
    # strictly increasing order, so no extra ordering check is needed
    for combo in combinations(range(limit), n):
        yield list(combo)
You can use this function in your loop instead of the nested if statements, like this:
rnLimit = 194
n = 5
batchSize = 1000000
batchNumber = 0
batchNumberMax = 20
index_generator = generate_indices(rnLimit, n)
while batchNumber < batchNumberMax:
    batchNumber += 1
    rnBatch = []
    for i in range(batchSize):
        try:
            rnIndex = next(index_generator)
        except StopIteration:
            break
        if i == 0:
            # report the first state of the batch without skipping it
            print('\nStart index (included): ' + str(rnIndex))
        if i == batchSize - 1:
            print('Final index (included): ' + str(rnIndex))
        rnBatch.append(rnIndex)
    if not rnBatch:
        break # the generator is exhausted
    print('len(rnBatch) = '+str(len(rnBatch)))
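Starting from the first state, this code should visit the same states in the same order as your original code, and it works for any number of ordered iterators. Note that it always begins at the first combination; on its own it does not resume from a saved state.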
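As a quick sanity check on this approach, the sketch below confirms on a small case that itertools.combinations over a sorted range yields strictly increasing tuples in lexicographic order, and that the number of states for 194 values and 5 indices matches the figure quoted in the question. The helper name is_strictly_increasing is made up for illustration, and math.comb assumes Python 3.8 or later.

```python
from itertools import combinations
from math import comb

def is_strictly_increasing(seq):
    # hypothetical helper: True when every element is smaller than its successor
    return all(x < y for x, y in zip(seq, seq[1:]))

# small case: 5 indices drawn from range(7)
small = list(combinations(range(7), 5))
all_increasing = all(is_strictly_increasing(c) for c in small)
in_lexicographic_order = small == sorted(small)

# number of states for the sizes used in the question
total_states = comb(194, 5)

print(all_increasing, in_lexicographic_order, len(small), total_states)
```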
You can write a function that produces the next sequence of indexes from a previous one, and use it to advance through the combinations from any starting point:
def nextSeq(maxVal,values):
    for i,v in enumerate(reversed(values),1):
        if v <= maxVal-i:
            return values[:-i]+[values[-i]+k+1 for k in range(i)]
output:
seq = [0,1,2,3,4]
for _ in range(10):
    print(seq)
    seq = nextSeq(193,seq)
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 5]
[0, 1, 2, 3, 6]
[0, 1, 2, 3, 7]
[0, 1, 2, 3, 8]
[0, 1, 2, 3, 9]
[0, 1, 2, 3, 10]
[0, 1, 2, 3, 11]
[0, 1, 2, 3, 12]
[0, 1, 2, 3, 13]
The function could also be used to create a generator that can be used in a for-loop (without nesting):
def genSeq(maxVal,start):
    seq = list(start)
    while seq:
        yield seq
        seq = nextSeq(maxVal,seq)
output:
start = [188,189,190,191,192]
for seq in genSeq(193,start):
    print(seq)
[188, 189, 190, 191, 192]
[188, 189, 190, 191, 193]
[188, 189, 190, 192, 193]
[188, 189, 191, 192, 193]
[188, 190, 191, 192, 193]
[189, 190, 191, 192, 193]
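To connect this back to the batching in the question, here is a minimal sketch of how a resumable batch loop might be built on a "next state" function. The names next_state and run_batches are hypothetical, and unlike maxVal above, limit here is exclusive (like rnLimit in the question). It is one possible arrangement, not the only one.

```python
def next_state(state, limit):
    """Return the successor of `state`, or None if `state` is the last one.

    Values lie in range(limit) and are strictly increasing.
    """
    n = len(state)
    for pos in range(n - 1, -1, -1):
        # position `pos` may hold at most limit - (n - pos)
        if state[pos] < limit - (n - pos):
            # bump this position and reset everything to its right
            return state[:pos] + [state[pos] + 1 + k for k in range(n - pos)]
    return None

def run_batches(start, limit, batch_size, max_batches):
    """Walk up to `max_batches` batches starting at `start`.

    Returns (summaries, resume_state); resume_state is None when finished.
    """
    state = list(start)
    summaries = []
    for _ in range(max_batches):
        if state is None:
            break
        first, count = state, 0
        while state is not None and count < batch_size:
            last = state  # process `state` here
            count += 1
            state = next_state(state, limit)
        summaries.append((first, last, count))
    return summaries, state

# small demonstration: 5 indices below 7, batches of 3, stop after 2 batches
summaries, resume = run_batches([0, 1, 2, 3, 4], 7, 3, 2)
print(summaries)
print(resume)  # pass this back in as `start` to continue later
```

Because next_state returns a fresh list each time, the stored first and last states are not overwritten by later steps.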
If you want to jump directly to a specific sequence (Nth sequence), a recursive function can convert an index to a sequence in the same order:
from math import factorial as fact
def seqAtIndex(index,maxVal,size):
    if size == 1: return [index]
    value = base = chunk = 0
    # skip whole groups of sequences that share the same first value
    while base+chunk <= index:
        base += chunk
        value += 1
        # number of sequences whose first value is value-1
        chunk = fact(maxVal+1-value)//fact(size-1)//fact(maxVal+2-value-size)
    return ([value-1]
            + [value+s for s in seqAtIndex(index-base,maxVal-value,size-1)])
output:
for i in range(10):
    print(i,seqAtIndex(i,193,5))
0 [0, 1, 2, 3, 4]
1 [0, 1, 2, 3, 5]
2 [0, 1, 2, 3, 6]
3 [0, 1, 2, 3, 7]
4 [0, 1, 2, 3, 8]
5 [0, 1, 2, 3, 9]
6 [0, 1, 2, 3, 10]
7 [0, 1, 2, 3, 11]
8 [0, 1, 2, 3, 12]
9 [0, 1, 2, 3, 13]
for i in range(2174032280,2174032288):
    print(i,seqAtIndex(i,193,5))
2174032280 [187, 189, 191, 192, 193]
2174032281 [187, 190, 191, 192, 193]
2174032282 [188, 189, 190, 191, 192]
2174032283 [188, 189, 190, 191, 193]
2174032284 [188, 189, 190, 192, 193]
2174032285 [188, 189, 191, 192, 193]
2174032286 [188, 190, 191, 192, 193]
2174032287 [189, 190, 191, 192, 193]
Note that seqAtIndex is much slower than nextSeq or genSeq, so you should only use it to find the starting sequence and then use the other functions to advance sequentially.
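The reverse mapping, from a sequence back to its position, can be useful if you would rather checkpoint a single integer than a list. The sketch below is one way to compute it with math.comb (Python 3.8 or later). The name index_of_seq is hypothetical, and limit is exclusive here (194 rather than the inclusive 193 used for maxVal).

```python
from itertools import combinations
from math import comb

def index_of_seq(seq, limit):
    """Lexicographic position of a strictly increasing `seq` in range(limit)."""
    size = len(seq)
    # number of sequences that come after `seq` in lexicographic order
    # (math.comb returns 0 when the lower argument exceeds the upper one)
    after = sum(comb(limit - 1 - v, size - i) for i, v in enumerate(seq))
    return comb(limit, size) - 1 - after

# cross-check against itertools on a small case
small_ok = all(index_of_seq(c, 7) == i
               for i, c in enumerate(combinations(range(7), 5)))
print(small_ok)
print(index_of_seq([188, 189, 190, 191, 192], 194))
```

For the sizes in the question this should agree with the index/sequence pairs listed above, for example position 2174032282 for [188, 189, 190, 191, 192].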
from itertools import combinations, islice
batchSize = 3
rnLimit = 7
combs = combinations(range(rnLimit), 5)
# pull batchSize combinations at a time until the iterator is exhausted
while batch := list(islice(combs, batchSize)):
    print(batch)
Output showing the batches (your extra information could easily be added if it is actually necessary):
[(0, 1, 2, 3, 4), (0, 1, 2, 3, 5), (0, 1, 2, 3, 6)]
[(0, 1, 2, 4, 5), (0, 1, 2, 4, 6), (0, 1, 2, 5, 6)]
[(0, 1, 3, 4, 5), (0, 1, 3, 4, 6), (0, 1, 3, 5, 6)]
[(0, 1, 4, 5, 6), (0, 2, 3, 4, 5), (0, 2, 3, 4, 6)]
[(0, 2, 3, 5, 6), (0, 2, 4, 5, 6), (0, 3, 4, 5, 6)]
[(1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 5, 6)]
[(1, 2, 4, 5, 6), (1, 3, 4, 5, 6), (2, 3, 4, 5, 6)]
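If the start and final states of each batch are wanted, a small wrapper around the same islice loop might look like the sketch below. The function name batch_summaries is made up for illustration, and the walrus operator assumes Python 3.8 or later.

```python
from itertools import combinations, islice

def batch_summaries(limit, n, batch_size):
    """Yield (first, last, length) for each consecutive batch of combinations."""
    combs = combinations(range(limit), n)
    while batch := list(islice(combs, batch_size)):
        # process `batch` here, then report its boundary states
        yield batch[0], batch[-1], len(batch)

summaries = list(batch_summaries(7, 5, 3))
for first, last, length in summaries:
    print(first, last, length)
```

Note that this always starts from the first combination; resuming from a saved state would need a way to skip ahead, which itertools.combinations does not provide directly.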