Lazily transpose dimensions 0 and 2 in iterator model
Question:
Given an iterable of an iterable of an iterable it_it_it (i.e. a lazy representation of a 3D array), you can lazily transpose dimensions 0 and 1 with zip(*it_it_it) and dimensions 1 and 2 with map(lambda it_it: zip(*it_it), it_it_it).
However, the last combination (0 and 2) is trickier. It seems you must fully evaluate the outer two iterators before yielding anything, and the type yielded must be List[List], not a lazy Iterable[Iterable]; the innermost iterator is the only one that can be lazily evaluated (i.e. Iterable[List[List]] is the best you can do).
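To make the two easy cases concrete, here is a small sketch on a 2×3×4 block. The helper names swap_01 and swap_12 and the sample data are illustrative assumptions, not part of the question; the input is a nested list rather than true iterators so that the results can be checked by indexing.

```python
def swap_01(it_it_it):
    # Transpose dimensions 0 and 1: result[y][x] is the original [x][y].
    return zip(*it_it_it)

def swap_12(it_it_it):
    # Transpose dimensions 1 and 2 inside each outer element.
    return map(lambda it_it: zip(*it_it), it_it_it)

# Sample data (assumed): a[x][y][z] = 100*x + 10*y + z, shape 2 x 3 x 4.
a = [[[100 * x + 10 * y + z for z in range(4)]
      for y in range(3)]
     for x in range(2)]

# Materialize only to inspect the results.
t01 = [list(row) for row in swap_01(a)]                        # shape 3 x 2 x 4
t12 = [[list(col) for col in plane] for plane in swap_12(a)]   # shape 2 x 4 x 3

print(t01[2][1] == a[1][2])                          # True
print(t12[1][3] == [a[1][y][3] for y in range(3)])   # True
```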
I’m going to give an answer but am interested in a more elegant answer.
Aside:
I’m interested in this question for understanding the problem with statically typed iterators, i.e. Rust and C++. Do you make sure to set up your data so you never have to do this operation? Or is the best thing to do just to fully evaluate the iterators to List[List[List]] and then transpose C-style?
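For comparison, the fully eager approach mentioned in the aside could look roughly like the sketch below. The name transpose_eager is hypothetical, and the code assumes a non-empty, rectangular input.

```python
def transpose_eager(it_it_it):
    # Materialize all three levels into nested lists.
    a = [[list(it) for it in it_it] for it_it in it_it_it]
    # Assumes the input is non-empty and rectangular.
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    # Index in transposed order: result[z][y][x] = a[x][y][z].
    return [[[a[x][y][z] for x in range(nx)]
             for y in range(ny)]
            for z in range(nz)]

sample = [[[0, 1], [2, 3]],
          [[4, 5], [6, 7]]]
# Feed it iterators rather than lists to mimic the lazy input.
lazy = (map(iter, plane) for plane in sample)
print(transpose_eager(lazy))
# prints [[[0, 4], [2, 6]], [[1, 5], [3, 7]]]
```

Nothing is lazy here, so this mainly serves as a baseline to compare the iterator-based versions against.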
Answers:
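import itertools

def transpose_(it_it_it):
    # Materialize the outer two levels; only the innermost iterators stay lazy.
    blocks = [[iter(it) for it in it_it] for it_it in it_it_it]
    # Cycle through the innermost iterators in (y, x) order.
    it = iter(itertools.cycle(itertools.chain.from_iterable(zip(*blocks))))
    while True:
        try:
            # One slice: len(blocks[0]) rows of len(blocks) values each.
            yield [[next(next(it)) for _ in range(len(blocks))]
                   for _ in range(len(blocks[0]))]
        except StopIteration:
            # The innermost iterators are exhausted.
            break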
Here’s the code to test it
You can’t really avoid the materialization you’re trying to avoid.
Consider iterating over the transposed result you want:
for iter1 in transpose_layers_0_and_2(iterator):
    for iter2 in iter1:
        for iter3 in iter2:
            pass
At the end of the first iteration of the outer loop, you’ve accessed every element of every sub-sub-iterator of the first sub-iterator of transpose_layers_0_and_2(iterator).
In the original pre-transpose iterator, these elements come from the first element of every sub-sub-iterator of the original iterator.
That means that by the time the first iteration of the outer loop is complete, you must have materialized the entire first two layers of the original iterator to produce every sub-sub-iterator. There’s no way around it. Plus, you’ve only used one element of each sub-sub-iterator, so you still have to retain all those sub-sub-iterators in memory to produce the remaining elements. You can’t discard them yet.
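The argument can be checked with a small experiment. The sketch below is an illustration under assumed names (tracked, row, and the 3×2×4 shape are made up for the example): each innermost iterator records its (x, y) position the first time it is advanced, and the zip-based transpose from this thread is asked for only its first element.

```python
def transpose_02(it_it_it):
    # The zip-based transpose discussed in this thread.
    return zip(*map(zip, *it_it_it))

touched = set()

def tracked(x, y, values):
    # Innermost iterator that records (x, y) whenever it is advanced.
    for v in values:
        touched.add((x, y))
        yield v

nx, ny, nz = 3, 2, 4

def row(x):
    # Lazy middle level for a fixed x.
    return (tracked(x, y, range(nz)) for y in range(ny))

data = (row(x) for x in range(nx))

result = transpose_02(data)
first = next(result)  # request only the first slice of the transposed data
print(len(touched), "of", nx * ny, "innermost iterators already advanced")
# prints: 6 of 6 innermost iterators already advanced
```

Every innermost iterator has been created and advanced once before the second slice is requested, so all of them have to stay alive until the end.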
Solution
def transpose_(it_it_it):
    return zip(*map(zip, *it_it_it))
Derivation
My first version was this, and then I just minified it. I first swap the outer two dimensions, then the inner two, then the outer two again, using your ways of doing that. Note the variable names; they reflect the dimensions and swaps:
def transpose_(xyz):
    yxz = zip(*xyz)
    yzx = map(lambda xz: zip(*xz), yxz)
    zyx = zip(*yzx)
    return zyx
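The minified form works because map, when given several iterables, already passes one item from each to the function, so map(zip, *xyz) does the first two steps at once. As a sanity check, the sketch below compares both versions against a reference built by plain indexing. The names transpose_steps, transpose_short, and to_lists, and the pure-Python test data, are assumptions for this example.

```python
def transpose_steps(xyz):
    yxz = zip(*xyz)
    yzx = map(lambda xz: zip(*xz), yxz)
    zyx = zip(*yzx)
    return zyx

def transpose_short(xyz):
    return zip(*map(zip, *xyz))

def to_lists(iii):
    # Materialize a 3-level iterable so it can be compared with ==.
    return [[list(i) for i in ii] for ii in iii]

# Values 0..23 laid out as a 3 x 2 x 4 block.
nx, ny, nz = 3, 2, 4
a = [[[(x * ny + y) * nz + z for z in range(nz)]
      for y in range(ny)]
     for x in range(nx)]

# Reference: expected[z][y][x] = a[x][y][z].
expected = [[[a[x][y][z] for x in range(nx)]
             for y in range(ny)]
            for z in range(nz)]

assert to_lists(transpose_steps(a)) == expected
assert to_lists(transpose_short(a)) == expected
print(expected[0])
# prints [[0, 8, 16], [4, 12, 20]]
```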
Iteration / exhaustion visualization
For visualization, I turned the lists of the 3D input into iterators that report when they’ve been exhausted, and iterate the transposed data, printing its structure and values. We can see the outer two dimensions get exhausted up-front, and the innermost dimension is exhausted only at the end:
X-iterator exhausted
Y-iterator exhausted
Y-iterator exhausted
Y-iterator exhausted
[
 (
  0
  8
  16
 )
 (
  4
  12
  20
 )
]
[
 (
  1
  9
  17
 )
 (
  5
  13
  21
 )
]
[
 (
  2
  10
  18
 )
 (
  6
  14
  22
 )
]
[
 (
  3
  11
  19
 )
 (
  7
  15
  23
 )
]
Z-iterator exhausted
Z-iterator exhausted
Z-iterator exhausted
Z-iterator exhausted
Z-iterator exhausted
Z-iterator exhausted
Code:
import numpy

def transpose_(it_it_it):
    return zip(*map(zip, *it_it_it))

# Versions of zip and map that exhaust all inputs
_zip = zip
def zip(*its):
    yield from _zip(*its)
    for it in its[1:]:
        next(it, None)
_map = map
def map(f, *its):
    yield from _map(f, *its)
    for it in its[1:]:
        next(it, None)

# Iterators that report when they've been exhausted
def X(it):
    yield from map(Y, it)
    print('X-iterator exhausted')
def Y(it):
    yield from map(Z, it)
    print('Y-iterator exhausted')
def Z(it):
    yield from it
    print('Z-iterator exhausted')

# Test data
a = numpy.arange(3*2*4).reshape((3, 2, 4))
b = X(a.tolist())

# Iterate the transposed data
zyx = transpose_(b)
for yx in zyx:
    print('[')
    for x in yx:
        print(' (')
        for value in x:
            print(' ', value)
        print(' )')
    print(']')
Benchmark
Time and memory of creating and iterating the transposed data, given data of size 100×100×100 (three attempts):
317 ms  845540 bytes  transpose_Tom
117 ms  825400 bytes  transpose_Kelly

351 ms  840144 bytes  transpose_Tom
127 ms  824984 bytes  transpose_Kelly

324 ms  844120 bytes  transpose_Tom
116 ms  824984 bytes  transpose_Kelly
Code:
import numpy
import tracemalloc as tm
from timeit import repeat
import itertools

def transpose_Tom(it_it_it):
    blocks = [[iter(it) for it in it_it] for it_it in it_it_it]
    it = iter(itertools.cycle(itertools.chain.from_iterable(zip(*blocks))))
    while True:
        try:
            yield [[next(next(it)) for _ in range(len(blocks))]
                   for _ in range(len(blocks[0]))]
        except StopIteration:
            break

def transpose_Kelly(it_it_it):
    return zip(*map(zip, *it_it_it))

def iterate(iii):
    for ii in iii:
        for i in ii:
            for _ in i:
                pass

n = 100
a = numpy.arange(n**3).reshape((n, n, n))
b = a.tolist()

for _ in range(3):
    for func in transpose_Tom, transpose_Kelly:
        tm.start()
        iterate(func(b))
        zyx = func(b)
        memory = tm.get_traced_memory()[1]
        tm.stop()
        time = min(repeat(lambda: iterate(func(b)), number=1))
        print(f'{round(time*1e3)} ms ', memory, 'bytes ', func.__name__)
    print()