Lazily transpose dimensions 0 and 2 in iterator model

Question:

Given an iterable of iterables of iterables it_it_it (i.e. a lazy representation of a 3D array), you can lazily transpose dimensions 0 and 1 with zip(*it_it_it), and dimensions 1 and 2 with map(lambda it_it: zip(*it_it), it_it_it).
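As a small illustration of those two swaps, here is a sketch on a 2×2×3 nested list. The helper names swap_0_1, swap_1_2 and to_lists are made up for this example and are not part of any library.

```python
def swap_0_1(it_it_it):
    # zip(*...) unpacks the outer iterable up front, then yields tuples lazily.
    return zip(*it_it_it)

def swap_1_2(it_it_it):
    # Each outer element is only transposed when it is requested.
    return map(lambda it_it: zip(*it_it), it_it_it)

def to_lists(it_it_it):
    # Hypothetical helper: fully evaluate a nested iterable for inspection.
    return [[list(it) for it in it_it] for it_it in it_it_it]

data = [[[0, 1, 2], [3, 4, 5]],
        [[6, 7, 8], [9, 10, 11]]]  # shape (2, 2, 3)

result_01 = to_lists(swap_0_1(data))
result_12 = to_lists(swap_1_2(data))
print(result_01)
print(result_12)
```

Here result_01[y][x][z] and result_12[x][z][y] both equal data[x][y][z].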

However, the last combination (0 and 2) is trickier. It seems you must fully evaluate the outer two iterators before yielding anything, and the type yielded must be List[List] rather than a lazy Iterable[Iterable]. The innermost iterator is the only one that can be lazily evaluated (i.e. Iterable[List[List]] is the best you can do).

I’m going to give an answer but am interested in a more elegant answer.

Aside:

I’m interested in this question to understand the problem for statically typed iterators, e.g. in Rust and C++. Do you set up your data so that you never have to do this operation? Or is the best approach to fully evaluate the iterators to List[List[List]] and then transpose C-style?
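For comparison, the fully eager alternative mentioned above might look like the following sketch. It assumes rectangular input (every sub-iterable at a given level has the same length); the name transpose_0_2_eager is hypothetical.

```python
def transpose_0_2_eager(it_it_it):
    # Materialize all three levels first.
    xyz = [[list(it) for it in it_it] for it_it in it_it_it]
    nx = len(xyz)
    ny = len(xyz[0]) if nx else 0
    nz = len(xyz[0][0]) if ny else 0
    # Index-based transpose: result[z][y][x] == xyz[x][y][z].
    return [[[xyz[x][y][z] for x in range(nx)]
             for y in range(ny)]
            for z in range(nz)]

data = [[[0, 1, 2], [3, 4, 5]],
        [[6, 7, 8], [9, 10, 11]]]  # shape (2, 2, 3)
eager = transpose_0_2_eager(data)
print(eager)
```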

Asked By: Tom Huntington

Answers:

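import itertools

def transpose_(it_it_it):
    # Materialize the outer two levels; only the innermost iterators stay lazy.
    blocks = [[iter(it) for it in it_it] for it_it in it_it_it]
    # Visit the innermost iterators in (y, x) order, over and over.
    it = itertools.cycle(itertools.chain.from_iterable(zip(*blocks)))
    while True:
        try:
            # One z-slice: a list over y of lists over x.
            yield [[next(next(it)) for _ in range(len(blocks))]
                   for _ in range(len(blocks[0]))]
        except StopIteration:
            # An innermost iterator ran out, so all slices are done.
            break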

Here’s the code to test it
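The original test code is not reproduced here. As a stand-in, this is a minimal check of my own, assuming rectangular 3×2×4 input and comparing against an index-based transpose. The function body is copied from the answer above; data, expected and result are names chosen for this sketch.

```python
import itertools

def transpose_(it_it_it):
    # Copied from the answer above.
    blocks = [[iter(it) for it in it_it] for it_it in it_it_it]
    it = iter(itertools.cycle(itertools.chain.from_iterable(zip(*blocks))))
    while True:
        try:
            yield [[next(next(it)) for _ in range(len(blocks))]
                                   for _ in range(len(blocks[0]))]
        except StopIteration:
            break

nx, ny, nz = 3, 2, 4
# data[x][y][z] holds a unique integer per position.
data = [[[x * ny * nz + y * nz + z for z in range(nz)]
         for y in range(ny)]
        for x in range(nx)]

# Reference result built by indexing: expected[z][y][x] == data[x][y][z].
expected = [[[data[x][y][z] for x in range(nx)]
             for y in range(ny)]
            for z in range(nz)]

result = list(transpose_(data))
print(result == expected)
```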

Answered By: Tom Huntington

You can’t really avoid the materialization you’re trying to avoid.

Consider iterating over the transposed result you want:

for iter1 in transpose_layers_0_and_2(iterator):
    for iter2 in iter1:
        for iter3 in iter2:
            pass

At the end of the first iteration of the outer loop, you’ve accessed every element of every sub-sub-iterator of the first sub-iterator of transpose_layers_0_and_2(iterator).

In the original pre-transpose iterator, these elements come from the first element of every sub-sub-iterator of the original iterator.

That means that by the time the first iteration of the outer loop is complete, you must have materialized the entire first two layers of the original iterator to produce every sub-sub-iterator. There’s no way around it. Plus, you’ve only used one element of each sub-sub-iterator, so you still have to retain all those sub-sub-iterators in memory to produce the remaining elements. You can’t discard them yet.
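The argument can be observed directly. The sketch below wraps nested lists in generators that record which innermost iterators have produced a value, then pulls only the first element of a lazy 0/2 transpose. The transpose used here (zip and map composed) is just one possible implementation, and touched, tracked and transpose_0_2 are names invented for this example.

```python
touched = set()  # (x, y) pairs whose innermost iterator has produced a value

def tracked(data):
    # Wrap nested lists in generators that record innermost access.
    def inner(x, y, row):
        for value in row:
            touched.add((x, y))
            yield value
    def middle(x, plane):
        for y, row in enumerate(plane):
            yield inner(x, y, row)
    for x, plane in enumerate(data):
        yield middle(x, plane)

def transpose_0_2(it_it_it):
    # One possible lazy transpose of dimensions 0 and 2.
    return zip(*map(zip, *it_it_it))

nx, ny, nz = 3, 2, 4
data = [[[(x, y, z) for z in range(nz)] for y in range(ny)] for x in range(nx)]

transposed = transpose_0_2(tracked(data))
first = [list(row) for row in next(transposed)]  # consume only the z == 0 slice

# Every one of the nx * ny innermost iterators has already been advanced once.
print(len(touched), 'of', nx * ny, 'innermost iterators touched')
```

After a single step of the outer loop, all nx * ny innermost iterators are live and partially consumed, which is the memory cost described above.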

Answered By: user2357112

Solution

def transpose_(it_it_it):
    return zip(*map(zip, *it_it_it))

Derivation

My first version was the following, which I then minified. It first swaps the outer two dimensions, then the inner two, then the outer two again, using your ways of doing each swap. Note the variable names: they reflect the dimension order after each swap:

def transpose_(xyz):
    yxz = zip(*xyz)
    yzx = map(lambda xz: zip(*xz), yxz)
    zyx = zip(*yzx)
    return zyx
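The minified form works because map called with several iterables passes one element from each to the function, so map(zip, *xyz) does the same job as the first two steps combined. A quick sketch to check that both versions agree, assuming rectangular 3×2×4 input (transpose_steps, transpose_short and to_lists are names chosen for this example):

```python
def transpose_steps(xyz):
    yxz = zip(*xyz)                        # swap dimensions 0 and 1
    yzx = map(lambda xz: zip(*xz), yxz)    # swap dimensions 1 and 2
    zyx = zip(*yzx)                        # swap dimensions 0 and 1 again
    return zyx

def transpose_short(xyz):
    # map(zip, *xyz) pairs up the y-th rows of every x-plane and zips them.
    return zip(*map(zip, *xyz))

def to_lists(it_it_it):
    # Hypothetical helper: fully evaluate a nested iterable for comparison.
    return [[list(it) for it in it_it] for it_it in it_it_it]

nx, ny, nz = 3, 2, 4
data = [[[x * ny * nz + y * nz + z for z in range(nz)]
         for y in range(ny)]
        for x in range(nx)]

steps = to_lists(transpose_steps(data))
short = to_lists(transpose_short(data))
print(steps == short)
```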

Iteration / exhaustion visualization

For visualization, I turned the lists of the 3D input into iterators that report when they’ve been exhausted, and iterate the transposed data, printing its structure and values. We can see the outer two dimensions get exhausted up-front, and the innermost dimension is exhausted only at the end:

X-iterator exhausted
Y-iterator exhausted
Y-iterator exhausted
Y-iterator exhausted
[
  (
    0
    8
    16
  )
  (
    4
    12
    20
  )
]
[
  (
    1
    9
    17
  )
  (
    5
    13
    21
  )
]
[
  (
    2
    10
    18
  )
  (
    6
    14
    22
  )
]
[
  (
    3
    11
    19
  )
  (
    7
    15
    23
  )
]
Z-iterator exhausted
Z-iterator exhausted
Z-iterator exhausted
Z-iterator exhausted
Z-iterator exhausted
Z-iterator exhausted

Code:

import numpy

def transpose_(it_it_it):
    return zip(*map(zip, *it_it_it))

# Versions of zip and map that exhaust all inputs
_zip = zip
def zip(*its):
    yield from _zip(*its)
    for it in its[1:]:
        next(it, None)
_map = map
def map(f, *its):
    yield from _map(f, *its)
    for it in its[1:]:
        next(it, None)

# Iterators that report when they've been exhausted
def X(it):
    yield from map(Y, it)
    print('X-iterator exhausted')
def Y(it):
    yield from map(Z, it)
    print('Y-iterator exhausted')
def Z(it):
    yield from it
    print('Z-iterator exhausted')

# Test data
a = numpy.arange(3*2*4).reshape((3, 2, 4))
b = X(a.tolist())

# Iterate the transposed data
zyx = transpose_(b)
for yx in zyx:
    print('[')
    for x in yx:
        print('  (')
        for value in x:
            print('   ', value)
        print('  )')
    print(']')

Benchmark

Time and memory of creating and iterating the transposed data, given data of size 100×100×100 (three attempts):

317 ms  845540 bytes  transpose_Tom
117 ms  825400 bytes  transpose_Kelly

351 ms  840144 bytes  transpose_Tom
127 ms  824984 bytes  transpose_Kelly

324 ms  844120 bytes  transpose_Tom
116 ms  824984 bytes  transpose_Kelly

Code:

import numpy
import tracemalloc as tm
from timeit import repeat
import itertools

def transpose_Tom(it_it_it):
    blocks = [[iter(it) for it in it_it] for it_it in it_it_it]
    it = iter(itertools.cycle(itertools.chain.from_iterable(zip(*blocks))))
    while True:
        try:
            yield [[next(next(it)) for _ in range(len(blocks))]
                                   for _ in range(len(blocks[0]))]
        except StopIteration:
            break

def transpose_Kelly(it_it_it):
    return zip(*map(zip, *it_it_it))

def iterate(iii):
    for ii in iii:
        for i in ii:
            for _ in i:
                pass
n = 100
a = numpy.arange(n**3).reshape((n, n, n))
b = a.tolist()

for _ in range(3):
    for func in transpose_Tom, transpose_Kelly:

        tm.start()
        iterate(func(b))
        zyx = func(b)
        memory = tm.get_traced_memory()[1]
        tm.stop()

        time = min(repeat(lambda: iterate(func(b)), number=1))
        
        print(f'{round(time*1e3)} ms ', memory, 'bytes ', func.__name__)
    print()

Answered By: Kelly Bundy