In Python, what is the difference between `async for x in async_iterator` and `for x in await async_iterator`?

Question:

The subject contains the whole idea. I came across code samples that show something like:

async for item in getItems():
    await item.process()

And others where the code is:

for item in await getItems():
    await item.process()

Is there a notable difference in these two approaches?

Asked By: Cyril N.

||

Answers:

Those are completely different.

This for item in await getItems() won’t work (it raises a TypeError) if getItems() returns an asynchronous iterator or asynchronous generator. It may be used only if getItems is a coroutine function which, in your case, is expected to return a sequence object (a simple iterable).

async for is the conventional (and Pythonic) way to iterate asynchronously over an async iterator or async generator.
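
To make the failure mode concrete, here is a minimal sketch. The name get_items_gen is a hypothetical asynchronous generator, not code from the question; it shows that neither await nor a plain for accepts such an object.

```python
import asyncio


async def get_items_gen():
    # Hypothetical asynchronous generator standing in for getItems().
    for n in range(3):
        yield n


async def main():
    outcome = {}

    # Works: async for drives the asynchronous generator.
    outcome["async for"] = [n async for n in get_items_gen()]

    # Fails: an asynchronous generator object is not awaitable.
    try:
        await get_items_gen()
    except TypeError:
        outcome["await"] = "TypeError"

    # Fails: an asynchronous generator object is not a plain iterable.
    try:
        for _ in get_items_gen():
            pass
    except TypeError:
        outcome["for"] = "TypeError"

    return outcome


if __name__ == "__main__":
    print(asyncio.run(main()))
```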

Answered By: RomanPerekhrest

TL;DR

While both of them could theoretically work with the same object (without causing an error), they most likely do not. In general those two notations are not equivalent at all, but invoke entirely different protocols and are applied to very distinct use cases.


Different protocols

Iterable

To understand the difference, you first need to understand the concept of an iterable.

Abstractly speaking, an object is iterable if it implements the __iter__ method or (less commonly, for iteration) a sequence-like __getitem__ method.

Practically speaking, an object is iterable if you can use it in a for-loop, i.e. for _ in iterable. A for-loop implicitly invokes the __iter__ method of the iterable and expects it to return an iterator, which implements the __next__ method. That method is called at the start of each iteration of the for-loop, and its return value is what gets assigned to the loop variable.
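
As a rough sketch of that protocol, the helper below (manual_for is a made-up name) spells out approximately what a for-loop does behind the scenes, using only the built-in iter() and next():

```python
def manual_for(iterable):
    """Roughly what `for item in iterable: collected.append(item)` does."""
    collected = []
    iterator = iter(iterable)  # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)  # calls iterator.__next__()
        except StopIteration:
            break  # the for-loop swallows this exception and ends
        collected.append(item)
    return collected


if __name__ == "__main__":
    print(manual_for([1, 2, 3]))
    print(manual_for("ab"))
```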

Asynchronous iterable

The async-world introduced a variation of that, namely the asynchronous iterable.

An object is asynchronously iterable if it implements the __aiter__ method.

Again, practically speaking, an object is asynchronously iterable if it can be used in an async for-loop, i.e. async for _ in async_iterable. An async for-loop calls the __aiter__ method of the asynchronous iterable and expects it to return an asynchronous iterator, which implements the __anext__ coroutine method. That method is awaited at the start of each iteration of the async for-loop.
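
The asynchronous counterpart can be sketched the same way. Both countdown and manual_async_for are hypothetical names used only for illustration. The dunder methods are called directly so the sketch also runs on Python versions before 3.10, where the aiter() and anext() built-ins were added.

```python
import asyncio


async def countdown(start):
    # Hypothetical asynchronous generator used as the async iterable.
    while start > 0:
        await asyncio.sleep(0)  # stand-in for real asynchronous work
        yield start
        start -= 1


async def manual_async_for(async_iterable):
    """Roughly what `async for item in async_iterable: ...` does."""
    collected = []
    async_iterator = async_iterable.__aiter__()  # a plain, synchronous call
    while True:
        try:
            item = await async_iterator.__anext__()  # awaited on every iteration
        except StopAsyncIteration:
            break  # the async for-loop swallows this exception and ends
        collected.append(item)
    return collected


if __name__ == "__main__":
    print(asyncio.run(manual_async_for(countdown(3))))
```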

Awaitable

Typically, an asynchronous iterable is not awaitable, i.e. it is not a coroutine and does not implement an __await__ method, and vice versa. The two are not mutually exclusive, however: you could design an object that is both awaitable by itself and also (asynchronously) iterable, though that seems like a very strange design.

(Asynchronous) Iterator

Just to be very clear in the terminology used, the iterator is a subtype of the iterable. Meaning an iterator also implements the iterable protocol by providing an __iter__ method, but it also provides the __next__ method. Analogously, the asynchronous iterator is a subtype of the asynchronous iterable because it implements the __aiter__ method, but also provides the __anext__ coroutine method.

You do not need the object to be an iterator for it to be used in a for-loop; you need its __iter__ method to return an iterator. The fact that you can use an (asynchronous) iterator in an (async) for-loop is because it is also an (asynchronous) iterable. Plenty of objects are iterable without being iterators (built-in containers such as list, dict and str, for example), while others, such as generators and asynchronous generators, are both.
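
The subtype relationship can be checked with the abstract base classes in collections.abc. A minimal sketch, where tick is a hypothetical asynchronous generator: a list is iterable without being an iterator, the iterator it hands out is both, and an asynchronous generator object is both an asynchronous iterable and an asynchronous iterator.

```python
from collections.abc import AsyncIterable, AsyncIterator, Iterable, Iterator


async def tick():
    # Hypothetical asynchronous generator, only used for the type checks below.
    yield 1


numbers = [1, 2, 3]
numbers_iter = iter(numbers)
ticker = tick()

checks = {
    "list is Iterable": isinstance(numbers, Iterable),
    "list is Iterator": isinstance(numbers, Iterator),
    "list iterator is Iterable": isinstance(numbers_iter, Iterable),
    "list iterator is Iterator": isinstance(numbers_iter, Iterator),
    "async generator is AsyncIterable": isinstance(ticker, AsyncIterable),
    "async generator is AsyncIterator": isinstance(ticker, AsyncIterator),
    "async generator is Iterable": isinstance(ticker, Iterable),
}

if __name__ == "__main__":
    for label, value in checks.items():
        print(f"{label}: {value}")
```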


Inferences from your example

async for _ in get_items()

That code implies that whatever is returned by the get_items function is an asynchronous iterable.

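Note that the call to get_items is not awaited, so get_items is not a coroutine function: it is either a normal function or an asynchronous generator function, and the object it returns implements the asynchronous iterable protocol. That means we could write the following instead: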

async_iterable = get_items()
async for item in async_iterable:
    ...
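
One way such a get_items could look, as a sketch only: the Items class and its contents are assumptions made up for illustration. get_items itself is a regular function, and the asynchronous work happens inside the iteration.

```python
import asyncio


class Items:
    """Hypothetical asynchronous iterable that hands out a fresh iterator."""

    def __init__(self, values):
        self._values = values

    def __aiter__(self):
        # Returns an asynchronous generator object, which is an async iterator.
        return self._generate()

    async def _generate(self):
        for value in self._values:
            await asyncio.sleep(0)  # stand-in for fetching each item
            yield value


def get_items():
    # Regular (non-async) function: calling it does not involve the event loop.
    return Items(["a", "b"])


async def main():
    collected = []
    async_iterable = get_items()
    async for item in async_iterable:
        collected.append(item)
    return collected


if __name__ == "__main__":
    print(asyncio.run(main()))
```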

for _ in await get_items()

Whereas this snippet implies that get_items is in fact a coroutine function (i.e. a callable returning an awaitable) and the return value of that coroutine is a normal iterable.

Note that we know for certain that the object returned by the get_items coroutine is a normal iterable because otherwise the regular for-loop would not work with it. The equivalent code would be:

iterable = await get_items()
for item in iterable:
    ...
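
A matching sketch for this second shape, again with made-up contents: here get_items is a coroutine function, the single await fetches everything, and the loop afterwards is an ordinary synchronous loop over a list.

```python
import asyncio


async def get_items():
    # Hypothetical coroutine function: one asynchronous step, then a plain list.
    await asyncio.sleep(0)  # stand-in for a single I/O round trip
    return ["a", "b"]


async def main():
    collected = []
    iterable = await get_items()  # the whole list exists before the loop starts
    for item in iterable:
        collected.append(item)
    return collected


if __name__ == "__main__":
    print(asyncio.run(main()))
```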

Implications

Another implication of those code snippets is that in the first one the function (returning the asynchronous iterator) is non-asynchronous, i.e. calling it will not yield control to the event loop, whereas each iteration of the async for-loop is asynchronous (and thus will allow context switches).

Conversely, in the second one the function returning the normal iterator is an asynchronous call, but all of the iterations (the calls to __next__) are non-asynchronous.
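
The scheduling consequence can be observed with two concurrent consumers. This is a sketch under assumptions: stream and batch are hypothetical stand-ins for the two kinds of get_items, and asyncio.sleep(0) is used only to give the event loop a chance to switch tasks.

```python
import asyncio


async def stream(n):
    # Asynchronous generator: yields control to the event loop before each item.
    for i in range(n):
        await asyncio.sleep(0)
        yield i


async def batch(n):
    # Coroutine: yields control once, then returns every item at the same time.
    await asyncio.sleep(0)
    return list(range(n))


async def consume_stream(name, log):
    async for i in stream(2):
        log.append((name, i))


async def consume_batch(name, log):
    for i in await batch(2):
        log.append((name, i))


async def main():
    streamed, batched = [], []
    await asyncio.gather(consume_stream("A", streamed), consume_stream("B", streamed))
    await asyncio.gather(consume_batch("A", batched), consume_batch("B", batched))
    return streamed, batched


if __name__ == "__main__":
    streamed, batched = asyncio.run(main())
    print("async for:", streamed)
    print("for + await:", batched)
```

With the default asyncio event loop, the async for log alternates between A and B, because every iteration is a possible switch point. The for + await log lists all of A's items before any of B's, because the plain loop never hands control back.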

Key difference

The practical takeaway should be that those two snippets you showed are never equivalent. The main reason is that get_items either is or is not a coroutine function. If it is not, you cannot do await get_items(). But whether or not you can do async for or for depends on whatever is returned by get_items.
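
If it is unclear which kind of callable you are dealing with, the standard inspect module can tell the cases apart. In this sketch the three get_items_* functions and describe are hypothetical names:

```python
import inspect


async def get_items_coro():
    return [1, 2]


async def get_items_agen():
    yield 1


def get_items_plain():
    return [1, 2]


def describe(func):
    """Classify a callable by how its call result has to be consumed."""
    if inspect.iscoroutinefunction(func):
        return "coroutine function"  # the call must be awaited
    if inspect.isasyncgenfunction(func):
        return "async generator function"  # the call result needs async for
    return "regular function"  # the call runs immediately


if __name__ == "__main__":
    for func in (get_items_coro, get_items_agen, get_items_plain):
        print(func.__name__, "->", describe(func))
```

Note that a regular function may still return an asynchronous iterable, so for that case the returned object, not the function, decides which loop applies.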


Possible combinations

For the sake of completeness, it should be noted that combinations of the aforementioned protocols are entirely feasible, although not all too common. Consider the following example:

from __future__ import annotations

class Foo:
    x = 0

    def __iter__(self) -> Foo:
        return self

    def __next__(self) -> int:
        if self.x >= 2:
            raise StopIteration
        self.x += 1
        return self.x

    def __aiter__(self) -> Foo:
        return self

    async def __anext__(self) -> int:
        if self.x >= 3:
            raise StopAsyncIteration
        self.x += 1
        return self.x * 10


async def main() -> None:
    for i in Foo():
        print(i)
    async for i in Foo():
        print(i)

if __name__ == "__main__":
    from asyncio import run
    run(main())

In this example, Foo implements four distinct protocols:

  • iterable (def __iter__)
  • iterator (iterable + def __next__)
  • asynchronous iterable (def __aiter__)
  • asynchronous iterator (asynchronous iterable + async def __anext__)

Running the main coroutine gives the following output:

1
2
10
20
30

This shows that objects can absolutely be all those things at the same time. Since Foo is both a synchronous and an asynchronous iterable, we could write two functions (one coroutine, one regular) that each return an instance of Foo and then replicate your example a bit:

from collections.abc import AsyncIterable, Iterable


def get_items_sync() -> AsyncIterable[int]:
    return Foo()


async def get_items_async() -> Iterable[int]:
    return Foo()


async def main() -> None:
    async for i in get_items_sync():
        print(i)
    for i in await get_items_async():
        print(i)
    async for i in await get_items_async():
        print(i)


if __name__ == "__main__":
    from asyncio import run
    run(main())

Output:

10
20
30
1
2
10
20
30

This illustrates very clearly that the only thing determining which of our Foo methods is called (__next__ or __anext__) is whether we use a for-loop or an async for-loop.

The for-loop will always call the __next__ method at least once and continue calling it for each iteration, until it intercepts a StopIteration exception.

The async for-loop will always await the __anext__ coroutine at least once and continue calling and awaiting it for each subsequent iteration, until it intercepts a StopAsyncIteration exception.
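
The "at least once" part can be verified with iterators that are exhausted from the start. A small sketch with hypothetical classes that count how often the loop asks for the next item:

```python
import asyncio


class EmptySync:
    """Hypothetical iterator that is exhausted immediately."""

    def __init__(self):
        self.calls = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.calls += 1
        raise StopIteration


class EmptyAsync:
    """Hypothetical asynchronous iterator that is exhausted immediately."""

    def __init__(self):
        self.calls = 0

    def __aiter__(self):
        return self

    async def __anext__(self):
        self.calls += 1
        raise StopAsyncIteration


async def main():
    sync_it, async_it = EmptySync(), EmptyAsync()
    for _ in sync_it:
        pass  # never reached
    async for _ in async_it:
        pass  # never reached
    return sync_it.calls, async_it.calls


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Even though neither loop body runs, each loop has to call its respective method once to find out that there is nothing to iterate over.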

Answered By: Daniil Fajnberg