How to use `async for` in Python?
Question:
I mean, what do I get from using `async for`? Here is the code I wrote with `async for`; `AIter(10)` could be replaced with `get_range()`.
But the code runs like sync, not async.
    import asyncio

    async def get_range():
        for i in range(10):
            print(f"start {i}")
            await asyncio.sleep(1)
            print(f"end {i}")
            yield i

    class AIter:
        def __init__(self, N):
            self.i = 0
            self.N = N

        def __aiter__(self):
            return self

        async def __anext__(self):
            i = self.i
            print(f"start {i}")
            await asyncio.sleep(1)
            print(f"end {i}")
            if i >= self.N:
                raise StopAsyncIteration
            self.i += 1
            return i

    async def main():
        async for p in AIter(10):
            print(f"finally {p}")

    if __name__ == "__main__":
        asyncio.run(main())
The result I expected is:
start 1
start 2
start 3
...
end 1
end 2
...
finally 1
finally 2
...
However, the real result is:
start 0
end 0
finally 0
start 1
end 1
finally 1
start 2
end 2
I know I could get the expected result by using `asyncio.gather` or `asyncio.wait`.
But it is hard for me to understand what I gain by using `async for` here instead of a simple `for`.
What is the right way to use `async for` if I want to loop over several `Future` objects and use each one as soon as it is finished? For example:
    async for f in feature_objects:
        data = await f
        with open("file", "w") as fi:
            fi.write(data)
Answers:
"But it is hard for me to understand what I got by use `async for` here instead of simple `for`."
The underlying misunderstanding is expecting `async for` to automatically parallelize the iteration. It doesn't do that; it simply allows sequential iteration over an async source. For example, you can use `async for` to iterate over lines coming from a TCP stream, messages from a websocket, or database records from an async DB driver.
None of the above would work with an ordinary `for`, at least not without blocking the event loop. This is because `for` calls `__next__` as a blocking function and doesn't await its result. You cannot manually `await` elements obtained by `for` because `for` expects `__next__` to signal the end of iteration by raising `StopIteration`. If `__next__` is a coroutine, the `StopIteration` exception won't be visible before awaiting it. This is why `async for` was introduced, not just in Python, but also in other languages with async/await and generalized `for`.
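To make that benefit concrete, here is a minimal sketch. The names `ticker`, `consume`, `heartbeat` and `events` are made up for illustration and do not come from the question. One task iterates an async generator with `async for` while a second task keeps running. The iteration itself is still sequential, but the event loop is free to run other work whenever the loop is waiting for its next item.

```python
import asyncio

events = []  # records the order in which things happen

async def ticker(n):
    """Hypothetical async source: yields 0..n-1, pausing before each item."""
    for i in range(n):
        await asyncio.sleep(0.1)
        yield i

async def consume():
    # Sequential iteration: each item is awaited before the next is requested.
    async for i in ticker(3):
        events.append(f"item {i}")

async def heartbeat():
    # Gets to run while consume() is suspended waiting for its next item.
    for _ in range(3):
        await asyncio.sleep(0.07)
        events.append("beat")

async def main():
    await asyncio.gather(consume(), heartbeat())

asyncio.run(main())
print(events)
```

The items always appear in order, and the "beat" entries are interleaved between them, which is something a plain blocking `for` over the same source could not offer.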
If you want to run the loop iterations in parallel, you need to start them as parallel coroutines and use `asyncio.as_completed` or equivalent to retrieve their results as they come:
    import asyncio

    async def x(i):
        print(f"start {i}")
        await asyncio.sleep(1)
        print(f"end {i}")
        return i

    async def main():
        # run x(0)..x(9) concurrently and process results as they arrive
        for f in asyncio.as_completed([x(i) for i in range(10)]):
            result = await f
            # ... do something with the result ...

    asyncio.run(main())
If you don't care about reacting to results immediately as they arrive, but you need them all, you can make it even simpler by using `asyncio.gather`:
    # inside a coroutine:
    # run x(0)..x(9) concurrently and process results when all are done
    results = await asyncio.gather(*[x(i) for i in range(10)])
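The difference between the two approaches can be seen in a small sketch. The `job` coroutine and its delays are hypothetical, chosen only so that the jobs finish in a different order than they were started. `asyncio.as_completed` hands results back in completion order, while `asyncio.gather` returns them in the order the awaitables were passed in.

```python
import asyncio

async def job(name, delay):
    """Hypothetical unit of work: sleeps for `delay` seconds, then returns its name."""
    await asyncio.sleep(delay)
    return name

async def main():
    specs = [("slow", 0.3), ("fast", 0.1), ("medium", 0.2)]

    # as_completed yields awaitables in the order the jobs finish.
    arrival = []
    for fut in asyncio.as_completed([job(n, d) for n, d in specs]):
        arrival.append(await fut)

    # gather waits for everything and returns results in input order.
    gathered = await asyncio.gather(*[job(n, d) for n, d in specs])
    return arrival, gathered

arrival, gathered = asyncio.run(main())
print(arrival)   # ['fast', 'medium', 'slow']
print(gathered)  # ['slow', 'fast', 'medium']
```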
(Adding to the accepted answer, for Charlie's bounty.)
Assuming you want to consume each yielded value concurrently, a straightforward way would be:
    import asyncio

    async def process_all():
        tasks = []
        async for obj in my_async_generator:
            # Python 3.7+. Use ensure_future for older versions.
            task = asyncio.create_task(process_obj(obj))
            tasks.append(task)
        await asyncio.gather(*tasks)

    async def process_obj(obj):
        ...
Explanation:
Consider the following code, without `create_task`:
    async def process_all():
        async for obj in my_async_generator:
            await process_obj(obj)
This is roughly equivalent to:
    async def process_all():
        obj1 = await my_async_generator.__anext__()
        await process_obj(obj1)
        obj2 = await my_async_generator.__anext__()
        await process_obj(obj2)
        ...
Basically, the loop cannot move on to the next item because its body awaits the processing of the current one. The way to go is to delegate the processing of each iteration to a new asyncio task, which will start without blocking the loop. Then, `gather` waits for all of the tasks, which means for every iteration to be processed.
Code based on the fantastic answer from @matan129. It is just missing the async generator to make it runnable; once I have that (or if someone wants to contribute it), I will finalize this:
    import time
    import asyncio

    async def process_all():
        """
        Example where the async for loop dispatches many items concurrently without blocking on
        each individual iteration, then blocks (waits) for all tasks to finish.

        ref:
            - https://stackoverflow.com/questions/56161595/how-to-use-async-for-in-python/72758067#72758067
        """
        tasks = []
        # my_async_generator is a placeholder; it is not defined in this snippet.
        async for obj in my_async_generator:
            # Python 3.7+. Use ensure_future for older versions.
            task = asyncio.create_task(process_obj(obj))  # concurrently dispatches a coroutine to be executed.
            tasks.append(task)
        await asyncio.gather(*tasks)

    async def process_obj(obj):
        await asyncio.sleep(5)  # expensive IO

    if __name__ == '__main__':
        # - test asyncio
        s = time.perf_counter()
        asyncio.run(process_all())
        # - print stats
        elapsed = time.perf_counter() - s
        print(f"{__file__} executed in {elapsed:0.2f} seconds.")
        print('Success, done!')
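One possible way to fill in the missing piece is sketched below. The `my_async_generator` here is a hypothetical stand-in (an async generator function that yields a few integers with a short pause), not the source the answers had in mind, and the delays are shortened so the script finishes quickly. Note that it is called as `my_async_generator()`, whereas the snippets above treat the name as an already-created object.

```python
import asyncio
import time

async def my_async_generator(n=5):
    """Hypothetical stand-in source: yields 0..n-1 with a short pause before each item."""
    for i in range(n):
        await asyncio.sleep(0.01)
        yield i

async def process_obj(obj):
    await asyncio.sleep(0.2)  # stands in for expensive I/O
    return obj * 2

async def process_all():
    tasks = []
    async for obj in my_async_generator():
        # Schedule the work and move on to the next item immediately.
        tasks.append(asyncio.create_task(process_obj(obj)))
    # Wait for every scheduled task; results come back in scheduling order.
    return await asyncio.gather(*tasks)

start = time.perf_counter()
results = asyncio.run(process_all())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:0.2f}s")
```

Because the five `process_obj` calls overlap, the total run time should be close to a single 0.2 second delay plus the generator pauses, rather than the roughly one second that awaiting each call inside the loop would take.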