Why do asyncio.create_task and asyncio.ensure_future behave differently when creating httpx tasks for gather?
Question:
I found an async httpx example where ensure_future works but create_task doesn’t, and I can’t figure out why. As I understand it, create_task is the preferred approach, so I’m wondering what’s happening and how I can solve the problem.
I’ve been using an async httpx example at https://www.twilio.com/blog/asynchronous-http-requests-in-python-with-httpx-and-asyncio:
import asyncio
import httpx
import time

start_time = time.time()

async def get_pokemon(client, url):
    resp = await client.get(url)
    pokemon = resp.json()
    return pokemon['name']

async def main():
    async with httpx.AsyncClient() as client:
        tasks = []
        for number in range(1, 151):
            url = f'https://pokeapi.co/api/v2/pokemon/{number}'
            tasks.append(asyncio.ensure_future(get_pokemon(client, url)))
        original_pokemon = await asyncio.gather(*tasks)
        for pokemon in original_pokemon:
            print(pokemon)

asyncio.run(main())
print("--- %s seconds ---" % (time.time() - start_time))
When run verbatim, the code produces the intended result (a list of Pokemon in less than a second). However, replacing asyncio.ensure_future with asyncio.create_task leads to a long wait (which seems to be related to a DNS lookup timing out) and then exceptions, the first one being:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/anyio/_core/_sockets.py", line 186, in connect_tcp
    addr_obj = ip_address(remote_host)
  File "/usr/lib/python3.10/ipaddress.py", line 54, in ip_address
    raise ValueError(f'{address!r} does not appear to be an IPv4 or IPv6 address')
ValueError: 'pokeapi.co' does not appear to be an IPv4 or IPv6 address
Reducing the range maximum (to 70 on my computer) makes the problem disappear.
I understand https://stackoverflow.com/a/36415477/ as saying that ensure_future and create_task act similarly when given coroutines unless there’s a custom event loop, and that create_task is recommended.
If so, why does one of the approaches work while the other fails?
I’m using Python 3.10.5 and httpx 0.23.0.
Answers:
Here is the source code from the standard Python library (Python 3.10, file asyncio/tasks.py):
def ensure_future(coro_or_future, *, loop=None):
    """Wrap a coroutine or an awaitable in a future.

    If the argument is a Future, it is returned directly.
    """
    return _ensure_future(coro_or_future, loop=loop)


def _ensure_future(coro_or_future, *, loop=None):
    if futures.isfuture(coro_or_future):
        if loop is not None and loop is not futures._get_loop(coro_or_future):
            raise ValueError('The future belongs to a different loop than '
                            'the one specified as the loop argument')
        return coro_or_future
    called_wrap_awaitable = False
    if not coroutines.iscoroutine(coro_or_future):
        if inspect.isawaitable(coro_or_future):
            coro_or_future = _wrap_awaitable(coro_or_future)
            called_wrap_awaitable = True
        else:
            raise TypeError('An asyncio.Future, a coroutine or an awaitable '
                            'is required')

    if loop is None:
        loop = events._get_event_loop(stacklevel=4)
    try:
        return loop.create_task(coro_or_future)
    except RuntimeError:
        if not called_wrap_awaitable:
            coro_or_future.close()
        raise
As you can see, ensure_future does some type checking first. Then it gets an event loop if no loop argument was passed. Finally it calls loop.create_task and returns the result. asyncio.create_task takes the same final step: it looks up the running loop and calls loop.create_task on it.
If you see a difference, the only possibility is that getting the event loop is somehow causing it. This doesn’t solve your issue, but it might help to direct your debugging efforts.
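To check that claim without any network traffic, here is a minimal sketch that runs both calls inside the same running loop and compares what comes back. The coroutine work and the helper compare are hypothetical names made up for this illustration; they are not part of the original example.

```python
import asyncio


async def work(n):
    # Hypothetical stand-in coroutine; no network involved.
    await asyncio.sleep(0)
    return n * 2


async def compare():
    # Inside a running loop, both calls wrap the coroutine in a Task
    # that is scheduled on that same loop.
    t1 = asyncio.ensure_future(work(1))
    t2 = asyncio.create_task(work(2))
    same_type = type(t1) is type(t2)
    same_loop = t1.get_loop() is t2.get_loop()
    results = await asyncio.gather(t1, t2)
    return same_type, same_loop, results


outcome = asyncio.run(compare())
print(outcome)
```

If both flags come back True, the two functions produced the same kind of object on the same loop, which points the search for the failure somewhere other than the choice between them.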
After more debugging, I’ve found out that the problem lies elsewhere.
It appears that httpx doesn’t cache DNS lookups, so when asked to connect to the same host many times at once, it performs a large number of DNS lookups. In turn, that caused the DNS server to fail to respond to some of the requests.
As luck would have it, even though I tested many times, the request storm happened to make the DNS fail exactly when I was using create_task but not when I was using ensure_future.
In short, due to Murphy’s law I was mistaking a nondeterministic problem for a deterministic one. However, it seems that httpx can in general be a bit fickle when it comes to DNS requests, for instance as reported at https://github.com/encode/httpx/discussions/2321.
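One possible way to soften such a request storm is to cap how many requests are in flight at once. The sketch below does this with asyncio.Semaphore. It is an illustration under assumptions, not a confirmed fix for the DNS failures: fake_fetch is a hypothetical stand-in for client.get(url) so the example runs without a network, bounded is a made-up helper, and the limit of 10 is an arbitrary guess.

```python
import asyncio

MAX_IN_FLIGHT = 10  # assumed limit; tune for your resolver and network


async def fake_fetch(number, stats):
    # Hypothetical stand-in for client.get(url); tracks peak concurrency.
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)
    stats["active"] -= 1
    return number


async def bounded(semaphore, coro_fn, *args):
    # Wait for a free slot before starting the wrapped coroutine.
    async with semaphore:
        return await coro_fn(*args)


async def run_all():
    semaphore = asyncio.Semaphore(MAX_IN_FLIGHT)
    stats = {"active": 0, "peak": 0}
    tasks = [
        asyncio.create_task(bounded(semaphore, fake_fetch, n, stats))
        for n in range(1, 151)
    ]
    # gather returns results in the order the tasks were passed in.
    results = await asyncio.gather(*tasks)
    return results, stats["peak"]


results, peak = asyncio.run(run_all())
print(len(results), peak)
```

In the real script, get_pokemon(client, url) would take the place of fake_fetch. httpx also lets you pass limits=httpx.Limits(max_connections=...) to AsyncClient, which may achieve a similar effect at the connection-pool level; whether either approach avoids the DNS failures on a given machine would need testing.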