Why do asyncio.create_task and asyncio.ensure_future behave differently when creating httpx tasks for gather?

Question:

I found an async httpx example where ensure_future works but create_task doesn’t, and I can’t figure out why. Since I understand create_task to be the preferred approach, I’d like to know what’s happening and how to solve the problem.

I’ve been using an async httpx example at https://www.twilio.com/blog/asynchronous-http-requests-in-python-with-httpx-and-asyncio:

import asyncio
import httpx
import time

start_time = time.time()

async def get_pokemon(client, url):
    resp = await client.get(url)
    pokemon = resp.json()

    return pokemon['name']
    
async def main():

    async with httpx.AsyncClient() as client:

        tasks = []
        for number in range(1, 151):
            url = f'https://pokeapi.co/api/v2/pokemon/{number}'
            tasks.append(asyncio.ensure_future(get_pokemon(client, url)))

        original_pokemon = await asyncio.gather(*tasks)
        for pokemon in original_pokemon:
            print(pokemon)

asyncio.run(main())
print("--- %s seconds ---" % (time.time() - start_time))

When run verbatim, the code produces the intended result (a list of Pokemon in less than a second). However, replacing asyncio.ensure_future with asyncio.create_task leads to a long wait (which seems to be related to a DNS lookup timing out) and then exceptions, the first one being:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/anyio/_core/_sockets.py", line 186, in connect_tcp
    addr_obj = ip_address(remote_host)
  File "/usr/lib/python3.10/ipaddress.py", line 54, in ip_address
    raise ValueError(f'{address!r} does not appear to be an IPv4 or IPv6 address')
ValueError: 'pokeapi.co' does not appear to be an IPv4 or IPv6 address

Reducing the range maximum (to 70 on my computer) makes the problem disappear.

I understand https://stackoverflow.com/a/36415477/ as saying that ensure_future and create_task act similarly when given coroutines unless there’s a custom event loop, and that create_task is recommended.

If so, why does one of the approaches work while the other fails?

I’m using Python 3.10.5 and httpx 0.23.0.

Asked By: Lesser Cormorant

||

Answers:

Here is the source code from the standard Python library (Python 3.10, file asyncio/tasks.py):

def ensure_future(coro_or_future, *, loop=None):
    """Wrap a coroutine or an awaitable in a future.

    If the argument is a Future, it is returned directly.
    """
    return _ensure_future(coro_or_future, loop=loop)


def _ensure_future(coro_or_future, *, loop=None):
    if futures.isfuture(coro_or_future):
        if loop is not None and loop is not futures._get_loop(coro_or_future):
            raise ValueError('The future belongs to a different loop than '
                            'the one specified as the loop argument')
        return coro_or_future
    called_wrap_awaitable = False
    if not coroutines.iscoroutine(coro_or_future):
        if inspect.isawaitable(coro_or_future):
            coro_or_future = _wrap_awaitable(coro_or_future)
            called_wrap_awaitable = True
        else:
            raise TypeError('An asyncio.Future, a coroutine or an awaitable '
                            'is required')

    if loop is None:
        loop = events._get_event_loop(stacklevel=4)
    try:
        return loop.create_task(coro_or_future)
    except RuntimeError: 
        if not called_wrap_awaitable:
            coro_or_future.close()
        raise

As you can see, ensure_future does some type checking first. Then, if the loop keyword argument was not passed, it looks up an event loop. Finally it calls loop.create_task on that loop and returns the result.

If you see a difference, the only possibility is that getting an event loop is somehow causing it. This doesn’t solve your issue but it might help to direct your debugging efforts.
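To check this on your own machine, here is a minimal sketch using only the standard library. The coroutine work and the helper compare are hypothetical names made up for illustration; they stand in for get_pokemon and main. It creates one task with each function inside a running loop and compares what comes back:

```python
import asyncio


async def work(n):
    # Stand-in coroutine (hypothetical); any coroutine would do here.
    await asyncio.sleep(0)
    return n * 2


async def compare():
    loop = asyncio.get_running_loop()
    t1 = asyncio.ensure_future(work(1))
    t2 = asyncio.create_task(work(2))
    # Both calls should hand back the same Task type ...
    same_type = type(t1) is type(t2)
    # ... bound to the loop that is currently running.
    same_loop = t1.get_loop() is loop and t2.get_loop() is loop
    results = await asyncio.gather(t1, t2)
    return same_type, same_loop, results


if __name__ == "__main__":
    print(asyncio.run(compare()))
```

If both flags come out True, the two functions are scheduling work identically in your setup, and the difference you observed most likely comes from somewhere else.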

Answered By: Paul Cornelius

After more debugging, I’ve found out that the problem lies elsewhere.

It appears that httpx doesn’t use DNS precaching, so when asked to connect to the same host a bunch of times at once, it’ll do a large number of DNS lookups. In turn, that caused the DNS server to fail to respond to requests some of the time.

As luck would have it, even though I tested many times, the request storm happened to make the DNS fail exactly when I was using create_task but not when I was using ensure_future.

In short, due to Murphy’s law I was mistaking a nondeterministic problem for a deterministic one. However, it seems that httpx can in general be a bit fickle when it comes to DNS requests, for instance as reported at https://github.com/encode/httpx/discussions/2321.
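Since reducing the number of simultaneous requests made the failures go away, one possible workaround is to cap how many requests are in flight at once. The following is only a sketch under that assumption, not something httpx prescribes. The helper bounded_gather, the fake_request stand-in, and the limit of 5 are all made up for illustration, and the example uses only the standard library so that it runs without network access:

```python
import asyncio


async def bounded_gather(coros, limit):
    """Await coroutine objects with at most `limit` running at a time.

    Pass plain coroutine objects, not Tasks: a Task starts running as
    soon as it is created, which would bypass the semaphore.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(coro):
        async with semaphore:
            return await coro

    # gather keeps the results in the same order as the inputs.
    return await asyncio.gather(*(run_one(c) for c in coros))


async def demo(total=20, limit=5):
    in_flight = 0
    peak = 0

    async def fake_request(i):
        # Hypothetical stand-in for an HTTP request; records concurrency.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return i

    results = await bounded_gather([fake_request(i) for i in range(total)], limit)
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(demo())
    print(len(results), "results, peak concurrency", peak)
```

In the code from the question, this would mean passing get_pokemon(client, url) coroutines directly to bounded_gather instead of wrapping each one in a task first. A suitable limit depends on the resolver and network, so it would need some experimentation.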

Answered By: Lesser Cormorant