Python Aiohttp Asyncio: how to create delays between each task

Question:

Problem I’m trying to solve:
I’m making many API requests to a server. I’m trying to create delays between async API calls to comply with the server’s rate limit policy.

What I want it to do
I want it to behave like this:

  1. Make api request #1
  2. wait 0.1 seconds
  3. Make api request #2
  4. wait 0.1 seconds
    … and so on …
  5. repeat until all requests are made
  6. gather the responses and return the results in one object (results)

Issue:
When I introduced asyncio.sleep() or time.sleep() into the code, it still made the API requests almost instantaneously. The sleep seemed to delay the execution of print(), but not the API requests. I suspect that I have to create the delays within the loop, not in fetch_one() or fetch_all(), but I couldn’t figure out how to do so.

Code block:

import asyncio
import time
from ssl import SSLContext

import aiohttp

async def fetch_all(loop, urls, delay):
    results = await asyncio.gather(*[fetch_one(loop, url, delay) for url in urls], return_exceptions=True)
    return results

async def fetch_one(loop, url, delay):

    # time.sleep(delay)          # blocks the entire event loop
    # await asyncio.sleep(delay) # note: asyncio.sleep() has no effect unless awaited

    async with aiohttp.ClientSession(loop=loop) as session:
        async with session.get(url, ssl=SSLContext()) as resp:
            # print("An api call to ", url, " is made at ", time.time())
            # print(resp)
            return await resp.text()  # a ClientResponse itself is not awaitable

delay = 0.1
urls = ['some string list of urls']
loop = asyncio.get_event_loop()
loop.run_until_complete(fetch_all(loop, urls, delay))

Versions I'm using: 
python                    3.8.5
aiohttp                   3.7.4
asyncio                   3.4.3

I would appreciate any tips on guiding me to the right direction!

Asked By: Aaron Ahn


Answers:

When you use asyncio.gather you run all fetch_one coroutines concurrently. All of them wait for the delay together, then all make their API calls at essentially the same moment.
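This is easy to demonstrate without any HTTP at all. In the minimal sketch below, three coroutines started with gather all sleep concurrently, so they all wake at roughly the same time instead of 0.1 s apart:

import asyncio
import time

start = time.monotonic()

async def task(n, delay):
    await asyncio.sleep(delay)  # all three sleeps run concurrently
    print(f"task {n} woke at {time.monotonic() - start:.2f}s")

async def main():
    await asyncio.gather(*[task(n, 0.1) for n in range(3)])

asyncio.run(main())
# prints ~0.10s for every task, not 0.10 / 0.20 / 0.30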

To solve the issue, you should either await fetch_one one by one inside fetch_all, or use a Semaphore to signal that the next coroutine shouldn’t start before the previous one is done.
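The first option, awaiting them one by one, could look roughly like this (a sketch of the sequential variant, keeping the original function signature):

async def fetch_all(loop, urls, delay):
    results = []
    for url in urls:
        # the next request starts only after the previous one has finished
        results.append(await fetch_one(loop, url, delay))
        await asyncio.sleep(delay)
    return results

Note that unlike gather(..., return_exceptions=True), this version lets the first exception propagate; wrap the await in try/except if you need the original behaviour.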

Here’s the idea with a Semaphore:

import asyncio
from ssl import SSLContext

import aiohttp

_sem = asyncio.Semaphore(1)


async def fetch_all(loop, urls, delay):
    results = await asyncio.gather(*[fetch_one(loop, url, delay) for url in urls], return_exceptions=True)
    return results

async def fetch_one(loop, url, delay):

    async with _sem:  # the next coroutine(s) will be stuck here until the previous one is done
        await asyncio.sleep(delay)

        async with aiohttp.ClientSession(loop=loop) as session:
            async with session.get(url, ssl=SSLContext()) as resp:
                # print("An api call to ", url, " is made at ", time.time())
                # print(resp)
                return await resp.text()  # read the body; the response object itself is not awaitable

delay = 0.1
urls = ['some string list of urls']
loop = asyncio.get_event_loop()
loop.run_until_complete(fetch_all(loop, urls, delay))
Answered By: Mikhail Gerasimov

The call to asyncio.gather will launch all requests "simultaneously"; on the other hand, if you simply used a lock or awaited each task one by one, you would gain nothing from parallelism at all.

The simplest thing to do, if you know the rate at which you can issue the requests, is simply to increase the asynchronous pause before each request in succession; a simple global variable can do that:


import asyncio
from ssl import SSLContext

import aiohttp

next_delay = 0.1

async def fetch_all(loop, urls, delay):
    results = await asyncio.gather(*[fetch_one(loop, url, delay) for url in urls], return_exceptions=True)
    return results

async def fetch_one(loop, url, delay):
    global next_delay

    next_delay += delay              # each coroutine gets a successively longer pause,
    await asyncio.sleep(next_delay)  # so the requests end up staggered by `delay` seconds

    async with aiohttp.ClientSession(loop=loop) as session:
        async with session.get(url, ssl=SSLContext()) as resp:
            # print("An api call to ", url, " is made at ", time.time())
            # print(resp)
            return await resp.text()  # the response object itself is not awaitable

delay = 0.1
urls = ['some string list of urls']
loop = asyncio.get_event_loop()
loop.run_until_complete(fetch_all(loop, urls, delay))

Now, if you want to, say, issue 5 requests and then issue the next 5 only once those have finished, you could use a synchronization primitive such as asyncio.Condition with its wait_for, or, as in the code below, an asyncio.Event combined with a counter of active API calls:

import asyncio
from ssl import SSLContext

import aiohttp

active_calls = 0

MAX_CALLS = 5

async def fetch_all(loop, urls, delay):
    event = asyncio.Event()
    event.set()
    results = await asyncio.gather(*[fetch_one(loop, url, delay, event) for url in urls], return_exceptions=True)
    return results

async def fetch_one(loop, url, delay, event):
    global active_calls

    # wait until there is a free slot in the current batch
    while True:
        await event.wait()
        if active_calls < MAX_CALLS:
            break

    active_calls += 1
    if active_calls >= MAX_CALLS:
        event.clear()  # batch is full: make later coroutines wait

    try:
        async with aiohttp.ClientSession(loop=loop) as session:
            async with session.get(url, ssl=SSLContext()) as resp:
                # print("An api call to ", url, " is made at ", time.time())
                # print(resp)
                return await resp.text()
    finally:
        active_calls -= 1
        if active_calls == 0:
            event.set()  # the whole batch finished: release the next one

delay = 0.1
urls = ['some string list of urls']
loop = asyncio.get_event_loop()
loop.run_until_complete(fetch_all(loop, urls, delay))

For both examples, should your design need to avoid global variables (strictly speaking, these are "module" variables), you could either move all the functions into a class, work on an instance, and promote the global variables to instance attributes, or use a mutable container, such as a list holding the active_calls value as its first item, and pass that in as a parameter.
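As an illustration of the class-based suggestion, here is a sketch of the staggered-delay example with next_delay promoted to an instance attribute (the Fetcher class name is invented for the example, and the explicit loop argument is dropped since it is not required):

import asyncio
from ssl import SSLContext

import aiohttp

class Fetcher:
    def __init__(self, delay):
        self.delay = delay
        self.next_delay = delay  # replaces the module-level next_delay global

    async def fetch_all(self, urls):
        return await asyncio.gather(
            *[self.fetch_one(url) for url in urls], return_exceptions=True
        )

    async def fetch_one(self, url):
        self.next_delay += self.delay  # stagger the requests, as in the global-variable version
        await asyncio.sleep(self.next_delay)
        async with aiohttp.ClientSession() as session:
            async with session.get(url, ssl=SSLContext()) as resp:
                return await resp.text()

urls = ['some string list of urls']
results = asyncio.run(Fetcher(0.1).fetch_all(urls))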

Answered By: jsbueno