Using asyncio ensure_future() and gather() with nested loops

Question:

While trying to execute some async functions in parallel, I keep getting an error that I would like to understand.

Running asyncio on Python 3.5.1, I'm combining aiohttp calls (through ClientSession) with aiopg (asynchronous psycopg2) calls.

The main idea is to read rows from one table, loop over them, and execute several calls to the function row_loop in parallel; each call starts a GET request asynchronously and then, through another cursor, writes the result for that row back to the same DB.

The ensure_future() and gather() calls seem to work properly, but only the first DB write goes through; the others raise an exception complaining that another coroutine is already waiting.

Traceback (most recent call last):
  File "sandbox/loop.py", line 148, in write_dict
    await cur2.execute(INSERT, (tuple(adict.values()),))
  File "/Users/mrigal/.virtualenvs/py35/lib/python3.5/site-packages/aiopg/cursor.py", line 103, in execute
    waiter = self._conn._create_waiter('cursor.execute')
  File "/Users/mrigal/.virtualenvs/py35/lib/python3.5/site-packages/aiopg/connection.py", line 211, in _create_waiter
    'data' % func_name)
RuntimeError: cursor.execute() called while another coroutine is already waiting for incoming data

The issue could be in the aiopg library, or it might be that I am registering the loop against main() rather than against the function where gather() actually happens. But I could find very little documentation about this…

Without ensure_future() and gather() the code is slow, since each call is made one after the other. I may not have fully understood the purpose of gather(), and I might need a real multithreading solution, but I would like to test this intermediate step first.

Here is my actual code:

async def make_get(row, session, spec_country_code=None):
    result = await session.get(URL, country=spec_country_code)
    if not spec_country_code and result.country != row.get('country'):
        return await make_get(row, session, spec_country_code=result.country)
    return result


async def write_dict(conn, adict):
    async with conn.cursor() as cur2:
        await cur2.execute(INSERT_QUERY, (tuple(adict.values()),))


async def row_loop(conn, row, session):
    result = await make_get(row=row, session=session)
    if result.status == 'OVER_QUERY_LIMIT':
        raise OverQueryLimitException()
    else:
        adict = build_adict(row, result)
    await write_dict(conn=conn, adict=adict)
    return result.status


async def write_loop(conn):
    failed_count = 0
    rows = []
    async with aiohttp.ClientSession() as session:
        async with conn.cursor(cursor_factory=DictCursor) as cur:
            await cur.execute(SELECT_QUERY)
            async for row in cur:
                # THIS WORKS, BUT I WOULD LIKE TO USE gather()
                # try:
                #     status = await row_loop(conn=conn, row=row, session=session)
                # except OverQueryLimitException:
                #     break
                # if status != 'OK':
                #     failed_count += 1
                rows.append(asyncio.ensure_future(
                    row_loop(conn=conn, row=row, session=session)))
        responses = await asyncio.gather(*rows)
        print(len(responses))  # Just a try; gather() returns a plain list of results
    return cur.rownumber, failed_count


def print_result(mode, select_count, failed_count):
    print("Tried to {} {} new entries".format(mode, select_count))
    print("Found {} failed/skipped entries".format(failed_count))


async def insert_new(conn):
    select_count, failed_count = await write_loop(conn=conn)
    print_result('insert', select_count, failed_count)


async def main():
    async with aiopg.create_pool('db_string') as pool:
        async with pool.acquire() as conn:
            await insert_new(conn=conn)


loop = asyncio.get_event_loop()
loop.run_until_complete(main())
Asked By: Rmatt


Answers:

(Update: added information and example regarding concurrent cursor execution in the same connection)

aiopg does not allow using two cursors on the same connection concurrently: You must close the cursor before executing a new command:

import asyncio

import aiopg

DSN = 'dbname=aiopg'  # to create, run: `createdb aiopg`


async def test_cursor(conn):
    async with conn.cursor() as cur:
        await cur.execute("SELECT 1")
        async for row in cur:
            print(row)


async def go():
    async with aiopg.create_pool(DSN) as pool:
        async with pool.acquire() as conn:
            print("part 1:")
            await test_cursor(conn)
            await test_cursor(conn)

            print("part 2:")
            await asyncio.wait([test_cursor(conn), test_cursor(conn)])  # fails


loop = asyncio.get_event_loop()
loop.run_until_complete(go())
loop.close()

You might try to work around this by using more than one connection, having each task acquire its own connection from the pool, as sketched below.
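A hedged sketch of that workaround, reusing names from the question's code (make_get, build_adict, write_dict, OverQueryLimitException) and assuming write_loop is changed to pass the pool down instead of a single shared connection:

async def row_loop(pool, row, session):
    result = await make_get(row=row, session=session)
    if result.status == 'OVER_QUERY_LIMIT':
        raise OverQueryLimitException()
    adict = build_adict(row, result)
    # each task waits on its own connection, so no two coroutines
    # share one connection's waiter at the same time
    async with pool.acquire() as conn:
        await write_dict(conn=conn, adict=adict)
    return result.status

Keep in mind that concurrency is then capped by the pool's maxsize (aiopg defaults to 10), and every in-flight row holds a connection while it writes.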

Although this does not answer the specific question directly, I would like to suggest a general solution: database access (on a local/LAN host, whether for reading or writing) is generally considered very fast, especially compared to remote (WAN) HTTP requests. Using synchronous DB access in this program would probably not make it slower, but it would reduce its complexity. Try something in the following fashion, without aiopg:

import asyncio

import aiohttp


async def main():
    rows = get_rows_from_db()  # sync
    async with aiohttp.ClientSession() as session:
        tasks = [process_row(session, row) for row in rows]
        responses = await asyncio.gather(*tasks)


async def process_row(session, row):
    async with session.get(url) as response:
        result = await response.text()
    write_db_result(row, result)  # sync db access


loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()
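If the synchronous writes ever do block the loop noticeably, a middle ground is to push them onto the default thread pool with run_in_executor. A minimal sketch, assuming the same hypothetical write_db_result helper as above:

async def process_row(session, row):
    async with session.get(url) as response:
        result = await response.text()
    # run the blocking DB write in the default ThreadPoolExecutor,
    # so other requests keep making progress in the meantime
    loop = asyncio.get_event_loop()
    await loop.run_in_executor(None, write_db_result, row, result)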
Answered By: Udi

Let's say there is a very wealthy family with a checklist of chores to do at each of their homes:

import asyncio
import time

start_time = time.time()

actual_chores = ["dishes","laundry","vacuuming","recycling"]
actual_houses = ["lake","mountain","city"]

One person could do them sequentially:

for each_house in actual_houses:

    for each_chore in actual_chores:
        print("started " + each_chore)
        time.sleep(len(each_chore)/10)  # simulate the chore taking time (same scale as below)
        print("finished " + each_chore)

    print("finished " + each_house)

print("finished sequential chores after %s seconds " % (time.time() - start_time))

Or you can parallelize the chores within one house:

start_time = time.time()
        
def do_one(chore):
    print("started " + chore) 
    time.sleep(len(chore)/10)
    print("finished " + chore)
    
async def do_all(chores):
    loop = asyncio.get_event_loop()
    futures = [loop.run_in_executor(None, do_one, each_chore) for each_chore in chores]
    # without awaiting, do_all would return before any chore finishes
    await asyncio.gather(*futures)

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_until_complete(do_all(actual_chores))
print("finished partial parallel chores after %s seconds " % (time.time() - start_time))

Or you could also parallelize all the house and chore combinations:

start_time = time.time()

# synchronous: 
def do_one(chore, house):
    print("started " + chore + " in " + house)
    time.sleep(len(chore)/10)
    print("finished " + chore + " in " + house)

# asynchronous:
async def do_all(chores, houses):
    loop = asyncio.get_event_loop()
    futures = [loop.run_in_executor(None, do_one, each_chore, each_house)
               for each_chore in chores for each_house in houses]
    # again, wait for every (chore, house) combination to finish
    await asyncio.gather(*futures)


loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_until_complete(do_all(actual_chores, actual_houses))
print("finished fully parallel chores after %s seconds " % (time.time() - start_time))
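As a side note: on Python 3.7+ the manual loop management above can be replaced with asyncio.run(), which creates, runs, and closes the loop in one call. A minimal sketch:

# Python 3.7+ equivalent of new_event_loop() + run_until_complete()
asyncio.run(do_all(actual_chores, actual_houses))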
Answered By: Krissy Govertsen