Have you ever gotten RuntimeError: await wasn't used with future?

Question:

I am trying to extract data from a website using asyncio and aiohttp, and an await problem occurs in the for-loop function.

Here is my script:

async def get_page(session,x):
    async with session.get(f'https://disclosure.bursamalaysia.com/FileAccess/viewHtml?e={x}') as r:
        return await r.text()
    
async def get_all(session, urls):
    tasks =[]
    sem = asyncio.Semaphore(1)
    count = 0
    for x in urls:
        count +=1
        task = asyncio.create_task(get_page(session,x))
        tasks.append(task)
        print(count,'-ID-',x,'|', end=' ')
    results = await asyncio.gather(*task)
    return results

async def main(urls):
    async with aiohttp.ClientSession() as session:
        data = await get_all(session, urls)
        return
        
if __name__ == '__main__':
    urls = titlelink
    results = asyncio.run(main(urls))
    print(results)

For the error, this is what it returns when the scraper breaks:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-5ac99108678c> in <module>
     22 if __name__ == '__main__':
     23     urls = titlelink
---> 24     results = asyncio.run(main(urls))
     25     print(results)

~\AppData\Local\Programs\Python\Python38\lib\site-packages\nest_asyncio.py in run(future, debug)
     30         loop = asyncio.get_event_loop()
     31         loop.set_debug(debug)
---> 32         return loop.run_until_complete(future)
     33 
     34     if sys.version_info >= (3, 6, 0):

~\AppData\Local\Programs\Python\Python38\lib\site-packages\nest_asyncio.py in run_until_complete(self, future)
     68                 raise RuntimeError(
     69                     'Event loop stopped before Future completed.')
---> 70             return f.result()
     71 
     72     def _run_once(self):

~\AppData\Local\Programs\Python\Python38\lib\asyncio\futures.py in result(self)
    176         self.__log_traceback = False
    177         if self._exception is not None:
--> 178             raise self._exception
    179         return self._result
    180 

~\AppData\Local\Programs\Python\Python38\lib\asyncio\tasks.py in __step(***failed resolving arguments***)
    278                 # We use the `send` method directly, because coroutines
    279                 # don't have `__iter__` and `__next__` methods.
--> 280                 result = coro.send(None)
    281             else:
    282                 result = coro.throw(exc)

<ipython-input-3-5ac99108678c> in main(urls)
     17 async def main(urls):
     18     async with aiohttp.ClientSession() as session:
---> 19         data = await get_all(session, urls)
     20         return
     21 

<ipython-input-3-5ac99108678c> in get_all(session, urls)
     12         tasks.append(task)
     13         print(count,'-ID-',x,'|', end=' ')
---> 14     results = await asyncio.gather(*task)
     15     return results
     16 

~\AppData\Local\Programs\Python\Python38\lib\asyncio\futures.py in __await__(self)
    260             yield self  # This tells Task to wait for completion.
    261         if not self.done():
--> 262             raise RuntimeError("await wasn't used with future")
    263         return self.result()  # May raise too.
    264 

RuntimeError: await wasn't used with future

Is this error caused by putting await inside the for-loop function, or is it a server problem? Or maybe the way I wrote the script is wrong. I would appreciate it if any of you could point me or guide me in the right direction.

Asked By: Yazid Yaakub


Answers:

You can use multiprocessing to scrape multiple links simultaneously (in parallel):

from multiprocessing import Pool

def scrape(url):
    # Scraper script goes here
    ...

p = Pool(10)
# This “10” means that 10 URLs will be processed at the same time.
p.map(scrape, list_of_all_urls)
p.terminate()
p.join()

Here we map the function scrape over list_of_all_urls, and the Pool p takes care of executing each call concurrently. This is similar to looping over list_of_all_urls in simple.py, but here it is done concurrently. If the number of URLs is 100 and we specify Pool(20), it will take 5 batches (100/20), with 20 URLs being processed in one go.
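As a minimal, self-contained sketch of the same idea (the URL list and the body of scrape below are hypothetical placeholders, assuming the requests library is available; they are not code from the question):

from multiprocessing import Pool
import requests  # assumed third-party dependency for the placeholder fetch

def scrape(url):
    # Hypothetical worker: fetch one page and return its length.
    return url, len(requests.get(url).text)

if __name__ == '__main__':
    list_of_all_urls = [f'https://example.com/page/{i}' for i in range(100)]
    with Pool(20) as p:  # 20 worker processes, so 100 URLs take roughly 5 batches
        results = p.map(scrape, list_of_all_urls)  # results come back in input order
    print(results[:3])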

Two things to note:

  1. The links are not executed in order. You can see the order is 2, 1, 3, … This is because of multiprocessing; time is saved because one process does not wait for the previous one to finish. This is called parallel execution.
  2. This scrapes much faster than a normal sequential run. The difference grows quickly as the number of URLs increases, which means the multiprocessing script's performance advantage improves with a large number of URLs.

You may visit here for more detailed information.

I believe this is the same as your previous question; I think you can use multiprocessing. I know this is not the right answer, but multiprocessing is easy and straightforward.

Answered By: Xitiz

await asyncio.gather(*task)

Should be:

await asyncio.gather(*tasks)

The exception actually comes from the *task. Not sure what this syntax is meant for, but it’s certainly not what you intended:

>>> t = asyncio.Task(asyncio.sleep(10))
>>> (*t,)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: await wasn't used with future
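
With that one-character fix applied, get_all from the question could look like the sketch below (the OP's own names are kept; the unused semaphore and counter are left out for brevity):

async def get_all(session, urls):
    tasks = []
    for x in urls:
        # Schedule each fetch as a Task so the requests run concurrently.
        tasks.append(asyncio.create_task(get_page(session, x)))
    # Pass the whole list of tasks, not a single Task: *tasks, not *task.
    return await asyncio.gather(*tasks)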
Answered By: Sam Bull