Have you ever gotten RuntimeError: await wasn't used with future?
Question:
I'm trying to extract data from a website using asyncio and aiohttp, and an await problem occurs in the for-loop function.
Here is my script:
async def get_page(session, x):
    async with session.get(f'https://disclosure.bursamalaysia.com/FileAccess/viewHtml?e={x}') as r:
        return await r.text()

async def get_all(session, urls):
    tasks = []
    sem = asyncio.Semaphore(1)
    count = 0
    for x in urls:
        count += 1
        task = asyncio.create_task(get_page(session, x))
        tasks.append(task)
        print(count, '-ID-', x, '|', end=' ')
    results = await asyncio.gather(*task)
    return results

async def main(urls):
    async with aiohttp.ClientSession() as session:
        data = await get_all(session, urls)
        return

if __name__ == '__main__':
    urls = titlelink
    results = asyncio.run(main(urls))
    print(results)
As for the error, this is what it returns when the scraper breaks:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-3-5ac99108678c> in <module>
22 if __name__ == '__main__':
23 urls = titlelink
---> 24 results = asyncio.run(main(urls))
25 print(results)
~\AppData\Local\Programs\Python\Python38\lib\site-packages\nest_asyncio.py in run(future, debug)
30 loop = asyncio.get_event_loop()
31 loop.set_debug(debug)
---> 32 return loop.run_until_complete(future)
33
34 if sys.version_info >= (3, 6, 0):
~\AppData\Local\Programs\Python\Python38\lib\site-packages\nest_asyncio.py in run_until_complete(self, future)
68 raise RuntimeError(
69 'Event loop stopped before Future completed.')
---> 70 return f.result()
71
72 def _run_once(self):
~\AppData\Local\Programs\Python\Python38\lib\asyncio\futures.py in result(self)
176 self.__log_traceback = False
177 if self._exception is not None:
--> 178 raise self._exception
179 return self._result
180
~\AppData\Local\Programs\Python\Python38\lib\asyncio\tasks.py in __step(***failed resolving arguments***)
278 # We use the `send` method directly, because coroutines
279 # don't have `__iter__` and `__next__` methods.
--> 280 result = coro.send(None)
281 else:
282 result = coro.throw(exc)
<ipython-input-3-5ac99108678c> in main(urls)
17 async def main(urls):
18 async with aiohttp.ClientSession() as session:
---> 19 data = await get_all(session, urls)
20 return
21
<ipython-input-3-5ac99108678c> in get_all(session, urls)
12 tasks.append(task)
13 print(count,'-ID-',x,'|', end=' ')
---> 14 results = await asyncio.gather(*task)
15 return results
16
~\AppData\Local\Programs\Python\Python38\lib\asyncio\futures.py in __await__(self)
260 yield self # This tells Task to wait for completion.
261 if not self.done():
--> 262 raise RuntimeError("await wasn't used with future")
263 return self.result() # May raise too.
264
RuntimeError: await wasn't used with future
Is this error caused by putting await inside the for-loop function, or is it a server problem? Or maybe the way I wrote the script is wrong. I'd appreciate it if any of you could point me in the right direction.
Answers:
You can use multiprocessing to scrape multiple links simultaneously (in parallel):
from multiprocessing import Pool

def scrape(url):
    # Scraper script goes here
    ...

if __name__ == '__main__':
    p = Pool(10)
    # This "10" means that 10 URLs will be processed at the same time.
    p.map(scrape, list_of_all_urls)
    p.terminate()
    p.join()
Here we map the function scrape over list_of_all_urls, and Pool p takes care of executing them concurrently. This is similar to looping over list_of_all_urls in a plain sequential script, but here it is done in parallel. If there are 100 URLs and we specify Pool(20), it will take 5 batches (100/20), with 20 URLs processed in each.
Two things to note:
- The links are not processed in order. You may see the order come out as 2, 1, 3, ... This happens because with multiprocessing each process saves time by not waiting for the previous one to finish. This is called parallel execution.
- This scrapes much faster than the normal approach. The difference grows quickly as the number of URLs increases, meaning the multiprocessing script's performance advantage improves with a large number of URLs.
You may look at the multiprocessing documentation for more detail.
I believe this is the same as your previous question, so I think you can use multiprocessing. I know this is not the right answer for an asyncio problem, but multiprocessing is easy and straightforward.
await asyncio.gather(*task)
Should be:
await asyncio.gather(*tasks)
The exception actually comes from *task: unpacking a single Task attempts to iterate over it, which is certainly not what you intended:
>>> t = asyncio.Task(asyncio.sleep(10))
>>> (*t,)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: await wasn't used with future
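With the fix applied, the pipeline works as expected. Here is a minimal runnable sketch of the corrected get_all; the aiohttp request is replaced by a stub coroutine (get_page below is a stand-in, not the real scraper) so the *tasks fix can be demonstrated without network access:

```python
import asyncio

async def get_page(session, x):
    # stand-in for the aiohttp request; returns a fake page body
    await asyncio.sleep(0)
    return f"page-{x}"

async def get_all(session, urls):
    tasks = []
    for x in urls:
        tasks.append(asyncio.create_task(get_page(session, x)))
    # unpack the LIST of tasks (*tasks), not a single task (*task)
    return await asyncio.gather(*tasks)

async def main(urls):
    return await get_all(None, urls)

results = asyncio.run(main([1, 2, 3]))
print(results)  # ['page-1', 'page-2', 'page-3']
```

gather(*tasks) awaits all the tasks concurrently and returns their results in the same order as the input list.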