Asynchronous requests inside a for loop in Python

Question:

I have this snippet

config = {10: 'https://www.youtube.com/', 5: 'https://www.youtube.com/', 7: 'https://www.youtube.com/',
          3: 'https://sportal.com/', 11: 'https://sportal.com/'}

def test(arg):

    for key in arg.keys():
        requests.get(arg[key], timeout=key)


test(config)

This way, things happen synchronously. I want to do it asynchronously: iterate through the loop without waiting for the response for each address and move on to the next one, and so on until I have iterated through all addresses in the dictionary. Then I want to wait until I get all the responses for all addresses and only after that return from the test function. I know that I can do it with threading, but I read that it can be done better with the asyncio library; however, I couldn't implement it. If anyone has even better suggestions, I am open to them. Here is my try:

async def test(arg):

    loop = asyncio.get_event_loop()
    tasks = [loop.run_in_executor(requests.get(arg[key], timeout=key) for key in arg.keys())]
    await asyncio.gather(*tasks)

asyncio.run(test(config))
Asked By: htodev


Answers:

Here is the solution:

import asyncio
import requests

def addresses(adr, to):
    # Blocking call; it runs in a worker thread supplied by the executor
    requests.get(adr, timeout=to)

async def test(arg):
    loop = asyncio.get_event_loop()
    # None -> use the loop's default ThreadPoolExecutor; one worker per blocking request
    tasks = [loop.run_in_executor(None, addresses, arg[key], key) for key in arg.keys()]
    await asyncio.gather(*tasks)

asyncio.run(test(config))

Now it works asynchronously with the asyncio library rather than by managing threads directly (run_in_executor still hands the blocking requests.get calls to asyncio's default thread pool under the hood).
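If you want to avoid the thread pool entirely, a minimal sketch with the aiohttp client library (an extra dependency, not part of the solution above) could look like this:

import asyncio
import aiohttp

async def fetch(session, url, timeout):
    # The request itself is non-blocking and runs on the event loop
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=timeout)) as resp:
        return await resp.text()

async def test(arg):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url, key) for key, url in arg.items()]
        return await asyncio.gather(*tasks)

asyncio.run(test(config))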

Answered By: htodev

Some good answers here. I had trouble with this myself (I do a lot of web scraping), so I created a package to help, async-scrape (https://pypi.org/project/async-scrape/).

It supports GET and POST. I tried to make it as easy to use as possible: you just specify a handler function for the response when you instantiate it and then use the scrape_all method to do the work.

It uses the term scrape because I've built in some handlers for common errors that come up when scraping websites.

You can also do things like limiting the call rate if you find you're getting blocked.

An example of its use:

# Create an instance
from async_scrape import AsyncScrape

def post_process(html, resp, **kwargs):
    """Function to process the gathered response from the request"""
    if resp.status == 200:
        return "Request worked"
    else:
        return "Request failed"

async_Scrape = AsyncScrape(
    post_process_func=post_process,
    post_process_kwargs={},
    fetch_error_handler=None,
    use_proxy=False,
    proxy=None,
    pac_url=None,
    acceptable_error_limit=100,
    attempt_limit=5,
    rest_between_attempts=True,
    rest_wait=60,
    call_rate_limit=None,
    randomise_headers=True
)

urls = [
    "https://www.google.com",
    "https://www.bing.com",
]

resps = async_Scrape.scrape_all(urls)

To do this inside a loop, I collect the results, add them to a set, and pop off the ones that have already been processed.

For example:

from async_scrape import AsyncScrape
from bs4 import BeautifulSoup as bs

def post_process(html, resp, **kwargs):
    """Function to process the gathered response from the request"""
    # Parse the page and collect the href of each matching link
    soup = bs(html, "html.parser")
    new_urls = [a["href"] for a in soup.find_all("a", {"class": "new_link_on_website"})]
    return [new_urls, resp]

async_scrape = AsyncScrape(
    post_process_func=post_process,
    post_process_kwargs={}
)

# Run the loop
urls = set(["https://initial_webpage.com/"])
processed = set()
all_resps = []
while len(urls):
    resps = async_scrape.scrape_all(urls)
    # Split successful and failed requests
    success_resps = [r for r in resps if not r["error"]]
    errored_resps = [r for r in resps if r["error"]]
    # Get what you want from the responses
    for r in success_resps:
        # Add found urls to urls
        urls |= set(r["func_resp"][0]) # "func_resp" is the key to the return from your handler function
        # Collect the response
        all_resps.append(r["func_resp"][1])
        # Add to processed urls
        processed.add(r["url"]) # "url" is the key to the url from the response
    # Remove processed urls so they are not requested again
    urls = urls - processed
Answered By: Robert Franklin