Python request URL response too slow, how can I make it quicker?

Question:

I have this code in python:

import requests

session = requests.Session()

for i in range(0, len(df_1)):
    page = session.head(df_1['listing_url'].loc[i], allow_redirects=False, stream=True)

    if page.status_code == 200:
        # use .loc to write into the frame directly (avoids chained-assignment issues)
        df_1.loc[i, 'condition'] = 'active'
    else:
        df_1.loc[i, 'condition'] = 'false'

df_1 is my data frame and the column "listing_url" has more than 500 rows.

I want to check whether each URL is active and record the result in my data frame, but this code takes a long time. How can I speed it up?

Asked By: Victor Guindani


Answers:

The problem with your current approach is that the requests run sequentially (synchronously), meaning a new request can't be sent until the previous one has finished.

What you are looking for is handling those requests asynchronously. Sadly, the requests library does not support asynchronous requests. A newer library with an API similar to requests that can do this is httpx; aiohttp is another popular choice. With httpx you can do something like this:

import asyncio
import httpx

listing_urls = list(df_1['listing_url'])

async def do_tasks():
    async with httpx.AsyncClient() as client:
        # create one HEAD-request coroutine per URL and run them all concurrently
        tasks = [client.head(url) for url in listing_urls]
        responses = await asyncio.gather(*tasks)
        # map each requested URL to the status code it returned
        return {r.url: r.status_code for r in responses}

url_2_status = asyncio.run(do_tasks())

This will give you a mapping of {url: status_code}. You should be able to go from there.
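For example, one rough sketch of writing the results back into the data frame could look like this (it assumes the df_1 and url_2_status objects from above; note that httpx uses httpx.URL objects as keys, so they are converted to strings before the lookup):

# Sketch: map the gathered statuses back onto df_1 (assumes df_1 and
# url_2_status from above; httpx returns httpx.URL keys, so convert to str).
status_by_url = {str(url): status for url, status in url_2_status.items()}
df_1['condition'] = [
    'active' if status_by_url.get(url) == 200 else 'false'
    for url in df_1['listing_url']
]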

This solution assumes you are using Python 3.7 or newer. Also remember to install httpx.
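If firing 500+ requests at once causes connection errors or rate limiting, one option worth trying is httpx's built-in connection-pool limits. The values below are just placeholders to illustrate the idea:

# Sketch: cap how many requests are in flight at once (assumed values;
# tune max_connections and the timeout to whatever the target servers allow).
limits = httpx.Limits(max_connections=20)
timeout = httpx.Timeout(10.0)

async def do_tasks_limited():
    async with httpx.AsyncClient(limits=limits, timeout=timeout) as client:
        tasks = [client.head(url) for url in listing_urls]
        responses = await asyncio.gather(*tasks)
        return {r.url: r.status_code for r in responses}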

Answered By: Pawel Kam