Python requests URL response too slow, how can I make it quicker?
Question:
I have this code in Python:

import requests

session = requests.Session()
for i in range(0, len(df_1)):
    page = session.head(df_1['listing_url'].loc[i], allow_redirects=False, stream=True)
    if page.status_code == 200:
        df_1['condition'][i] = 'active'
    else:
        df_1['condition'][i] = 'false'
df_1 is my data frame, and the column "listing_url" has more than 500 rows.
I want to check whether each URL is active and record the result in my data frame, but this code takes a long time. How can I reduce the time?
Answers:
The problem with your current approach is that requests runs sequentially (synchronously), which means a new request can't be sent before the previous one has finished. What you are looking for is handling those requests asynchronously. Unfortunately, the requests library does not support asynchronous requests. A newer library with a similar API that can do this is httpx; aiohttp is another popular choice. With httpx you can do something like this:
import asyncio
import httpx

listing_urls = list(df_1['listing_url'])

async def do_tasks():
    async with httpx.AsyncClient() as client:
        # Fire all HEAD requests concurrently instead of one at a time
        tasks = [client.head(url) for url in listing_urls]
        responses = await asyncio.gather(*tasks)
        # str(r.url): response URLs are httpx.URL objects, so convert to
        # plain strings to match the values in the data frame
        return {str(r.url): r.status_code for r in responses}

url_2_status = asyncio.run(do_tasks())
This will give you a mapping of {url: status_code}. You should be able to go from there.
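As a sketch of "going from there": once you have the {url: status_code} mapping (with plain-string keys), you can write the whole condition column in one vectorized step instead of assigning row by row. The URLs below are placeholders standing in for your real listing_url values:

```python
import pandas as pd

# Hypothetical results from the async run: {url: status_code}
url_2_status = {
    "https://example.com/a": 200,
    "https://example.com/b": 404,
}

df_1 = pd.DataFrame({"listing_url": list(url_2_status)})

# Translate each URL's status code into the 'active'/'false' labels
# from the question, writing the whole column at once
df_1["condition"] = df_1["listing_url"].map(
    lambda url: "active" if url_2_status.get(url) == 200 else "false"
)

print(df_1["condition"].tolist())  # ['active', 'false']
```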
This solution assumes you are using Python 3.7 or newer. Also remember to install httpx (e.g. pip install httpx).