Python: pausing Trio tasks until a shared operation completes

Question:

In trio/anyio, is it possible to pause all tasks until I complete a specific operation, and then resume them?

Let’s say I run a function to obtain a valid cookie and then start crawling a website. After some time the cookie expires and I need to run that function again to obtain a new one.

So if I spawn 10 tasks under the nursery and the cookie expires while 6 of them are still running, how can I pause all of them and run this function only once?

import trio
import httpx


async def get_cookies(client):
    # Let's say that here I use a headless-browser operation to obtain a valid cookie.
    pass


limiter = trio.CapacityLimiter(20)


async def crawler(client, url, sender):
    async with limiter, sender:
        r = await client.get(url)
        if "something special happen" in r.text:
            pass
            # Here I want to check whether the cookie has expired;
            # if so, I want to run get_cookies() only once.
        await sender.send(r.text)


async def main():
    async with httpx.AsyncClient() as client, trio.open_nursery() as nurse:
        await get_cookies(client)
        sender, receiver = trio.open_memory_channel(0)
        nurse.start_soon(rec, receiver)
        urls = []
        async with sender:
            for url in urls:
                nurse.start_soon(crawler, client, url, sender.clone())


async def rec(receiver):
    async with receiver:
        async for i in receiver:
            print(i)

if __name__ == "__main__":
    trio.run(main)

Answers:

You simply wrap get_cookies in an async with some_lock block. Inside that block, if you already have a cookie (let’s say it’s stored in a shared variable) you return it; otherwise you acquire one and then set the variable.

When you notice that the cookie has expired, you delete it (i.e. set the global back to None) and call get_cookies.

In other words, something along these lines:

class CrawlData:
    def __init__(self, client):
        self.client = client
        self.valid = False
        self.lock = trio.Lock()
        self.limiter = trio.CapacityLimiter(20)

    async def get_cookie(self):
        if self.valid:
            return

        async with self.lock:
            if self.valid:
                # another task refreshed the cookie while we waited for the lock
                return

            ... # fetch cookie here, using self.client

            self.valid = True

    async def get(self, url):
        r = await self.client.get(url)
        if check_for_expired_cookie(r):
            self.valid = False  # invalidate so get_cookie actually refreshes
            await self.get_cookie()
            r = await self.client.get(url)
            if check_for_expired_cookie(r):
                raise RuntimeError("New cookie doesn't work", r)
        return r


async def crawler(data, url, sender):
    async with data.limiter, sender:
        r = await data.get(url)
        await sender.send(r.text)


async def main():
    async with httpx.AsyncClient() as client, trio.open_nursery() as nurse:
        data = CrawlData(client)
        sender, receiver = trio.open_memory_channel(0)
        nurse.start_soon(rec, receiver)
        urls = []
        async with sender:
            for url in urls:
                nurse.start_soon(crawler, data, url, sender.clone())
        ...
Answered By: Matthias Urlichs