Why am I unable to call this async function with await?

Question:

I've run into a problem in my web scraper that I don't know how to fix.

I want to call the async function scrape_season, but when I await it at the top level of my main file I get this error:
error: "await" allowed only within async function

import asyncio
import os
from bs4 import BeautifulSoup
from playwright.async_api import async_playwright, TimeoutError as PlaywrightTimeout

SEASONS = list(range(2016, 2023))
DATA_DIR = 'data'
STANDINGS_DIR = os.path.join(DATA_DIR, 'standings')
SCORES_DIR = os.path.join(DATA_DIR, 'scores')

async def get_html(url, selector, sleep=5, retries=3):
    html = None
    for i in range(1, retries + 1):
        # Wait longer on each attempt; asyncio.sleep does not block the event loop
        await asyncio.sleep(sleep * i)

        try:
            async with async_playwright() as p:
                browser = await p.firefox.launch()
                page = await browser.new_page()
                await page.goto(url)
                print(await page.title())
                html = await page.inner_html(selector)
                await browser.close()
        except PlaywrightTimeout:
            print(f'Timeout error on {url}')
            continue
        else:
            break
    return html

async def scrape_season(season):
    url = f'https://www.basketball-reference.com/leagues/NBA_{season}_games.html'
    html = await get_html(url, '#content .filter')

    # Name the parser explicitly so BeautifulSoup doesn't guess (and warn)
    soup = BeautifulSoup(html, 'html.parser')
    links = soup.find_all('a')
    href = [l['href'] for l in links]
    standings_pages = [f"https://basketball-reference.com{l}" for l in href]

    for url in standings_pages:
        save_path = os.path.join(STANDINGS_DIR, url.split("/")[-1])
        if os.path.exists(save_path):
            continue

        # Fetch and save every page inside the loop, not only the last one
        html = await get_html(url, '#all_schedule')
        with open(save_path, 'w+') as f:
            f.write(html)

for season in SEASONS:
    await(scrape_season(season))
Asked By: wade watts


Answers:

The problem is that your code uses await at the top level of the module, in the final for loop. In a plain Python script that is not allowed: await is only valid inside a function defined with async def.
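You can see the rule directly with the standard library: compiling source that uses await at module level fails, while the same await inside an async def compiles fine. This is a minimal sketch; the source strings and the names fetch and wrapper are hypothetical, and compile() never resolves names, so fetch does not need to exist.

```python
import ast

# Module-level await, like the question's final loop
top_level = "await fetch()\n"
# The same await wrapped in an async function
inside_async = "async def wrapper():\n    await fetch()\n"

def compiles(source, flags=0):
    """Return True if `source` compiles as a module with the given compiler flags."""
    try:
        compile(source, "<example>", "exec", flags=flags)
        return True
    except SyntaxError as exc:
        print("SyntaxError:", exc.msg)
        return False

print(compiles(top_level))     # prints the SyntaxError message, then False
print(compiles(inside_async))  # True
# The asyncio REPL (python -m asyncio) compiles with this flag (Python 3.8+),
# which is why top-level await works there but not in a plain script
print(compiles(top_level, ast.PyCF_ALLOW_TOP_LEVEL_AWAIT))  # True
```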

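async/await is only syntax for defining and suspending coroutines; it does not run anything by itself. Calling scrape_season(season) merely creates a coroutine object, and something has to drive it to completion. Normally that is the asyncio event loop, which is an ordinary library you could in principle have written yourself; there is no extra interpreter magic involved.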
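To make that concrete, here is a toy sketch that drives a coroutine by hand with send(), using only the standard library. It illustrates the idea of an event loop resuming coroutines; it is not how asyncio is actually implemented, and the names pause, work and drive are hypothetical.

```python
import types

@types.coroutine
def pause():
    # A generator-based awaitable: yielding hands control back to whoever drives the coroutine
    yield "paused"

async def work():
    print("work started")
    await pause()
    print("work resumed")
    return 42

def drive(coro):
    """A tiny 'event loop': resume `coro` until it finishes.

    Returns (values it yielded while suspended, its return value).
    """
    yielded = []
    while True:
        try:
            yielded.append(coro.send(None))
        except StopIteration as stop:
            return yielded, stop.value

coro = work()               # nothing is printed yet: this only creates a coroutine object
print(type(coro).__name__)  # coroutine
print(drive(coro))          # prints "work started", "work resumed", then (['paused'], 42)
```

asyncio.run() plays the role of drive() here, with real scheduling, timers and I/O on top.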

To fix it, replace the for loop at the end of your code with the following, which moves the calls into an async main() function and starts the event loop with asyncio.run(). I have not tested it against the site, so it may need adjustment. The asyncio documentation explains why the original version fails and this one works: https://docs.python.org/3/library/asyncio.html

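import asyncio

async def main():
    # Create one coroutine per season and run them concurrently
    tasks = [scrape_season(season) for season in SEASONS]
    # gather() takes awaitables as separate arguments, so unpack the list
    await asyncio.gather(*tasks)

asyncio.run(main())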
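If you want to try this structure without Playwright or network access, here is a self-contained sketch with a stand-in for scrape_season (fake_scrape_season is hypothetical and only sleeps). It shows two variants: a sequential main that awaits one season at a time, as the original loop intended, and a concurrent one that caps how many seasons are in flight with asyncio.Semaphore, which may be kinder to a site that rate-limits scrapers. The limit of 2 is an arbitrary assumption.

```python
import asyncio

SEASONS = list(range(2016, 2023))

async def fake_scrape_season(season):
    # Hypothetical stand-in for scrape_season. asyncio.sleep yields to the event loop,
    # whereas time.sleep would block every other task while it waits.
    await asyncio.sleep(0.01)
    return f"scraped {season}"

async def main_sequential():
    # One season at a time, in order
    results = []
    for season in SEASONS:
        results.append(await fake_scrape_season(season))
    return results

async def main_concurrent(limit=2):
    # At most `limit` seasons run at the same time
    semaphore = asyncio.Semaphore(limit)

    async def bounded(season):
        async with semaphore:
            return await fake_scrape_season(season)

    # gather() returns results in the order the awaitables were passed
    return await asyncio.gather(*(bounded(season) for season in SEASONS))

if __name__ == "__main__":
    print(asyncio.run(main_sequential()))
    print(asyncio.run(main_concurrent()))
```

The sequential version is the safer default for scraping a single site; the concurrent one only pays off if the site tolerates parallel requests.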
Answered By: Egeau