How to web scrape title after specific class?

Question

I’m trying to scrape some information from the following website: https://entertainment.cathaypacific.com/catalog?template=movie&parent=%E9%9B%BB%E5%BD%B1
I would like to scrape every movie’s title, year, and length.

Here is the code I tried:

import requests
from bs4 import BeautifulSoup
page = requests.get('https://entertainment.cathaypacific.com/catalog?template=movie&parent=%E9%9B%BB%E5%BD%B1&language=yue')

soup = BeautifulSoup(page.content, 'html.parser')
all_title = soup.find_all(class_="ng-star-inserted")
print(all_title)

However, the list comes back empty; nothing is scraped.

My expected output:

Title                             Year       Length
10 Things I Hate About You        1999       97
17 Again                          2009       102
1778 Stories of Me and My Wife    2011       140

Asked By: Hi Hi try

Answer 1

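You should use pyppeteer (a Python port of Puppeteer) to scrape that website. The page is rendered with JavaScript, so the HTML that requests downloads does not yet contain the movie data. You need a real browser, and you have to wait until the page has fully loaded before reading its content.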

https://github.com/pyppeteer/pyppeteer
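
To see why the original requests approach returns an empty list, it can help to compare what a server sends with what a browser ends up rendering. The sketch below uses two made-up HTML strings (assumptions for illustration, not the site's real markup): a bare application shell, as a JavaScript-driven page typically serves it, and the same page after scripts have inserted an element carrying the `ng-star-inserted` class from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, for illustration only; the real site's HTML will differ.
STATIC_HTML = (
    "<html><body><app-root></app-root>"
    "<script src='main.js'></script></body></html>"
)
RENDERED_HTML = (
    "<html><body><app-root>"
    "<div class='ng-star-inserted'>17 Again</div>"
    "</app-root></body></html>"
)


def count_matches(html, class_name="ng-star-inserted"):
    """Return how many elements in `html` carry the given CSS class."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all(class_=class_name))


print(count_matches(STATIC_HTML))    # 0: roughly what requests.get() sees
print(count_matches(RENDERED_HTML))  # 1: what exists after scripts have run
```

If the first count is zero for the real page as well, the data is being added by JavaScript and a browser-driven approach is needed.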


import asyncio
from pyppeteer import launch
from bs4 import BeautifulSoup

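async def get_data():
    # headless=False opens a visible browser window; set it to True to run in the background
    browser = await launch(headless=False)
    page = await browser.newPage()
    # networkidle0 waits until there are no open network connections,
    # i.e. the JavaScript has finished loading the data
    await page.goto("https://entertainment.cathaypacific.com/catalog?template=movie&parent=%E9%9B%BB%E5%BD%B1", waitUntil="networkidle0")

    # page.content() returns the HTML after JavaScript has rendered it
    html = await page.content()
    soup = BeautifulSoup(html, 'html.parser')

    # Now handle the data

    await browser.close()

# asyncio.run() replaces the deprecated get_event_loop().run_until_complete() pattern
asyncio.run(get_data())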
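
The "Now handle the data" step depends on the page's actual markup, which is not shown here. As a minimal sketch, assuming each movie is rendered as a container with separate elements for title, year, and length, the parsing could look like the following. The class names `movie-card`, `title`, `year`, and `length` are hypothetical placeholders, as is the sample HTML; inspect the rendered page in the browser's developer tools and substitute the real ones.

```python
from bs4 import BeautifulSoup

# Hypothetical sample of rendered HTML; the class names are placeholders,
# not the site's real markup.
SAMPLE_HTML = """
<div class="movie-card">
  <span class="title">10 Things I Hate About You</span>
  <span class="year">1999</span>
  <span class="length">97</span>
</div>
<div class="movie-card">
  <span class="title">17 Again</span>
  <span class="year">2009</span>
  <span class="length">102</span>
</div>
"""


def parse_movies(html):
    """Extract (title, year, length) tuples from rendered HTML."""
    soup = BeautifulSoup(html, "html.parser")
    movies = []
    for card in soup.find_all("div", class_="movie-card"):
        title = card.find(class_="title")
        year = card.find(class_="year")
        length = card.find(class_="length")
        # Skip cards that are missing any of the three fields.
        if title is None or year is None or length is None:
            continue
        movies.append((
            title.get_text(strip=True),
            int(year.get_text(strip=True)),
            int(length.get_text(strip=True)),
        ))
    return movies


def format_table(movies):
    """Lay the rows out in columns, like the expected output in the question."""
    lines = [f"{'Title':<34}{'Year':<11}Length"]
    for title, year, length in movies:
        lines.append(f"{title:<34}{year:<11}{length}")
    return "\n".join(lines)


print(format_table(parse_movies(SAMPLE_HTML)))
```

Inside get_data(), the call would be parse_movies(html) in place of the comment, using the HTML returned by page.content().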

Answered By: Florian Lepage
