How to web scrape title after specific class?
Question:
I’m trying to scrape some information from the following website: https://entertainment.cathaypacific.com/catalog?template=movie&parent=%E9%9B%BB%E5%BD%B1
I would like to scrape the title, year, and length of every movie.
Here is the code I tried:
import requests
from bs4 import BeautifulSoup
page = requests.get('https://entertainment.cathaypacific.com/catalog?template=movie&parent=%E9%9B%BB%E5%BD%B1&language=yue')
soup = BeautifulSoup(page.content, 'html.parser')
all_title = soup.find_all(class_="ng-star-inserted")
print(all_title)
However, the resulting list is empty.
My expected output:
Title Year Length
10 Things I Hate About You 1999 97
17 Again 2009 102
1778 Stories of Me and My Wife 2011 140
Answers:
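You should use pyppeteer (a Python port of Puppeteer) to scrape that website. The page is rendered by JavaScript, so the HTML that requests downloads does not yet contain the movie list. You need a browser that waits until the page has fully loaded.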
https://github.com/pyppeteer/pyppeteer
import asyncio
from pyppeteer import launch
from bs4 import BeautifulSoup

async def get_data():
    # Open a visible browser window; pass headless=True to hide it
    browser = await launch(headless=False)
    page = await browser.newPage()
    # Wait until the network has gone idle so the JavaScript has finished rendering
    await page.goto("https://entertainment.cathaypacific.com/catalog?template=movie&parent=%E9%9B%BB%E5%BD%B1", waitUntil="networkidle0")
    # Read the HTML as it looks after rendering
    html = await page.content()
    soup = BeautifulSoup(html, 'html.parser')
    # Now handle the data
    await browser.close()

asyncio.run(get_data())
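The "Now handle the data" step depends on the markup the page actually renders, which is not shown here. As a rough sketch only: the class names movie-card, title, year and length below are hypothetical, and SAMPLE_HTML stands in for the string returned by page.content(). It uses the standard library's html.parser instead of BeautifulSoup purely so it runs without extra dependencies. Inspect the real page in the browser's developer tools and adjust the names before relying on it.

```python
from html.parser import HTMLParser

# Hypothetical rendered markup; the real class names on the site may differ.
SAMPLE_HTML = """
<div class="movie-card ng-star-inserted">
  <span class="title">10 Things I Hate About You</span>
  <span class="year">1999</span>
  <span class="length">97</span>
</div>
<div class="movie-card ng-star-inserted">
  <span class="title">17 Again</span>
  <span class="year">2009</span>
  <span class="length">102</span>
</div>
"""

FIELDS = ("title", "year", "length")  # assumed class names, one per field


class MovieParser(HTMLParser):
    """Collect one dict per movie card from already-rendered HTML."""

    def __init__(self):
        super().__init__()
        self.movies = []
        self._field = None  # field whose text we are currently inside

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "movie-card" in classes:
            self.movies.append({})  # a new card starts here
        for name in FIELDS:
            if name in classes and self.movies:
                self._field = name

    def handle_data(self, data):
        # Store non-blank text found inside a field element
        if self._field and data.strip():
            self.movies[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        self._field = None


def parse_movies(html):
    """Return (title, year, length) tuples, skipping incomplete cards."""
    parser = MovieParser()
    parser.feed(html)
    parser.close()
    return [
        (m["title"], int(m["year"]), int(m["length"]))
        for m in parser.movies
        if all(f in m for f in FIELDS)
    ]


if __name__ == "__main__":
    print("Title Year Length")
    for title, year, length in parse_movies(SAMPLE_HTML):
        print(title, year, length)
```

Inside get_data() you would call parse_movies(html) in place of the comment. With BeautifulSoup the same idea is a loop over soup.select() with whatever selectors match the real page.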