Playwright does not load all of the HTML (Python)

Question:

I’m just trying to scrape the titles from the page, but the HTML returned by page.inner_html('body') does not include all of the page’s HTML. I think the missing part may be loaded by JavaScript, but when I look at the Network tab in DevTools I cannot find a JSON response or any other request that it is being loaded from. I have tried this with Selenium as well, so there must be something I’m doing fundamentally wrong.

None of the items in the list appear, although the rest of the HTML shows up fine. No amount of waiting for the content to load makes the information appear.

#import playwright
from playwright.sync_api import sync_playwright

url = 'https://order.mandarake.co.jp/order/listPage/list?categoryCode=07&keyword=naruto&lang=en'

#open url
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    #navigate to the page (JavaScript is enabled by default)
    page.goto(url)

    #load the page and wait for the page to load
    page.wait_for_load_state("networkidle")

    #get the html content
    html = page.inner_html("body")

    print(html)

    #close browser
    browser.close()
Asked By: Sweetcheeks12354

||

Answers:

No, the page’s content isn’t loaded dynamically by JavaScript; it is entirely static HTML, so you can fetch it with requests and parse it with BeautifulSoup:

from bs4 import BeautifulSoup
import requests

page = requests.get('https://order.mandarake.co.jp/order/listPage/list?categoryCode=07&keyword=naruto&lang=en')
soup = BeautifulSoup(page.content,'lxml')

data = []
for e in soup.select('div.title'):
    # each div.title wraps a link whose text is the item title
    d = {
        'title': e.a.get_text(strip=True),
    }
    data.append(d)

print(data)

Output:

[{'title': 'NARUTO THE ANIMATION CHRONICLE\u3000genga made for sale'}, {'title': 'Plex DPCF Haruno Sakura Reboru ring of the eyes'}, {'title': 'Naruto: Shippuden\u3000(replica)  ナルト'}, {'title': 'Naruto: Shippuden\u3000(replica)  ナルト'}, {'title': 'Naruto: Shippuden\u3000(replica)  NARUTO -ナルト-'}, {'title': 'Naruto: Shippuden ナルト\u3000(replica)'}, {'title': 'Naruto Shippuuden\u3000(replica) NARUTO -ナルト-'}, {'title': 'NARUTO -ナルト- 疾風伝\u3000(複製セル)'}, {'title': 'MegaHouse    ちみ メガ Petit Chara Land NARUTO SHIPPUDEN ナルト blast-of-wind intermediary   Even [swirl ナルト special is a volume on ばよ.  
  All 6 types set] inner bag not opened/box damaged'}, {'title': 'NARUTO -ナルト- 疾風伝\u3000(複製セル)'}, {'title': 'NARUTO -ナルト- 疾風伝\u3000(複製セル)'}, {'title': 'NARUTO -ナルト- 疾風伝'}, {'title': 'NARUTO -ナルト- 疾風伝'}, {'title': 'NARUTO -ナルト-'}]
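
Some of the scraped titles contain ideographic spaces (U+3000), runs of several spaces, and even an embedded line break. If you want cleaner strings, one option is to collapse all whitespace after scraping. The following is only a minimal sketch: normalize_title is a hypothetical helper name, not part of requests or BeautifulSoup, and the sample list just reuses two titles from the output above.

```python
import re


def normalize_title(raw: str) -> str:
    """Collapse every run of whitespace into a single ASCII space."""
    # For str patterns in Python 3, \s matches Unicode whitespace, so the
    # ideographic space U+3000, newlines and repeated spaces are all covered.
    return re.sub(r"\s+", " ", raw).strip()


# Two titles copied from the scraped output (illustrative sample only).
sample = [
    {"title": "NARUTO THE ANIMATION CHRONICLE\u3000genga made for sale"},
    {"title": "Naruto: Shippuden\u3000(replica)  ナルト"},
]

cleaned = [{"title": normalize_title(d["title"])} for d in sample]
print(cleaned)
```

In the scraping loop you could apply the same helper as normalize_title(e.a.get_text(strip=True)), since strip=True only trims the ends of each text fragment and leaves whitespace inside the title untouched.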
Answered By: F.Hoque