Can't locate the correct beautifulsoup class and id combo

Question

I have the following code:

from bs4 import BeautifulSoup
import requests

URL = 'https://www.youtube.com/gaming/games'

response = requests.get(URL).text
soup = BeautifulSoup(response, 'html.parser')

elem = soup.find_all('a', class_ = 'yt-simple-endpoint focus-on-expand style-scope ytd-game-details-renderer')

print(elem)

I am trying to isolate all the individual games on https://www.youtube.com/gaming/games.

I would like to get just the game name and the number of people watching. My issue is that I can’t find the right tag and class_ combination to pass to find_all.

I’ve tried the following arguments to soup.find_all:

('a', class_ = 'yt-simple-endpoint focus-on-expand style-scope ytd-game-details-renderer')
('game', class_ = 'style-scope ytd-game-card-renderer')
(class_ = 'style-scope ytd-grid-renderer')
(id = 'items')

And many different variations.

If I just use find_all('div'), I get unrelated data. I really think id='items' is the right target, but every search other than 'div' returns the same thing: an empty list, []. I’ve also tried searching for the individual div classes that appear in those results, but I still get either [] or data I don’t need.

If I use find instead of find_all (elem = soup.find(id='items')) I get "None" as a response.

I also tried the viewer count element, which has an id of 'live-viewers-count', and it still prints [].

Asked By: user21090678

Answer 1

You can’t really do this, because the page content is loaded dynamically with JavaScript.

BeautifulSoup doesn’t run JavaScript; it only parses the HTML that requests downloaded.

If you right-click the page and view the page source, you’ll see that it is mostly just compiled JavaScript rather than the elements shown in the browser’s inspector.
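To illustrate why the searches come back as [], here is a minimal sketch using Python’s built-in html.parser (the same backend BeautifulSoup uses when given 'html.parser'). The HTML string and the TagCounter class are made up for illustration; the point is only that markup created by a script never exists for a parser that doesn’t execute it.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how many times each start tag appears in the parsed HTML."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


# Hypothetical page: the container is empty until the script runs in a browser.
static_html = """
<html><body>
  <div id="app"></div>
  <script>
    // In a browser this would add the link; a parser never runs it.
    document.getElementById('app').innerHTML = '<a class="game">Valorant</a>';
  </script>
</body></html>
"""

counter = TagCounter()
counter.feed(static_html)

print(counter.counts.get('a', 0))       # 0: the <a> only exists as text inside the script
print(counter.counts.get('script', 0))  # 1: the script itself is present
```

The script’s text is still in the download, though, so any data embedded in it can be read as a plain string.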

To scrape YouTube, I’d either use Selenium to run a headless web browser, or Js2Py if you need performance.

… or simply use the YouTube APIs: https://developers.google.com/youtube/v3/docs ^_^’

Answered By: Loïc

Answer 2

Update
Here’s how to traverse the game data JSON elements.

First, narrow down to game_data, which is a list of JSON elements.

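import json

# `main` is the <script> text extracted in the original answer below;
# main[20:-1] trims the non-JSON text around the object before parsing.
game_data = (
    json.loads(main[20:-1])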
    ['contents']
    ['twoColumnBrowseResultsRenderer']
    ['tabs'][0]
    ['tabRenderer']
    ['content']
    ['sectionListRenderer']
    ['contents'][0]
    ['itemSectionRenderer']
    ['contents'][0]
    ['shelfRenderer']
    ['content']
    ['gridRenderer']
    ['items']
)

Now iterate over the list. For each element, there’s a section of the data packet we’ll call details, which contains game name and views.

Then use the paths I showed in my original answer to capture name and view count for each game.

for game in game_data:
    # Each grid item nests the fields we need under gameDetailsRenderer
    details = (
        game
        ['gameCardRenderer']
        ['game']
        ['gameDetailsRenderer']
    )
    game_name = details['title']['simpleText']
    view_ct = details['liveViewersText']['runs'][0]['text']

    print(f"Game: {game_name} / Views: {view_ct}")

Output

Game: Valorant / Views: 100K
Game: Grand Theft Auto V / Views: 61K
Game: Dota 2 / Views: 57K
Game: Minecraft / Views: 50K
# ...
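One caveat: chained lookups like these raise KeyError or IndexError if any grid item lacks one of the keys. A minimal sketch of a more forgiving traversal follows; dig is a hypothetical helper name, and sample_items is hand-made data that only mirrors the paths used above, not real page output.

```python
def dig(obj, *path, default=None):
    """Follow keys/indexes in `path`; return `default` if any step is missing."""
    current = obj
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Hand-made sample shaped like elements of game_data (values are illustrative).
sample_items = [
    {'gameCardRenderer': {'game': {'gameDetailsRenderer': {
        'title': {'simpleText': 'Valorant'},
        'liveViewersText': {'runs': [{'text': '100K'}]},
    }}}},
    {'someOtherRenderer': {}},  # hypothetical entry without game details
]

DETAILS = ('gameCardRenderer', 'game', 'gameDetailsRenderer')

for item in sample_items:
    name = dig(item, *DETAILS, 'title', 'simpleText')
    views = dig(item, *DETAILS, 'liveViewersText', 'runs', 0, 'text')
    if name is not None:  # skip entries that are not game cards
        print(f"Game: {name} / Views: {views}")
```

With the sample data this prints a single line for Valorant and silently skips the second entry.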

Original answer

All of the data you need is stored as JSON in one of the <script> tags; it’s just tedious to follow the nested object down to the fields you need. You can see it’s all there if you look at soup.body.

This should get you started: it shows how to get the game name and live viewer count for the first game currently listed (‘Valorant’).

import json

# buried as JSON in a <script> inside <body>;
# index 13 is position-dependent and may change when the page changes
main = soup.body.find_all('script')[13].contents[0]
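Both the [13] index and the [20:-1] slice depend on the page’s exact layout. As a rough sketch of a less position-dependent alternative, the function below looks for a JavaScript variable assignment and decodes the JSON object that follows it. The function name extract_assigned_json and the variable name ytInitialData are assumptions for illustration; check the actual script text before relying on them.

```python
import json


def extract_assigned_json(script_text, var_name):
    """Return the JSON object assigned to `var_name` in `script_text`, or None."""
    marker = script_text.find(var_name)
    if marker == -1:
        return None
    start = script_text.find('{', marker)  # first brace after the variable name
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value and ignores whatever follows it
        obj, _end = json.JSONDecoder().raw_decode(script_text, start)
    except json.JSONDecodeError:
        return None
    return obj


# Hypothetical script text; the variable name is an assumption.
sample = 'var ytInitialData = {"contents": {"items": [1, 2, 3]}};'
data = extract_assigned_json(sample, 'ytInitialData')
print(data['contents']['items'])  # [1, 2, 3]
```

To find the right script without an index, one could loop over soup.find_all('script') and call this on each tag’s text until it returns something other than None.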

This is how you get to game name (you can iterate instead of indexing [0] to get all the games):

# Game name
print('Game:', json.loads(main[20:-1])
 ['contents']
 ['twoColumnBrowseResultsRenderer']
 ['tabs'][0]
 ['tabRenderer']
 ['content']
 ['sectionListRenderer']
 ['contents'][0]
 ['itemSectionRenderer']
 ['contents'][0]
 ['shelfRenderer']
 ['content']
 ['gridRenderer']
 ['items'][0]
 ['gameCardRenderer']
 ['game']
 ['gameDetailsRenderer']
 ['title']
 ['simpleText']
)

Output

Game: Valorant

And this is the viewer count:

print('Live Viewers:', json.loads(main[20:-1])
 ['contents']
 ['twoColumnBrowseResultsRenderer']
 ['tabs'][0]
 ['tabRenderer']
 ['content']
 ['sectionListRenderer']
 ['contents'][0]
 ['itemSectionRenderer']
 ['contents'][0]
 ['shelfRenderer']
 ['content']
 ['gridRenderer']
 ['items'][0]
 ['gameCardRenderer']
 ['game']
 ['gameDetailsRenderer']
 ['liveViewersText']
 ['runs'][0]
 ['text'])

Output

Live Viewers: 100K
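The viewer count comes back as display text such as '100K'. If you need a number to sort or compare by, here is a small sketch of a converter; parse_count is a hypothetical helper, and the K/M/B suffix convention is an assumption based on the output above rather than a documented format.

```python
def parse_count(text):
    """Convert display text like '100K' or '1.2M' to an int; None if unparseable."""
    suffixes = {'K': 1_000, 'M': 1_000_000, 'B': 1_000_000_000}
    text = text.strip().replace(',', '')
    if not text:
        return None
    multiplier = suffixes.get(text[-1].upper())
    try:
        if multiplier is not None:
            return round(float(text[:-1]) * multiplier)
        return int(text)
    except ValueError:
        return None


print(parse_count('100K'))  # 100000
print(parse_count('57'))    # 57
```

The result is approximate by nature, since the page has already rounded the figure before displaying it.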

Answered By: andrew_reece
