Getting tags by class returns an empty list

Question:

I am using requests and Beautiful Soup to scrape some data from https://covid19.who.int/. Near the top of the website, there is a box containing numbers such as "new cases in last 24 hours", which is what I want to use. Upon inspecting the website, I found that it is stored in a div container with the class "sc-AxjAm sc-qQxXP hTCctY". However, when I try to get this element, it returns an empty list. Here is my code:

import requests
from bs4 import BeautifulSoup

r = requests.get(url='https://covid19.who.int')
soup = BeautifulSoup(r.text, 'lxml')
data = soup.find_all('div', class_='sc-AxjAm sc-qQxXP hTCctY')
print(data)

This code prints an empty list. Can someone help?

Asked By: BlazingLeaf12

Answers:

The page is built in the browser by JavaScript, which fetches the data through separate JSON requests. So the data is all available, just not in the HTML that requests returns.

Try the following:

import requests

req = requests.get('https://covid19.who.int/page-data/index/page-data.json')
data = req.json()
cases = data['result']['pageContext']['rawDataSets']['byDay']['rows'][-1]

print(f"New Cases in last 24hrs: {cases[6]:,}")
print(f"Cumulative cases: {cases[7]:,}")
print(f"Cumulative deaths: {cases[2]:,}")

This should give you:

New Cases in last 24hrs: 3,321,782
Cumulative cases: 364,191,494
Cumulative deaths: 5,631,457

The JSON returned is HUGE, so finding the fields you want can be a challenge. I recommend writing the contents of req.text to a text file and inspecting that.
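
A minimal sketch of that suggestion: it pretty-prints the parsed JSON with json.dump so the file is easier to read than the raw req.text, and lists the nested key paths so you can see where fields such as the byDay rows sit. The helper names save_pretty and key_paths, and the max_depth parameter, are hypothetical choices for illustration. Lists are only reported with their length, not expanded, because the row lists are long.

```python
import json


def save_pretty(data, path):
    """Write parsed JSON to a file with indentation so it is readable in an editor."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)


def key_paths(obj, prefix="", max_depth=6):
    """Return the dict key paths in a nested JSON structure, down to max_depth levels.

    Lists are not expanded; they are reported with their length instead.
    """
    paths = []
    if max_depth == 0 or not isinstance(obj, dict):
        return paths
    for key, value in obj.items():
        path = f"{prefix}['{key}']"
        if isinstance(value, list):
            paths.append(f"{path} (list of {len(value)})")
        else:
            paths.append(path)
            paths.extend(key_paths(value, path, max_depth - 1))
    return paths
```

With the response from the code above, you could call save_pretty(req.json(), 'who_page_data.json') and print each entry of key_paths(req.json()) to get an overview before opening the file.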

Answered By: Martin Evans

You could also keep using your original approach, although Martin’s method is cleaner.

import requests
from bs4 import BeautifulSoup

r = requests.get(url='https://covid19.who.int')
soup = BeautifulSoup(r.text, 'lxml')
# A list matches any div that has at least one of these classes, not only divs with all three
data = soup.find_all('div', class_=['sc-AxjAm', 'sc-qQxXP', 'hTCctY'])
print(data)

Essentially, when BeautifulSoup parses the class attribute from the HTML, it splits the value on whitespace and stores it as a list. The BeautifulSoup documentation explains that this is because class is a multi-valued attribute (https://www.crummy.com/software/BeautifulSoup/bs4/doc/#multi-valued-attributes).
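
A small offline sketch of these matching rules, using hypothetical markup rather than the real WHO page. It assumes bs4 is installed along with its soupsieve dependency (needed for select), and uses the built-in html.parser so lxml is not required. Note that the exact-string form from the question does match when the classes appear in that order, so an empty result more likely means the element is not in the HTML that requests receives, as the first answer explains.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; it is not copied from the WHO page.
HTML = """
<div class="sc-AxjAm sc-qQxXP hTCctY">all three classes</div>
<div class="sc-AxjAm">only the first class</div>
<div class="hTCctY sc-qQxXP sc-AxjAm">same classes, different order</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# class is a multi-valued attribute, so BeautifulSoup stores it as a list.
first = soup.find("div")
print(first["class"])

# A list passed to class_ matches a div with ANY of the listed classes.
any_match = soup.find_all("div", class_=["sc-AxjAm", "sc-qQxXP", "hTCctY"])

# A string containing spaces is compared against the exact attribute value,
# so it only matches when the classes appear in exactly this order.
exact_match = soup.find_all("div", class_="sc-AxjAm sc-qQxXP hTCctY")

# A CSS selector requires ALL of the classes, in any order.
all_match = soup.select("div.sc-AxjAm.sc-qQxXP.hTCctY")

print(len(any_match), len(exact_match), len(all_match))
```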

Knowing this is also useful when you write a lambda function for find_all, because the class value it compares against is a list:

soup.find_all(lambda tag: tag.name == 'div' and tag.get('class') == ['sc-AxjAm', 'sc-qQxXP', 'hTCctY'])

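The filter above compares the class list with ==, so it only matches a div that has exactly those three classes in exactly that order. If you instead want any div that carries all three classes (in any order, possibly alongside others), a set subset check is more tolerant. This is a sketch under that assumption; the markup and the WANTED and has_all_classes names are hypothetical, and a named function works in find_all the same way a lambda does.

```python
from bs4 import BeautifulSoup

# Hypothetical set of classes to require on the div.
WANTED = {"sc-AxjAm", "sc-qQxXP", "hTCctY"}


def has_all_classes(tag):
    """Match div tags whose class list contains every class in WANTED, in any order."""
    return tag.name == "div" and WANTED.issubset(tag.get("class", []))


# Hypothetical markup for illustration.
HTML = """
<div class="hTCctY sc-qQxXP sc-AxjAm extra">reordered, plus one extra class</div>
<div class="sc-AxjAm">only one class</div>
<span class="sc-AxjAm sc-qQxXP hTCctY">right classes, wrong tag</span>
"""

soup = BeautifulSoup(HTML, "html.parser")
matches = soup.find_all(has_all_classes)
print([m.get_text() for m in matches])
```
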
Answered By: WilliamYWu