Scraping a page on the internet for numbers

Question:

I have the following code. It opens a Mass Lottery page and tries to get the winning numbers. It doesn’t work, even though the path looks good. Please help.

from bs4 import BeautifulSoup
import requests


url = 'https://www.masslottery.com/games/draw-and-instants/mass-cash?date=2023-02-22'

# Make a GET request to fetch the raw HTML content
html_content = requests.get(url).text

# Parse the html content
soup = BeautifulSoup(html_content, "lxml")

# print(soup.prettify()) # print the parsed data of html

# Scrape the numbers
numbers = soup.find_all('span', attrs={'class': "winning-number-ball-circle"})

# Convert the numbers to int type
numbers = [int(number.text) for number in numbers]

# Print the numbers
print(numbers)
Asked By: Daniel

||

Answers:

Inspecting the webpage shows that the HTML source (stored in html_content) does not contain the game results (print html_content to check). This is because the page loads the results from an API, accessible at:

https://www.masslottery.com/api/v1/draw-results
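
One way to confirm that fetched HTML lacks the result elements is to count the tags carrying the class the scraper looks for. The sketch below is only an illustration: it uses the standard library's html.parser instead of BeautifulSoup so it runs without extra packages, and the two HTML strings are hypothetical stand-ins, not the site's real markup. The helper names (ClassCounter, count_elements) are made up for this example.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags of a given name that carry a given CSS class."""

    def __init__(self, tag, class_name):
        super().__init__()
        self.tag = tag
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_elements(html, tag, class_name):
    """Return how many <tag> elements in html have class_name."""
    parser = ClassCounter(tag, class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical stand-ins: a JavaScript shell vs. a fully rendered page.
static_shell = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
rendered_page = (
    '<div><span class="winning-number-ball-circle">1</span>'
    '<span class="winning-number-ball-circle">2</span></div>'
)

print(count_elements(static_shell, "span", "winning-number-ball-circle"))   # 0
print(count_elements(rendered_page, "span", "winning-number-ball-circle"))  # 2
```

Running the same count on the html_content from the question should tell you whether find_all ever had anything to match; a count of zero would be consistent with the results being loaded later by JavaScript.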

Instead, let’s GET the results from there. For historical results, make a GET request with the draw_date parameter set to the desired date in ISO format (YYYY-MM-DD):

https://www.masslottery.com/api/v1/draw-results/mass_cash?draw_date=2023-02-21
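
If the date comes from a date object rather than a hand-typed string, the URL can be assembled as in the sketch below. The function name build_draw_results_url and the BASE_URL constant are hypothetical; only the endpoint path and the draw_date parameter come from the URL above.

```python
from datetime import date
from urllib.parse import urlencode

# Endpoint taken from the answer; the constant name is an assumption.
BASE_URL = "https://www.masslottery.com/api/v1/draw-results"


def build_draw_results_url(game, draw_date):
    """Return the draw-results URL for one game on one date."""
    # date.isoformat() yields YYYY-MM-DD, the format the URL above uses
    query = urlencode({"draw_date": draw_date.isoformat()})
    return f"{BASE_URL}/{game}?{query}"


print(build_draw_results_url("mass_cash", date(2023, 2, 21)))
# https://www.masslottery.com/api/v1/draw-results/mass_cash?draw_date=2023-02-21
```

Using isoformat() avoids mistakes such as a missing leading zero in the month or day.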

Bonus: Check the site’s network activity in DevTools at Inspect > Network (for Chrome), or the equivalent in other browsers. It shows the API requests the page makes and the responses it receives.

Sample code to handle the API response as requested:

import requests

# parameters
url = "https://www.masslottery.com/api/v1/draw-results/mass_cash"
params = {"draw_date": "2023-02-21"}

# make the GET request and parse the JSON response
response = requests.get(url, params=params)
response.raise_for_status()  # stop early on an HTTP error status
data = response.json()

# take the first item in the top-level "winningNumbers" list
# and read that item's own "winningNumbers" key
winning_numbers = data["winningNumbers"][0]["winningNumbers"]
print(winning_numbers)
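
The indexing above raises an error if the list is empty, for example on a date with no draw. A more defensive variant might look like the sketch below. It assumes only the response shape implied by the access pattern above; the function name extract_winning_numbers is hypothetical and the sample payloads are made up, not real API output.

```python
def extract_winning_numbers(data):
    """Return the first draw's numbers, or an empty list if none are present."""
    draws = data.get("winningNumbers") or []
    if not draws:
        return []
    return list(draws[0].get("winningNumbers") or [])


# Hypothetical payloads shaped like the access pattern above; values are made up.
sample = {"winningNumbers": [{"winningNumbers": [1, 2, 3, 4, 5]}]}
empty = {"winningNumbers": []}

print(extract_winning_numbers(sample))  # [1, 2, 3, 4, 5]
print(extract_winning_numbers(empty))   # []
```

In the sample code, this would replace the direct indexing: winning_numbers = extract_winning_numbers(data).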
Answered By: JCTL