Python web-scraper applied through Visual Studio question
Question:
I have this web scraper program in Python, but it prints both tennis players, Felix and Alexander. I would like to print only the first available tennis player as a separate item and exclude all the ones after it. What do I need to change in the code to do this?
Note that I wrote this in Visual Studio 2022 and set the program up to use the Microsoft Edge web browser.
import requests
from bs4 import BeautifulSoup
response = requests.get("https://www.betexplorer.com/tennis/atp-singles/basel/auger-aliassime-felix-bublik-alexander/U5HIueTc/")
webpage = response.content
soup = BeautifulSoup(webpage, "html.parser")
for h2 in soup.find_all('h2'):
    values = [data for data in h2.find_all('a')]
    for value in values:
        print(value.text.replace(" ","_"))
    print()
Answers:
Instead of the loop, just print the first h2 (soup.h2 returns only the first match):
print(soup.h2.text.strip())
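For context, soup.h2 is shorthand for soup.find('h2'), which returns only the first matching tag, while find_all('h2') returns every match. Here is a minimal sketch of the difference on an inline HTML snippet. The markup and the player names in it are a made-up stand-in (an assumption, not the real page structure), so nothing is fetched over the network:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption):
# one <h2> per player, each wrapping a link.
SAMPLE_HTML = """
<ul>
  <li><h2><a href="/player/1/">Auger-Aliassime Felix</a></h2></li>
  <li><h2><a href="/player/2/">Bublik Alexander</a></h2></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

first_h2 = soup.h2            # same as soup.find("h2"): first match only
all_h2 = soup.find_all("h2")  # every match, in document order

first_name = first_h2.text.strip()
print(first_name)   # text of the first <h2> only
print(len(all_h2))  # number of <h2> tags in the snippet
```

If the real page puts both players inside a single h2, this approach would print both names, so it is worth checking the page source first.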
Instead of looping through each tag individually, you can use the select() method with a CSS selector to find the matching tags and print only the first one.
import requests
from bs4 import BeautifulSoup
response = requests.get("https://www.betexplorer.com/tennis/atp-singles/basel/auger-aliassime-felix-bublik-alexander/U5HIueTc/")
webpage = response.content
soup = BeautifulSoup(webpage, "html.parser")
print(soup.select('h2 a')[0].text.replace(' ','_'))
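One caveat with indexing [0]: if the selector matches nothing (for example, the page layout changes or the request is blocked), it raises an IndexError. A hedged variant is to use select_one(), which returns the first match or None. The helper name first_link_text and the sample markup below are hypothetical, used only for illustration:

```python
from bs4 import BeautifulSoup


def first_link_text(html, selector="h2 a"):
    """Return the text of the first element matching selector, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one(selector)  # first match, or None when nothing matches
    if tag is None:
        return None
    return tag.get_text(strip=True).replace(" ", "_")


# Made-up markup (an assumption), not the real page structure.
sample = (
    '<h2><a href="#">Auger-Aliassime Felix</a></h2>'
    '<h2><a href="#">Bublik Alexander</a></h2>'
)

print(first_link_text(sample))
print(first_link_text("<p>no headings here</p>"))
```

With the real page, you would pass response.content (or response.text) as the html argument.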