How to select a tags and scrape href value?

Question:

I am having trouble getting the hyperlinks for the tennis matches listed on a webpage. How do I fix the code below so that it prints them?

import requests
from bs4 import BeautifulSoup
        
response = requests.get("https://www.betexplorer.com/results/tennis/?year=2022&month=11&day=02")
webpage = response.content

soup = BeautifulSoup(webpage, "html.parser")

print(soup.findAll('a href'))
Asked By: NewGuy1

||

Answers:

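findAll('a href') searches for a tag literally named 'a href', which does not exist, so it returns an empty list. Select the a tags and read the href attribute of each one instead. Change the last line to

# href=True skips anchors that have no href attribute (avoids a KeyError)
print([a['href'] for a in soup.find_all('a', href=True)])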

See a full tutorial here: https://pythonprogramminglanguage.com/get-links-from-webpage/

Answered By: damianr13

In newer code, avoid the old findAll() syntax; use find_all() instead, or select() with CSS selectors. For more, take a minute to check the BeautifulSoup docs.
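As a rough illustration of the two approaches, the sketch below parses a small inline HTML snippet and collects the links once with find_all() and once with select(). The markup and variable names are made up for this example and are not taken from the site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example
html = """
<a href="/tennis/match-1/">Match 1</a>
<a name="top">No href here</a>
<a href="/football/match-2/">Match 2</a>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() with href=True skips anchors that have no href attribute
links_find_all = [a["href"] for a in soup.find_all("a", href=True)]

# select() with a CSS attribute selector matches the same anchors
links_select = [a["href"] for a in soup.select("a[href]")]

print(links_find_all)
print(links_select)
```

Both lists should contain the same two paths, since the anchor without an href is filtered out in either case.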

Select your elements more specifically and collect them in a set to avoid duplicates:

set('https://www.betexplorer.com'+a.get('href') for a in soup.select('a[href^="/tennis"]:has(strong)'))

Example

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.betexplorer.com/results/tennis/?year=2022&month=11&day=02')
soup = BeautifulSoup(r.text, "html.parser")

set('https://www.betexplorer.com'+a.get('href') for a in soup.select('a[href^="/tennis"]:has(strong)'))

Output

{'https://www.betexplorer.com/tennis/itf-men-singles/m15-new-delhi-2/sinha-nitin-kumar-vardhan-vishnu/tOasQaJm/',
 'https://www.betexplorer.com/tennis/itf-women-doubles/w25-jerusalem/mushika-mao-mushika-mio-cohen-sapir-nagornaia-sofiia/xbNOHTEH/',
 'https://www.betexplorer.com/tennis/itf-men-singles/m25-jakarta-2/barki-nathan-anthony-sun-fajing/zy2r8bp0/',
 'https://www.betexplorer.com/tennis/itf-women-singles/w15-solarino/margherita-marcon-abbagnato-anastasia/lpq2YX4d/',
 'https://www.betexplorer.com/tennis/itf-women-singles/w60-sydney/lee-ya-hsuan-namigata-junri/CEQrNPIG/',
 'https://www.betexplorer.com/tennis/itf-men-doubles/m15-sharm-elsheikh-16/echeverria-john-marrero-curbelo-ivan-ianin-nikita-jasper-lai/nsGbyqiT/',...}
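To see what each part of the selector does without hitting the network, here is a minimal offline sketch. The HTML is a guess at the general shape of such a page, invented for illustration, and BASE_URL, anchors and match_links are hypothetical names. It also uses urljoin from the standard library in place of string concatenation, which is a swapped-in alternative rather than what the code above does.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.betexplorer.com"

# Hypothetical markup, invented for this example
html = """
<a href="/tennis/some-tournament/player-a-player-b/abc123/"><strong>Player A</strong> - Player B</a>
<a href="/tennis/some-tournament/player-a-player-b/abc123/"><strong>Player A</strong> - Player B</a>
<a href="/tennis/some-tournament/">Some tournament</a>
<a href="/football/"><strong>Football</strong></a>
"""

soup = BeautifulSoup(html, "html.parser")

# a[href^="/tennis"] keeps anchors whose href starts with "/tennis";
# :has(strong) keeps only those that contain a <strong> element
anchors = soup.select('a[href^="/tennis"]:has(strong)')

# urljoin builds the absolute URL; the set drops the duplicate link
match_links = {urljoin(BASE_URL, a["href"]) for a in anchors}

print(match_links)
```

In this sketch the tournament link is dropped because it has no strong child, the football link is dropped because its href does not start with "/tennis", and the repeated match link collapses to a single entry in the set.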
Answered By: HedgeHog

Solution found in the comments:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.betexplorer.com/results/tennis/?year=2022&month=11&day=02')
soup = BeautifulSoup(r.text, "html.parser")

print(set('https://www.betexplorer.com'+a.get('href') for a in soup.select('a[href^="/tennis"]:has(strong)')))

This answer was posted as an edit to the question How to select a tags and scrape href value? by the OP NewGuy1 under CC BY-SA 4.0.

Answered By: vvvvv