How do I apply table class criteria in a web-scraper through Python?

Question

The web-scraper below works, but it also picks up hyperlinks that are unrelated to the tables on the webpage. How can I limit the selection to only the relevant tennis match hyperlinks inside the table with the classes "table-main only12 js-nrbanner-t"?

import requests
from bs4 import BeautifulSoup
import pandas as pd

r = requests.get('https://www.betexplorer.com/results/tennis/?year=2022&month=11&day=02')
soup = BeautifulSoup(r.text, "html.parser")

matchlist = set('https://www.betexplorer.com'+a.get('href') for a in soup.select('a[href^="/tennis"]:has(strong)'))

print(pd.DataFrame(matchlist))

Edit: Driftr95 found the exact solution I was looking for, even though I didn't phrase the question correctly.

Asked By: NewGuy1

Answer 1

You can add the table to the CSS selector passed to select, so that only links inside that table are matched:

tLinkSel = 'table.table-main.only12.js-nrbanner-t a[href^="/tennis"]:has(strong)'
matchlist = set('https://www.betexplorer.com'+a.get('href') for a in soup.select(tLinkSel))

I should mention that I did not see any difference in the results when I tried both selectors in dev tools, but the longer selector does limit the links to those inside the table.
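
To see what the table prefix changes, here is a minimal sketch that runs both selectors against a small inline HTML snippet instead of the live site. The markup, the sidebar div, and the hrefs are made up for illustration and only loosely follow the structure described above; the real page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not copied from the real page):
# one link outside the table and two links inside it.
html = """
<div class="sidebar">
  <a href="/tennis/atp-singles/paris/"><strong>Sidebar link</strong></a>
</div>
<table class="table-main only12 js-nrbanner-t">
  <tr><td><a href="/tennis/atp-singles/paris/player-a-player-b/abc123/"><strong>Player A</strong> - Player B</a></td></tr>
  <tr><td><a href="/tennis/atp-singles/paris/">Paris</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Original selector: any /tennis link that contains a <strong> element.
link_sel = 'a[href^="/tennis"]:has(strong)'
# Same selector, restricted to descendants of the results table.
table_sel = 'table.table-main.only12.js-nrbanner-t ' + link_sel

all_links = [a["href"] for a in soup.select(link_sel)]
table_links = [a["href"] for a in soup.select(table_sel)]

print(all_links)    # includes the sidebar link
print(table_links)  # only the match link inside the table
```

Each class in the class attribute becomes its own .class part of the selector, so the order of the classes in the HTML does not matter.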

Additional EDIT:

You can also target a specific date through the data-dt attribute of the table rows (the tr elements). For example, for Nov 2, 2022, you can set

tLinkSel = 'tr[data-dt^="2,11,2022,"] a[href^="/tennis"]:has(strong)'
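
If you need this for more than one date, the selector can be built from the day, month and year. The sketch below is only an illustration: date_link_selector is a hypothetical helper, and the sample rows assume that data-dt starts with "day,month,year," followed by a time, which is a guess based on the prefix used above.

```python
from bs4 import BeautifulSoup


def date_link_selector(day, month, year):
    """Build a selector for match links in rows whose data-dt starts with the date.

    Hypothetical helper; assumes data-dt begins with "day,month,year,".
    """
    return f'tr[data-dt^="{day},{month},{year},"] a[href^="/tennis"]:has(strong)'


# Made-up rows for illustration; the values after the year are assumptions.
html = """
<table class="table-main only12 js-nrbanner-t">
  <tr data-dt="2,11,2022,10,00"><td><a href="/tennis/x/match-1/"><strong>A</strong> - B</a></td></tr>
  <tr data-dt="3,11,2022,09,30"><td><a href="/tennis/x/match-2/"><strong>C</strong> - D</a></td></tr>
  <tr data-dt="12,11,2022,14,00"><td><a href="/tennis/x/match-3/"><strong>E</strong> - F</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
sel = date_link_selector(2, 11, 2022)

# Only the row dated 2,11,2022 matches; "12,11,2022" does not start with "2,".
links = ["https://www.betexplorer.com" + a["href"] for a in soup.select(sel)]
print(links)
```

The trailing comma in the prefix keeps a longer year value from matching by accident, and because ^= only checks the start of the attribute, the 12th of the month is not picked up when you ask for the 2nd.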

Answered By: Driftr95
