Extracting hrefs or specific tag from a page

Question:

I have tried numerous approaches, but this website is proving very hard to scrape with bs4.

I am trying to extract the href value of each match on the page. The idea is to extract all hrefs from the page into a list. My code is not returning any values. The ideal result is a list containing all hrefs, e.g. //www.premierleague.com/match/74911

import warnings
import numpy as np
from datetime import datetime
import requests
from bs4 import BeautifulSoup

warnings.filterwarnings('ignore')

# Set up an empty list to store dataframes. errors holds any matches that don't scrape.
dataframe = []
errors = []

url = "https://www.premierleague.com/results"

response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

matches = {}

soup.find_all("div", {"class": "competitionContainer"})
Asked By: Paul Corcoran

||

Answers:

The data you see on the page is loaded from an external source via JavaScript, so it is not in the HTML that requests receives. To find the request, open the Web Developer Tools in your browser, switch to the Network tab, and scroll down the page; you should see the Ajax request there. You can call that API directly:

import json
import requests

api_url = "https://footballapi.pulselive.com/football/fixtures"

params = {
    "comps": "1",
    "compSeasons": "489",
    "teams": "127,1,2,130,131,4,6,7,34,9,26,10,11,12,23,15,20,21,25,38",
    "page": "1",
    "pageSize": "40",
    "sort": "desc",
    "statuses": "C",
    "altIds": "true",
}

headers = {
    'Origin': 'https://www.premierleague.com',
}

page = 0
while True:
    params['page'] = page
    data = requests.get(api_url, params=params, headers=headers).json()

    # uncomment this to print all data:
    # print(json.dumps(data, indent=4))

    for c in data['content']:
        team1, team2 = c['teams'][0]['team']['name'], c['teams'][1]['team']['name']
        print(f'{team1:<30} {team2:<30} https://www.premierleague.com/match/{int(c["id"])}')

    # pages are zero-indexed, so the last page is numPages - 1
    if page >= data['pageInfo']['numPages'] - 1:
        break

    page += 1

Prints:


...

Chelsea                        Tottenham Hotspur              https://www.premierleague.com/match/74925
Nottingham Forest              West Ham United                https://www.premierleague.com/match/74928
Brentford                      Manchester United              https://www.premierleague.com/match/74923
Arsenal                        Leicester City                 https://www.premierleague.com/match/74921
Brighton & Hove Albion         Newcastle United               https://www.premierleague.com/match/74924
Manchester City                Bournemouth                    https://www.premierleague.com/match/74927
Southampton                    Leeds United                   https://www.premierleague.com/match/74929
Wolverhampton Wanderers        Fulham                         https://www.premierleague.com/match/74930
Aston Villa                    Everton                        https://www.premierleague.com/match/74922
West Ham United                Manchester City                https://www.premierleague.com/match/74920
Leicester City                 Brentford                      https://www.premierleague.com/match/74916
Manchester United              Brighton & Hove Albion         https://www.premierleague.com/match/74919
Everton                        Chelsea                        https://www.premierleague.com/match/74913
Bournemouth                    Aston Villa                    https://www.premierleague.com/match/74912
Leeds United                   Wolverhampton Wanderers        https://www.premierleague.com/match/74915
Newcastle United               Nottingham Forest              https://www.premierleague.com/match/74917
Tottenham Hotspur              Southampton                    https://www.premierleague.com/match/74918
Fulham                         Liverpool                      https://www.premierleague.com/match/74914
Crystal Palace                 Arsenal                        https://www.premierleague.com/match/74911
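
Since the question asks for a list of links rather than printed output, here is a minimal sketch of one way to build it. The helper name `urls_from_page` and the `sample_page` dictionary are hypothetical; the sample is hand-written to mimic only the `content` and `id` fields the code above relies on, and is not real API output.

```python
def urls_from_page(data):
    """Build match URLs from one parsed API response (a dict with a 'content' list)."""
    # int() mirrors the code above, which converts the id before formatting it
    return [
        f"https://www.premierleague.com/match/{int(c['id'])}"
        for c in data["content"]
    ]


# Hand-written sample in the assumed response shape (illustration only)
sample_page = {"content": [{"id": 74911.0}, {"id": 74914.0}]}

all_urls = []
all_urls.extend(urls_from_page(sample_page))
print(all_urls)
```

In the loop above, the same idea would be to create `all_urls = []` before `while True:` and call `all_urls.extend(urls_from_page(data))` on each page instead of printing.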
Answered By: Andrej Kesely