BS4: Doesn't detect all tags with find_all

Question:

I'm trying to scrape this URL: https://baloncestoenvivo.feb.es/partido/2218269

I want to get all the divs with class="box-datos-partido". When I try to get all of them with:

soup.find_all("div", class_="box-datos-partido")

I get only one of the two divs that are on the page. The result is a list with a single element, whose content is:

<div class="box-datos-partido">
    <div class="fecha">
        <span class="label">Fecha</span>
        <span class="txt">31/10/2021 - 12:00</span>
    </div>
    <div class="arbitros">
        <span class="label">Árbitros</span>
        <span class="txt referee">DIAZ DE SARRALDE MARTIN, IÑIGO</span>
        <span class="txt referee">SANCHEZ NUÑEZ, UNAI</span>
        <span class="txt referee"></span>
    </div>
    <div class="pista">
        <span class="label">Pista</span>
        <span class="txt pabellon">POLIDEPORTIVO URRETA</span>
        <span class="txt direccion">Galdakao (Vizcaya)</span>
    </div>
</div>

I expected to receive a list with two elements, with the following content:

<div class="box-datos-partido">
    <div class="fecha">
        <span class="label">Fecha</span>
        <span class="txt">31-10-2021 - 12:00</span>
    </div>
    <div class="arbitros">
        <span class="label">Árbitros</span>
        <span class="txt referee">DIAZ DE SARRALDE MARTIN, IÑIGO</span><span class="txt referee">SANCHEZ NUÑEZ, UNAI</span><span class="txt referee"></span>
    </div>
    <div class="pista">
        <span class="label">Pista</span>
        <span class="txt pabellon">POLIDEPORTIVO URRETA</span><span class="txt direccion">BIZKAIA KALEA, S/N, Vizcaya (Galdakao)</span>
    </div>
</div>

<div class="box-datos-partido">
    <div class="fecha">
        <span class="label">Fecha</span>
        <span class="txt">31/10/2021 - 12:00</span>
    </div>
    <div class="arbitros">
        <span class="label">Árbitros</span>
        <span class="txt referee">DIAZ DE SARRALDE MARTIN, IÑIGO</span>
        <span class="txt referee">SANCHEZ NUÑEZ, UNAI</span>
        <span class="txt referee"></span>
    </div>
    <div class="pista">
        <span class="label">Pista</span>
        <span class="txt pabellon">POLIDEPORTIVO URRETA</span>
        <span class="txt direccion">Galdakao (Vizcaya)</span>
    </div>
</div>

How is that possible? What am I doing wrong to receive only one element of the two?

Asked By: José Carlos

Answers:

The data you see is loaded via JavaScript from an external URL. You can request that URL directly with the requests module (this example loads the players of both teams into two pandas DataFrames):

import json  # only needed for the optional pretty-print below
import requests
import pandas as pd


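# Bearer tokens like this one usually expire; replace it with a current one if the request is rejected.
headers = {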
    "Authorization": "Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6ImQzOWE5MzlhZTQyZmFlMTM5NWJjODNmYjcwZjc1ZDc3IiwidHlwIjoiSldUIn0.eyJuYmYiOjE2NTkyNjM1MDUsImV4cCI6MTY1OTM0OTkwNSwiaXNzIjoiaHR0cHM6Ly9pbnRyYWZlYi5mZWIuZXMvaWRlbnRpdHkuYXBpIiwiYXVkIjpbImh0dHBzOi8vaW50cmFmZWIuZmViLmVzL2lkZW50aXR5LmFwaS9yZXNvdXJjZXMiLCJsaXZlc3RhdHMuYXBpIl0sImNsaWVudF9pZCI6ImJhbG9uY2VzdG9lbnZpdm9hcHAiLCJpZGFtYml0byI6IjEiLCJyb2xlIjpbIk92ZXJWaWV3IiwiVGVhbVN0YXRzIiwiU2hvdENoYXJ0IiwiUmFua2luZyIsIktleUZhY3RzIiwiQm94U2NvcmUiXSwic2NvcGUiOlsibGl2ZXN0YXRzLmFwaSJdfQ.YDVnzLhZAw8kzE2LLjiS8VZayY-sfUgqMN4zdnjROLImHRamOJ_Htz4ehK26QcpywfZmrD5iUWnFnRFJrJyZdhudOp09B0tmn4HnWs4JHcQBirUpdLi4oDqONctn1J31OktVhHYpAS36Fs-2KTjwHcgR4G-EQsA6vxjkLKYjw6we0oY5w1Q_GUqRmEvfDQY3b2a-VlFEcxMQBS6XFfEL4naSz84w9aW2e7UCnic_Mm4CHzN1RzitcBSiunQyINshQzg-1G4STARAZZjfaVZCP8SDB4bWeuaXYxkwX40vbisJD8mXFP1xN93THlIg-d0LNfZg8iqD0Lx8xRf9nRdXug"
}
url = "https://intrafeb.feb.es/LiveStats.API/api/v1/BoxScore/2218269"
data = requests.get(url, headers=headers).json()

# uncomment to print all data:
# print(json.dumps(data, indent=4))

t1 = data["BOXSCORE"]["TEAM"][0]["PLAYER"]
t2 = data["BOXSCORE"]["TEAM"][1]["PLAYER"]

df1 = pd.DataFrame(t1)
df2 = pd.DataFrame(t2)

print(df1)
print(df2)

Prints:

  p1m p1a    p1p p2m p2a   p2p p3m p3a   p3p fgm fga   fgp   min minFormatted   sta bs tc mt ro rd rt rf to st   ind pllss   val assist reb pf pts inn       id  no                   name                                                           logo
0   4   6   66,7   0   5   0,0   0   6   0,0   0  11   0,0  1812        30:12  None  0  0  0  0  3  3  5  6  1  None    -1  None      1   3  1   4   1  2188507   0    J. ROYALE SACRISTAN  https://competiciones.feb.es/estadisticas/Foto.aspx?c=2188507
1   0   0    0,0   0   5   0,0   0   0   0,0   0   5   0,0  1021        17:01  None  0  0  0  1  5  6  0  2  1  None   -20  None      0   6  0   0   0  2188508   2    O. ARENAS DE LA HOZ  https://competiciones.feb.es/estadisticas/Foto.aspx?c=2188508
2   0   0    0,0   1   2  50,0   0   1   0,0   1   3  33,3  1363        22:43  None  0  0  0  0  2  2  1  2  1  None    -4  None      1   2  0   2   0  2277838   4    A. RAMASCO CERECERO  https://competiciones.feb.es/estadisticas/Foto.aspx?c=2277838

...
Answered By: Andrej Kesely
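
One practical caveat with this approach: a Bearer token in this three-part, dot-separated form is normally a JWT, and JWTs usually carry an `exp` claim after which the API is likely to reject them. As a rough sketch (the helper names `jwt_expiry` and `is_expired` are made up for illustration, and the signature is not verified), the expiry can be read from the payload with the standard library alone:

```python
import base64
import json
from datetime import datetime, timezone


def jwt_expiry(token):
    """Return the token's `exp` claim as an aware UTC datetime, or None.

    Hypothetical helper: it only decodes the payload and does NOT verify
    the signature, so use it for diagnostics, not for trust decisions.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a three-part JWT")
    payload_b64 = parts[1]
    # base64url segments omit padding; restore it before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    exp = payload.get("exp")
    if exp is None:
        return None
    return datetime.fromtimestamp(exp, tz=timezone.utc)


def is_expired(token, now=None):
    """True if the token has an `exp` claim that is not in the future."""
    expiry = jwt_expiry(token)
    now = now or datetime.now(timezone.utc)
    return expiry is not None and expiry <= now
```

For instance, `is_expired(headers["Authorization"].split(" ", 1)[1])` would indicate whether the token in the snippet above needs to be replaced before the request can succeed.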

The page does contain two divs with class="box-datos-partido", but if you disable JavaScript you will notice that the same selector matches only the first one, because the rest are loaded dynamically by JavaScript. To get all of them, you can use a browser automation tool such as Selenium. Here I use Selenium together with bs4 to grab the divs from the rendered HTML.

Example:

import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_options = Options()
chrome_options.add_argument("--no-sandbox")

webdriver_service = Service("./chromedriver")  # path to your chromedriver
driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)
url = "https://baloncestoenvivo.feb.es/partido/2218269"
driver.get(url)
driver.maximize_window()
time.sleep(5)  # give the page's JavaScript time to load the remaining content

# Parse the rendered page, which now includes the dynamically loaded divs
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()  # close the browser once the HTML has been captured
for card in soup.select("div.box-datos-partido"):
    print(card.prettify())

Output:

<div class="box-datos-partido">
 <div class="fecha">
  <span class="label">
   Fecha
  </span>
  <span class="txt">
   31-10-2021 - 12:00
  </span>
 </div>
 <div class="arbitros">
  <span class="label">
   Árbitros
  </span>
  <span class="txt referee">
   DIAZ DE SARRALDE MARTIN, IÑIGO
  </span>
  <span class="txt referee">
   SANCHEZ NUÑEZ, UNAI
  </span>
  <span class="txt referee">
  </span>
 </div>
 <div class="pista">
  <span class="label">
   Pista
  </span>
  <span class="txt pabellon">
   POLIDEPORTIVO URRETA
  </span>
  <span class="txt direccion">
   BIZKAIA KALEA, S/N, Vizcaya (Galdakao)
  </span>
 </div>
</div>

<div class="box-datos-partido">
 <div class="fecha">
  <span class="label">
   Fecha
  </span>
  <span class="txt">
   31/10/2021 - 12:00
  </span>
 </div>
 <div class="arbitros">
  <span class="label">
   Árbitros
  </span>
  <span class="txt referee">
   DIAZ DE SARRALDE MARTIN, IÑIGO
  </span>
  <span class="txt referee">
   SANCHEZ NUÑEZ, UNAI
  </span>
  <span class="txt referee">
  </span>
 </div>
 <div class="pista">
  <span class="label">
   Pista
  </span>
  <span class="txt pabellon">
   POLIDEPORTIVO URRETA
  </span>
  <span class="txt direccion">
   Galdakao (Vizcaya)
  </span>
 </div>
</div>
Answered By: F.Hoque
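
A fixed `time.sleep(5)` can be too short on a slow connection and wasteful on a fast one. Selenium ships `WebDriverWait` for waiting on a condition; the sketch below illustrates the same idea with a small standard-library polling helper (the name `wait_until` and its parameters are made up for illustration and are not part of Selenium or bs4), so that the script can continue as soon as both divs are present:

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.25):
    """Call `predicate` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have elapsed. Hypothetical helper, not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Possible use with the Selenium script above (assumes `driver` exists):
#
#   from selenium.webdriver.common.by import By
#
#   def both_boxes_loaded():
#       boxes = driver.find_elements(By.CSS_SELECTOR, "div.box-datos-partido")
#       return boxes if len(boxes) >= 2 else None
#
#   wait_until(both_boxes_loaded, timeout=15)
```

Polling on the actual condition (two matching divs) ties the wait to what the scraper needs rather than to a guessed delay.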