Why is beautifulsoup not returning data elements?

Question

I’ve tried many things to get the data from this page: https://www.hebban.nl/rank. For some reason, even after many attempts, it doesn’t return a single data point.

Can someone point me in the right direction and tell me what I’m doing wrong? I’m learning, but I seem to be stuck, even with ChatGPT 🙂

Below is a quick example that should (in theory) get the author and title of each book on the page.

import requests
from bs4 import BeautifulSoup

# Send a GET request to the URL
url = "https://www.hebban.nl/rank"
response = requests.get(url)

# Parse the HTML content using Beautiful Soup
soup = BeautifulSoup(response.content, 'html.parser')

# Find the book titles and authors
books = soup.find_all('div', class_='row-fluid')
for book in books:
    title = book.find('a', class_='neutral').text.strip()
    author = book.find('span', class_='author').text.strip()

    print(title + ' by ' + author)

Asked By: jsb92


Answer 1

First of all, always take a look at your soup to check that all the expected ingredients are in place: simply print your response or your soup.
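If printing the whole soup is too much to read, a minimal sketch is to count how many elements each of your selectors matches; a count of 0 tells you the selector finds nothing in the HTML you actually received. `describe_soup` is a hypothetical helper, and the sample HTML is made up to stand in for `response.text`:

```python
from typing import Dict, List

from bs4 import BeautifulSoup


def describe_soup(html: str, selectors: List[str]) -> Dict[str, int]:
    """Hypothetical helper: count the elements each CSS selector matches.

    A count of 0 means the selector finds nothing in this HTML.
    """
    soup = BeautifulSoup(html, "html.parser")
    return {sel: len(soup.select(sel)) for sel in selectors}


# Made-up HTML standing in for response.text.
sample_html = """
<html><body>
  <div class="item"><h3>Some title</h3><span class="author">Some author</span></div>
  <div class="item"><h3>Other title</h3><span class="author">Other author</span></div>
</body></html>
"""

counts = describe_soup(sample_html, ["div.row-fluid", "div.item", "span.author"])
print(counts)
```

With the real page, pass `response.text` instead of `sample_html`, and check `response.status_code` first: if it is not 200, the problem most likely lies with the request rather than your selectors.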

You have to set a user-agent on your request to avoid being blocked by the server in the first place, so that BeautifulSoup can find what you are looking for:

response = requests.get(url, headers={'user-agent': 'some agent'})
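To see why the header matters, a short sketch: by default requests identifies itself as `python-requests/<version>`, which a server can recognise and choose to block. The sketch sets the header once on a `Session` and prepares (without sending) a request to confirm which User-Agent would go out. The agent string `Mozilla/5.0 (example agent)` is a placeholder assumption, not a value the site requires.

```python
import requests

# The User-Agent requests sends when you don't set one yourself.
default_agent = requests.utils.default_user_agent()
print(default_agent)

# A Session applies its headers to every request it sends.
# The agent string below is a placeholder, not a value the site requires.
session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0 (example agent)"})

# Prepare the request without sending it, to inspect the outgoing headers.
prepared = session.prepare_request(
    requests.Request("GET", "https://www.hebban.nl/rank")
)
print(prepared.headers["User-Agent"])

# To actually fetch the page:
# response = session.get("https://www.hebban.nl/rank", timeout=10)
```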

Also take a closer look at your selectors, and note that you have to select the #1 book separately, because it does not match the same selector as the others.
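When an entry does not fit your selection, `find()` returns `None` for it, and calling `.text` on `None` raises `AttributeError`. A minimal sketch of guarding against that, using invented HTML whose first entry lacks `span.author` (standing in for an entry, like the #1 book, that does not fit the shared selection); `text_of` is a hypothetical helper:

```python
from typing import Optional

from bs4 import BeautifulSoup


def text_of(parent, name: str, class_: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: stripped text of the first matching tag, or None."""
    # Only pass class_ when given; class_=None would mean "no class attribute".
    tag = parent.find(name, class_=class_) if class_ else parent.find(name)
    return tag.get_text(strip=True) if tag is not None else None


# Invented markup: the first entry has no span.author.
html = """
<div class="item"><h3>Title A</h3></div>
<div class="item"><h3>Title B</h3><span class="author">Author B</span></div>
"""

soup = BeautifulSoup(html, "html.parser")
rows = [
    (text_of(book, "h3"), text_of(book, "span", "author"))
    for book in soup.find_all("div", class_="item")
]
for title, author in rows:
    print(f"{title} by {author or 'unknown author'}")
```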

Example

import requests
from bs4 import BeautifulSoup

# Send a GET request to the URL, with a user-agent header set
url = "https://www.hebban.nl/rank"
response = requests.get(url, headers={'user-agent': 'some agent'})

# Parse the HTML content using Beautiful Soup
soup = BeautifulSoup(response.content, 'html.parser')

# Find the book titles, authors, and image URLs
books = soup.find_all('div', class_='item')
for book in books:
    title = book.h3.text.strip()
    author = book.find('span', class_='author').text.strip()
    # The image URL is read from the img tag's data-src attribute
    img_url = book.img.get('data-src')

    print(title + ' by ' + author)
    print('Image URL: ' + img_url)
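If you would rather collect the results than print them, here is a sketch along the same lines, assuming the `div.item` / `h3` / `span.author` / `img` structure used in the example above. Like the example, it assumes the image URL sits in `data-src` (typical for lazy-loaded images), and it falls back to `src` when that attribute is missing. `parse_items` and the sample HTML are hypothetical. Using an f-string instead of `+` also avoids a `TypeError` when a value is `None`.

```python
from typing import Dict, List, Optional

from bs4 import BeautifulSoup


def parse_items(html: str) -> List[Dict[str, Optional[str]]]:
    """Hypothetical helper: title, author and image URL for each div.item.

    Missing parts become None instead of raising AttributeError.
    """
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for item in soup.find_all("div", class_="item"):
        h3 = item.find("h3")
        author = item.find("span", class_="author")
        img = item.find("img")
        # Assumption: lazy-loaded images keep the URL in data-src; else use src.
        img_url = (img.get("data-src") or img.get("src")) if img else None
        books.append({
            "title": h3.get_text(strip=True) if h3 else None,
            "author": author.get_text(strip=True) if author else None,
            "img_url": img_url,
        })
    return books


# Made-up HTML mirroring the structure the example above relies on.
sample = """
<div class="item"><h3>Book one</h3><span class="author">Author One</span>
  <img data-src="https://example.com/one.jpg" src="placeholder.gif"></div>
<div class="item"><h3>Book two</h3><span class="author">Author Two</span>
  <img src="https://example.com/two.jpg"></div>
"""

for book in parse_items(sample):
    print(f"{book['title']} by {book['author']}")
    print(f"Image URL: {book['img_url']}")
```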

Output

2 by Raoul de Jong
Image URL: https://static.hebban.nl/covers/00001122/thumb/DEF%20omslag%20-%20Boto%20Banja.png
3 by Freida McFadden
Image URL: https://static.hebban.nl/covers/00001063/thumb/9789032520267.jpg
4 by Helen Fields
Image URL: https://static.hebban.nl/covers/00001062/thumb/9789026360787.jpg
5 by Thomas Olde Heuvelt
Image URL: https://static.hebban.nl/covers/00001032/thumb/9789022591116.jpeg
...

Answered By: HedgeHog
