BeautifulSoup can't find HTML element by class

Question:

This is the website I’m trying to scrape with Python:

https://www.ebay.de/sch/i.html?_from=R40&_nkw=iphone+8&_sacat=0&LH_Sold=1&LH_Complete=1&rt=nc&LH_ItemCondition=3000

I want to access the ‘ul’ element with the class of ‘srp-results srp-list clearfix’. This is what I tried with requests and BeautifulSoup:

from bs4 import BeautifulSoup
import requests

url = 'https://www.ebay.de/sch/i.html?_from=R40&_nkw=iphone+8&_sacat=0&LH_Sold=1&LH_Complete=1&rt=nc&LH_ItemCondition=3000'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')

uls = soup.find_all('ul', attrs = {'class': 'srp-results srp-list clearfix'})

And the result is always empty: find_all returns an empty list.
I also tried scraping the website with Selenium WebDriver and got the same result.

Asked By: user13785870

||

Answers:

At first I was a little confused by your error, but after some debugging I figured out that eBay generates that ul dynamically with JavaScript.

Since BeautifulSoup can't execute JavaScript, you have to use Selenium and wait until the JavaScript has loaded that ul.

Answered By: Hexception

It is probably because the content you are looking for is rendered by JavaScript after the page loads: the web browser only adds that content after running JavaScript, so you cannot get it with a plain requests.get call from Python.

I would suggest learning Selenium to scrape the data you want.
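Before switching to Selenium, one way to test the JavaScript hypothesis is to look for the element in the raw HTML that requests receives. If it is already there, the page is server-rendered and the empty result has another cause (wrong selector, blocked request). A stdlib-only sketch; count_elements_with_classes is a hypothetical helper name.

```python
from html.parser import HTMLParser


class _ClassCounter(HTMLParser):
    """Counts start tags named `tag` whose class attribute contains all `classes`."""

    def __init__(self, tag, classes):
        super().__init__()
        self.tag = tag
        self.classes = set(classes)
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        for name, value in attrs:
            # class="a b c" -> {"a", "b", "c"}; value is None for a bare attribute
            if name == "class" and value and self.classes <= set(value.split()):
                self.count += 1
                break


def count_elements_with_classes(html, tag, classes):
    """Return how many <tag> elements in `html` carry every class in `classes`."""
    parser = _ClassCounter(tag, classes)
    parser.feed(html)
    parser.close()
    return parser.count


# Usage against a live response (network access assumed):
#   import requests
#   r = requests.get(url, headers={"User-Agent": "Mozilla/5.0 ..."}, timeout=30)
#   print(count_elements_with_classes(r.text, "ul", ["srp-results", "srp-list"]))
```

A count of 0 suggests the element is missing from the raw HTML (JavaScript rendering or a block page); 1 or more means requests plus BeautifulSoup should be able to find it.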

Answered By: ZainSci

An empty output can be caused either by the wrong class or by the fact that the code never specifies what to extract from the matched elements and print.
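One way the class can be "wrong" even when it is spelled correctly: a space-separated string passed as the class value is compared with the element's entire class attribute, so an extra class or a different order breaks the match. A CSS selector checks each class on its own. A small offline demo on made-up HTML:

```python
from bs4 import BeautifulSoup

# Made-up markup: the classes from the question plus one extra class
html = '<ul class="srp-results srp-list clearfix extra"><li>item</li></ul>'
soup = BeautifulSoup(html, "html.parser")

# Exact-string match against the whole class attribute: finds nothing here
exact = soup.find_all("ul", attrs={"class": "srp-results srp-list clearfix"})

# CSS selector: each class is checked independently, extra classes don't matter
by_css = soup.select("ul.srp-results.srp-list.clearfix")

# Single-class match: find_all compares against each class value individually
single = soup.find_all("ul", class_="srp-results")

print(len(exact), len(by_css), len(single))  # prints: 0 1 1
```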

Another possible reason is that the request was blocked: when you use requests, the default User-Agent is python-requests, which identifies the request as coming from a script rather than a browser. eBay doesn't render this page with JavaScript, at least for now.
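To see what requests sends by default, and to override it once for every request, you can set the header on a Session. A minimal sketch; the Chrome User-Agent string is only an example and will go stale as browser versions move on.

```python
import requests

# What requests sends when no User-Agent is set: "python-requests/<version>"
default_ua = requests.utils.default_user_agent()
print(default_ua)

# A Session keeps these headers for every request made through it
session = requests.Session()
session.headers.update({
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
    )
})

# Usage (network access assumed):
#   r = session.get("https://www.ebay.de/sch/i.html",
#                   params={"_nkw": "iphone 8", "LH_Sold": "1"}, timeout=30)
#   print(r.status_code)
```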

An additional step could be to rotate the user-agent, for example switching between PC, mobile, and tablet, as well as between browsers such as Chrome, Firefox, Safari, and Edge.
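A minimal rotation sketch using the standard library. The User-Agent strings are illustrative examples only, and random_headers is a hypothetical helper. Keep in mind that a mobile User-Agent may receive different markup, so selectors might need adjusting.

```python
import random

# Example User-Agent strings (desktop Chrome, desktop Firefox, mobile Safari).
# Illustrative only; a real rotation list needs regular updating.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/110.0",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1",
]


def random_headers(rng=random):
    """Return a headers dict with a randomly chosen User-Agent."""
    return {"User-Agent": rng.choice(USER_AGENTS)}


# Usage: pick a fresh User-Agent for each page request, e.g.
#   page = requests.get(url, params=params, headers=random_headers(), timeout=30)
```

Accepting an rng argument lets you pass a seeded random.Random for reproducible runs.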

To find the elements you need, you can use the SelectorGadget Chrome extension, which lets you pick a CSS selector by clicking on the desired element in your browser. It does not always work perfectly when a page relies heavily on JavaScript, but it works on this page.

Code example with pagination over all available pages:

from bs4 import BeautifulSoup
import requests, lxml
import json

# https://requests.readthedocs.io/en/latest/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
}

params = {
    '_nkw': 'iphone 8',      # search query (requests encodes the space as '+')
    'LH_Sold': '1',          # shows sold items
    '_pgn': 1                # page number
}

data = []

while True:
    page = requests.get('https://www.ebay.de/sch/i.html', params=params, headers=headers, timeout=30)
    soup = BeautifulSoup(page.text, 'lxml')
    
    print(f"Extracting page: {params['_pgn']}")

    print("-" * 10)
    
    for products in soup.select(".s-item__pl-on-bottom"):
        title = products.select_one(".s-item__title span").text
        price = products.select_one(".s-item__price").text
        try:
            sold_date = products.select_one(".s-item__title--tagblock .POSITIVE").text
        except AttributeError:  # select_one() returned None: no sold-date tag
            sold_date = None
        
        data.append({
          "title" : title,
          "price" : price,
          "sold_date": sold_date
        })

    if soup.select_one(".pagination__next"):
        params['_pgn'] += 1
    else:
        break

print(json.dumps(data, indent=2, ensure_ascii=False))

Example output:

[
  {
    "title": "Apple iPhone 8 - 64 GB- Rose Gold und viele mehr(Ohne Simlock)",
    "price": "EUR 91,00",
    "sold_date": "Verkauft  22. Feb 2023"
  },
  {
    "title": "iPhone 8 64GB Produkt rot - Ersatzteile & Reparaturen",
    "price": "EUR 17,03",
    "sold_date": "Verkauft  22. Feb 2023"
  },
  {
    "title": "iPhone 8 64GB Spacegrau - Ersatzteile & Reparaturen",
    "price": "EUR 17,03",
    "sold_date": "Verkauft  22. Feb 2023"
  },
  other results ...
]
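The prices above are German-formatted strings. If you need numbers, they have to be converted first. A minimal sketch, assuming amounts look like the example output, with a comma as decimal separator and (an assumption) a dot as thousands separator; parse_eur_price is a hypothetical helper.

```python
import re

# A German-formatted amount such as "91,00", "1234,56" or "1.234,56"
_AMOUNT = re.compile(r"\d+(?:\.\d{3})*,\d{2}")


def parse_eur_price(text):
    """Convert the first amount in e.g. "EUR 91,00" to a float; None if absent.

    If the text contains more than one amount, only the first is returned.
    """
    match = _AMOUNT.search(text or "")
    if match is None:
        return None
    # Drop thousands dots, then turn the decimal comma into a dot
    return float(match.group().replace(".", "").replace(",", "."))


print(parse_eur_price("EUR 91,00"))     # 91.0
print(parse_eur_price("EUR 1.234,56"))  # 1234.56
```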

As an alternative, you can use the eBay Organic Results API from SerpApi. It's a paid API with a free plan that handles blocks and parsing on its backend.

Example code with pagination:

from serpapi import EbaySearch
import json

params = {
    "api_key": "...",                 # serpapi key, https://serpapi.com/manage-api-key   
    "engine": "ebay",                 # search engine
    "ebay_domain": "ebay.de",         # ebay domain
    "_nkw": "iphone 8",               # search query
    "_pgn": 1,                        # pagination
    # "LH_Sold": "1",                 # shows sold items
}

search = EbaySearch(params)        # where data extraction happens

page_num = 0

data = []

while True:
    results = search.get_dict()     # JSON -> Python dict

    if "error" in results:
        print(results["error"])
        break
    
    for organic_result in results.get("organic_results", []):
        title = organic_result.get("title")
        price = organic_result.get("price")

        data.append({
          "price" : price,
          "title" : title
        })
                    
    page_num += 1
    print(page_num)
    
    if "next" in results.get("pagination", {}):
        params['_pgn'] += 1

    else:
        break

print(json.dumps(data, indent=2, ensure_ascii=False))  # print once, after all pages

Output:

[
  {
    "price": {
      "raw": "EUR 123,75",
      "extracted": 123.75
    },
    "title": "Apple iPhone 8 64GB 128GB 256GB Unlocked Space Grey Gold Silver Red 4G | Good"
  },
  {
    "price": {
      "raw": "EUR 137,85",
      "extracted": 137.85
    },
    "title": "Apple iPhone 8 - 64GB - All Colors - Unlocked - Good Condition"
  },
  other results ...
]

If you want to know more about web scraping, there's a blog post called "13 ways to scrape any public data from any website".

Answered By: Denis Skopa