Why can I only scrape the first 4 pages of results on eBay?

Question:

I have a simple script to analyze sold data on eBay (baseball trading cards). It seems to be working fine for the first 4 pages but on the 5th page it simply does not load in the desired html content anymore, and I am not able to figure out why this happens:

#Import statements
import requests
import time
from bs4 import BeautifulSoup as soup
from tqdm import tqdm
#FOR DEBUG
Page_1="https://www.ebay.com/sch/213/i.html?_from=R40&LH_Sold=1&_sop=16&_pgn=1"

#Request URL working example
source=requests.get(Page_1)
time.sleep(5)
eBay_full = soup(source.text, "lxml")
Complete_container=eBay_full.find("ul",{"class":"b-list__items_nofooter"})
Single_item=Complete_container.find_all("div",{"class":"s-item__wrapper clearfix"})
items=[]
#For all items on page perform desired operation
for i in tqdm(Single_item):
    items.append(i.find("a", {"class": "s-item__link"})["href"].split('?')[0].split('/')[-1])
    #Works fine for Links_to_check[0] upto Links_to_check[3]

However, when I try to scrape the fifth page or further pages the following occurs:

Page_5="https://www.ebay.com/sch/213/i.html?_from=R40&LH_Sold=1&_sop=16&_pgn=5"

source=requests.get(Page_5)
time.sleep(5)
eBay_full = soup(source.text, "lxml")
Complete_container=eBay_full.find("ul",{"class":"b-list__items_nofooter"})
Single_item=Complete_container.find_all("div",{"class":"s-item__wrapper clearfix"})
items=[]
#For all items on page perform desired operation
for i in tqdm(Single_item):
    items.append(i.find("a", {"class": "s-item__link"})["href"].split('?')[0].split('/')[-1])

----> 5 Single_item=Complete_container.find_all("div",{"class":"s-item__wrapper clearfix"})
      6 items=[]
      7 #For all items on page perform desired operation

AttributeError: 'NoneType' object has no attribute 'find_all'

This seems a logical consequence of the ul class b-list__items_nofooter missing in the eBay_full soup for the later pages. The question however is why is this information missing? Scrolling through the soup, all items of interest seem to be absent. On the webpage itself this information is, as expected, present. Who can guide me?

Asked By: Rivered

||

Answers:

As per @Sebastien D's remark, the problem has been solved.

In the headers variable, put the user-agent string of a single browser, along with its current stable version number (e.g. Chrome/53.0.2785.143):

headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'}

source= requests.get(Page_5, headers=headers, timeout=2)
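
If the container is still missing, the script fails with the AttributeError from the question, which hides the real cause. Below is a minimal sketch of a guard, assuming the same class names as in the question. `extract_item_ids` is a hypothetical helper name, the links are looked up directly inside the list rather than via the wrapper divs, and `html.parser` is used only so that the sketch runs without lxml:

```python
from bs4 import BeautifulSoup


def extract_item_ids(html):
    """Return item IDs from a results page, or [] if the list is missing."""
    page = BeautifulSoup(html, "html.parser")
    container = page.find("ul", {"class": "b-list__items_nofooter"})
    if container is None:
        # find() returns None when nothing matches; eBay most likely
        # served a different page (for example to a blocked client).
        return []
    ids = []
    for item in container.find_all("a", {"class": "s-item__link"}):
        href = item.get("href", "")
        # Same extraction as in the question: drop the query, keep the last segment.
        ids.append(href.split("?")[0].split("/")[-1])
    return ids
```

An empty result then signals that the page was not the one the script expected, and source.status_code and len(source.text) are worth printing at that point.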

Answered By: Rivered

As Sebastien D suggested, the main problem is that eBay detects that the request was sent by a bot/script.

But how does eBay detect it? The default requests user-agent is python-requests, and eBay seems to block requests made with that user-agent.
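
The default value can be checked directly. The sketch below only inspects what requests would send and makes no network call; `requests.utils.default_user_agent()` returns the string that goes into the User-Agent header when no headers are passed:

```python
import requests

# The User-Agent that requests sends when no headers are passed.
default_ua = requests.utils.default_user_agent()
print(default_ua)  # "python-requests/" followed by the installed version

# A new Session carries the same value in its default headers.
session_ua = requests.Session().headers["User-Agent"]
print(session_ua == default_ua)
```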

By adding a custom user-agent we can make the request look more like one from a real user. However, this is not completely reliable, and headers might need to be rotated and/or used with proxies, ideally residential ones.
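
A rough sketch of what rotating headers could look like. The user-agent strings are the two that appear in this thread; `pick_headers` and `build_proxies` are hypothetical helper names, and the proxy address is a placeholder, not a working proxy:

```python
import random

# User-agent strings taken from the answers in this thread; in practice
# the pool would be larger and kept up to date.
USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36",
]


def pick_headers(rng=random):
    """Return a headers dict with a randomly chosen user-agent."""
    return {"User-Agent": rng.choice(USER_AGENTS)}


def build_proxies(proxy_url):
    """Return a proxies mapping in the form requests expects."""
    return {"http": proxy_url, "https": proxy_url}


headers = pick_headers()
proxies = build_proxies("http://user:pass@proxy.example:8080")  # placeholder
# Intended use (not executed here):
# requests.get(url, headers=headers, proxies=proxies, timeout=30)
```

Calling pick_headers() before each request gives every page a chance of a different user-agent.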

List of user-agents at whatismybrowser.

As a side note, you can use the SelectorGadget Chrome extension to pick CSS selectors by clicking on the desired element in your browser. It does not always work perfectly if the page relies heavily on JS (in this case it works).

The example below shows how to extract listings from all pages.

from bs4 import BeautifulSoup
import requests, json, lxml

# https://requests.readthedocs.io/en/latest/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36",
}

params = {
    '_nkw': 'baseball trading cards', # search query
    'LH_Sold': '1',                   # shows sold items
    '_pgn': 1                         # page number
}

data = []

while True:
    page = requests.get('https://www.ebay.com/sch/i.html', params=params, headers=headers, timeout=30)
    soup = BeautifulSoup(page.text, 'lxml')
    
    print(f"Extracting page: {params['_pgn']}")

    print("-" * 10)
    
    for products in soup.select(".s-item__info"):
        title = products.select_one(".s-item__title span").text
        price = products.select_one(".s-item__price").text
        link = products.select_one(".s-item__link")["href"]
        
        data.append({
          "title" : title,
          "price" : price,
          "link" : link
        })

    if soup.select_one(".pagination__next"):
        params['_pgn'] += 1
    else:
        break

# print everything once, after the loop, so the last page is included
print(json.dumps(data, indent=2, ensure_ascii=False))

Example output

Extracting page: 1
----------
[
  {
    "title": "Shop on eBay",
    "price": "$20.00",
    "link": "https://ebay.com/itm/123456?hash=item28caef0a3a:g:E3kAAOSwlGJiMikD&amdata=enc%3AAQAHAAAAsJoWXGf0hxNZspTmhb8%2FTJCCurAWCHuXJ2Xi3S9cwXL6BX04zSEiVaDMCvsUbApftgXEAHGJU1ZGugZO%2FnW1U7Gb6vgoL%2BmXlqCbLkwoZfF3AUAK8YvJ5B4%2BnhFA7ID4dxpYs4jjExEnN5SR2g1mQe7QtLkmGt%2FZ%2FbH2W62cXPuKbf550ExbnBPO2QJyZTXYCuw5KVkMdFMDuoB4p3FwJKcSPzez5kyQyVjyiIq6PB2q%7Ctkp%3ABlBMULq7kqyXYA"
  },
  {
    "title": "Ken Griffey Jr. Seattle Mariners 1989 Topps Traded RC Rookie Card #41T",
    "price": "$7.20",
    "link": "https://www.ebay.com/itm/385118055958?hash=item59aad32e16:g:EwgAAOSwhgljI0Vm&amdata=enc%3AAQAHAAAAoFRRlvb50yb%2FN4cmlg5OtVDKIH0DsaMJBL3Tp67SI1dCSP1WPdZW3f16bTf4HTSUhX0g3OMmZSitEY3F3SVGg0%2FhSBF3ykE9X88Lo2EHuS2b23tA1kGiG92F9xyr73RLorcidserdH8tvUXhxmT4pJDnCfMAdfqtRzSIxcB6h4aDC1J1XvJ5IyRfYtWBGUQ60ykrA7mNlhH53cwZe5MiRSw%3D%7Ctkp%3ABk9SR7rKxt7sYA"
  },
  {
    "title": "Ken Griffey Jr. 1989 Score Traded Rookie Card Gem 10 Auto Beckett 13604418",
    "price": "$349.00",
    "link": "https://www.ebay.com/itm/353982131344?hash=item526afaac90:g:9hQAAOSwvCpiQ5FY&amdata=enc%3AAQAHAAAAoOKm1SWvHtdNVIEqtE4m5%2B453xtvR75ZimUBLL16P0WwfJy%2BJJQ2Phd9crgAacTWlsqp9HB%2Ft0McttOjmCfyL0RDQB%2FYOWQK3hxj%2FoDRmybJRipjqb0JG2%2BCa1RhI04PN3R5wpH9vvYqefwY6JuAsPqDU0SmSk6h1h%2FQr7cfJqOmdCo0cqbwPcJ8OcvAyP07txigrDyO55XqFD7CHcSmUPA%3D%7Ctkp%3ABk9SR7rKxt7sYA"
  },
  {
    "title": "Mike Jorgensen NY Mets MLB OF-1B 1972 Topps Baseball Card #16 Single Original",
    "price": "$1.19",
    "link": "https://www.ebay.com/itm/374255790865?hash=item5723622b11:g:KiwAAOSwz4ljI0G4&amdata=enc%3AAQAHAAAAoPVkKyeDZ7wbRNBwQppCcjVmLlOlY3ylPVwQyG7dfOy1UtPYhK7tRXtvn5v3M5n%2F35MS1LXLvWAioKRrMGPEPCmDoMkhdynuH3csaincrM%2F6JNwwIUFa3F%2FcylfPqnrxjJXF7cZ3ga9aCihTM6sfVJc1kzNkaBw2C2ewMyQ3ARgYpuDcUa6CMo4zBKF%2FGTj5KlZieLYywQm4dnzLCrFbtEM%3D%7Ctkp%3ABk9SR7rKxt7sYA"
  },
  # ...
]
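
The links above carry long tracking query strings. If only the item ID is needed, as in the question, it can be recovered from the URL path. Below is a small sketch using the standard library; `item_id_from_link` is a hypothetical helper, and it assumes the ID is the last path segment, which holds for the links in the example output:

```python
from urllib.parse import urlsplit


def item_id_from_link(link):
    """Return the last path segment of a listing URL, ignoring the query."""
    path = urlsplit(link).path  # e.g. "/itm/385118055958"
    return path.rstrip("/").rsplit("/", 1)[-1]


link = "https://www.ebay.com/itm/385118055958?hash=item59aad32e16:g:EwgAAOSwhgljI0Vm"
print(item_id_from_link(link))  # 385118055958
```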
Answered By: Denis Skopa