How To Use FindAll While Web Scraping


I want to scrape a search results page and get the titles (Microsoft Xbox 360 E 250 GB Black Console, Microsoft Xbox One S 1TB Console White with 2 Wireless Controllers, etc.). In due course I want to feed the Python script different eBay URLs, but for the sake of this question I just want to focus on one specific eBay URL.

I then want to add the titles to a data frame, which I would write to Excel. I think I can do this part myself.

Did not work –

for post in soup.findAll('a',id='ListViewInner'):
    print (post.get('href'))

Did not work –

for post in soup.findAll('a',id='body'):
      print (post.get('href'))

Did not work –

for post in soup.findAll('a',id='body'):
   print (post.get('href'))

h1 = soup.find("a",{"class":"lvtitle"})

Did not work –

for post in soup.findAll('a',attrs={"class":"left-center"}):
    print (post.get('href'))

Did not work –

for post in soup.findAll('a',{'id':'ListViewInner'}):
    print (post.get('href'))

This gave me links from the wrong parts of the web page. I know href holds hyperlinks and not titles, but I figured that if the code below had worked, I could amend it for titles –

for post in soup.findAll('a'):
    print (post.get('href'))

Here is all my code –

import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
import urllib.request
from bs4 import BeautifulSoup

#BaseURL, Syntax1 and Syntax2 should be standard across all
#Ebay URLs, whereas Request and PageNumber can change 

BaseURL = ""

Syntax1 = "&_skc=50&rt=nc"

Request = "xbox"

Syntax2  = "&_pgn="

PageNumber ="2"

URL = BaseURL + Request + Syntax2 + PageNumber + Syntax1

print (URL)
HTML = urllib.request.urlopen(URL).read()
soup = BeautifulSoup(HTML, "html.parser")

#print (soup)

for post in soup.findAll('a'):
    print (post.get('href'))
Asked By: Ross Symonds



Use a CSS selector instead, which is much faster.

import requests
from bs4 import  BeautifulSoup

url = ''
Res = requests.get(url)
soup = BeautifulSoup(Res.text,'html.parser')
for post in soup.select("#ListViewInner a"):
    print(post.get('href'))
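
To see why the findAll('a', id='ListViewInner') attempts in the question come back empty, here is a minimal sketch on a made-up HTML fragment. The markup and the example.com URLs are illustrative assumptions, not eBay's real page. The id sits on the container element rather than on the a tags, so the search has to target descendants of that container.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page will differ.
html = """
<ul id="ListViewInner">
  <li><h3 class="lvtitle"><a href="https://example.com/item/1">Console A</a></h3></li>
  <li><h3 class="lvtitle"><a href="https://example.com/item/2">Console B</a></h3></li>
</ul>
<a href="https://example.com/help">Help</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Matches <a> tags whose OWN id is "ListViewInner": there are none.
wrong = soup.find_all("a", id="ListViewInner")

# Matches <a> tags anywhere inside the element with that id.
right = soup.select("#ListViewInner a")

links = [a.get("href") for a in right]
titles = [a.get_text(strip=True) for a in right]

print(len(wrong))
print(links)
print(titles)
```

The same selector result gives both the href and the visible text, so titles can be read with get_text() from the tags that were used for the links.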

Use the format() method instead of string concatenation.

import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
import urllib.request
from bs4 import BeautifulSoup

BaseURL = "{}&_pgn={}&_skc={}&rt={}"

skc = "50"
rt = "nc"
Request = "xbox"
PageNumber = "2"

URL = BaseURL.format(Request,PageNumber,skc,rt)
HTML = urllib.request.urlopen(URL).read()
soup = BeautifulSoup(HTML,"html.parser")
for post in soup.select('#ListViewInner a'):
    print(post.get('href'))
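
As a further alternative to format(), the standard library can assemble and escape the query string. This is a minimal sketch, not the answer's method: BASE_URL is a hypothetical placeholder, build_url is a helper name made up for this example, and _nkw is assumed to be the search query parameter (it appears in another answer on this page).

```python
from urllib.parse import urlencode

# Placeholder base URL (an assumption); substitute the real search endpoint.
BASE_URL = "https://example.com/sch/i.html"


def build_url(request, page_number, skc="50", rt="nc"):
    """Return the search URL with properly escaped query parameters."""
    params = {"_nkw": request, "_pgn": page_number, "_skc": skc, "rt": rt}
    # urlencode escapes spaces and special characters in the values.
    return BASE_URL + "?" + urlencode(params)


url = build_url("xbox one", 2)
print(url)
```

Compared with format(), this avoids broken URLs when the search term contains spaces or characters such as &.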
Answered By: KunduK

I see you request the second page of search results in the parameters, but you can also extract data from all pages using non-token-based pagination.

CSS selectors can help you quickly find the required elements on the page.

The SelectorGadget Chrome extension helps you pick those selectors without opening the browser dev tools. It does not always work perfectly if the page relies heavily on JS (in this case we can use it).

Also, if you need to extract data from other eBay domains, it is enough to replace the domain with the one you need; the rest of the code remains unchanged.


from bs4 import BeautifulSoup
import requests, json, lxml

# Browser-like header so the request is less likely to be rejected
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537.36",
}

params = {
    '_nkw': 'xbox',         # search query 
    '_pgn': 1               # page number
}

data = []
limit = 5                   # page limit (if needed)
while True:
    # The search URL is missing from the source and is left empty here
    page = requests.get('', params=params, headers=headers, timeout=30)
    soup = BeautifulSoup(page.text, 'lxml')

    print(f"Extracting page: {params['_pgn']}")
    print("-" * 10)

    for products in soup.select(".s-item__info"):
        link = products.select_one(".s-item__link")["href"]

        data.append({
          "link": link 
        })

    # Stop at the page limit or when there is no "next" button
    if params['_pgn'] == limit:
        break
    if soup.select_one(".pagination__next"):
        params['_pgn'] += 1
    else:
        break

print(json.dumps(data, indent=2, ensure_ascii=False))

Example output:

[
  {
    "link": ""
  },
  {
    "link": ""
  },
  {
    "link": "!29405!US!-1&amdata=enc%3AAQAHAAAA4D4Ig10eel0xwkapJj05fqHi76GUNC0DZPJXHh7MahTM2nf6K9f26IQ0tlXAW3zwb6JBqA%2Fy3pbU%2Bx%2BidkkQzhXQWUeBY3ybe1DE%2F3jDwFcnh%2FL6bmbtT265oHpegLadvV92ZfGyfexeyqQRCzLxXO5PgOCyXvWt470Q7RdGJ2iVsStKQK9e85x%2FJzpe2nyNZQZvo%2BvaVREej%2F4LN9UmO7bhDJpF%2Bm%2BL%2BtkTuao4YkVLFR%2F6Lqqv2kPVdwLg880w9mct5r%2BmPxclXYBaDexsGLTCNY6qdOf6RJo5zaPombCD%7Ctkp%3ABFBMusfWqdlh"
  },
  other results ...
]
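
The question asked for titles rather than links, so here is a minimal sketch of collecting both from each result card. It runs on a made-up HTML fragment: the .s-item__info and .s-item__link classes come from the code above, while .s-item__title and the example.com URLs are assumptions that should be checked against the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical result cards for illustration; .s-item__title is an assumption.
html = """
<div class="s-item__info">
  <a class="s-item__link" href="https://example.com/itm/1">
    <div class="s-item__title">Microsoft Xbox 360 E 250 GB Black Console</div>
  </a>
</div>
<div class="s-item__info">
  <a class="s-item__link" href="https://example.com/itm/2">
    <div class="s-item__title">Microsoft Xbox One S 1TB Console White</div>
  </a>
</div>
<div class="s-item__info">
  <a class="s-item__link" href="https://example.com/itm/3"></a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

data = []
for product in soup.select(".s-item__info"):
    title_tag = product.select_one(".s-item__title")
    link_tag = product.select_one(".s-item__link")
    # Skip cards that lack a title or link instead of failing on None.
    if title_tag is None or link_tag is None:
        continue
    data.append({
        "title": title_tag.get_text(strip=True),
        "link": link_tag["href"],
    })

print(data)
```

A list of dictionaries like data can be passed straight to pandas.DataFrame(data), which gives one column per key.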

As an alternative, you can use Ebay Organic Results API from SerpApi. It’s a paid API with a free plan that handles blocks and parsing on their backend.

Example code with pagination:

from serpapi import EbaySearch
import json

params = {
    "api_key": "...",                 # serpapi key,   
    "engine": "ebay",                 # search engine
    "ebay_domain": "",      # ebay domain
    "_nkw": "xbox",                   # search query
    "LH_Sold": "1",                   # shows sold items
    "_pgn": 1                         # page number
}

search = EbaySearch(params)           # where data extraction happens

limit = 5
page_num = 0
data = []

while True:
    results = search.get_dict()       # JSON -> Python dict

    # Stop if the API reports an error
    if "error" in results:
        print(results["error"])
        break

    for organic_result in results.get("organic_results", []):
        data.append({"Link": organic_result.get("link")})

    page_num += 1

    # Stop at the page limit or when there is no next page
    if params['_pgn'] == limit:
        break
    if "next" in results.get("pagination", {}):
        params['_pgn'] += 1
    else:
        break

print(json.dumps(data, indent=2, ensure_ascii=False))

Output: same as bs4 solution

There is a blog post, "13 ways to scrape any public data from any website", if you want to know more about web scraping.

Answered By: Denis Skopa