Scrape web to csv file

Question:

I’m trying to scrape the different prices for an item, and I would like to scrape all the available items to get the average price. I’ve tried the code below, but it only outputs the first value in the list, and it creates the CSV with no data, just the header.

#Imports used below
from urllib.request import Request, urlopen
from csv import writer
import requests
from bs4 import BeautifulSoup

#Open URL
link3 = "https://www.ebay.co.uk/sch/i.html?_from=R40&_trksid=p2334524.m570.l2632&_nkw=naruto+shippuden+ultimate+ninja+storm+4+ps4&_sacat=139973&LH_TitleDesc=0&rt=nc&_odkw=Naruto+Shippuden%3A+Ultimate+Ninja+Storm+4&_osacat=0&LH_BIN=1&LH_PrefLoc=1"
req = Request(link3, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()

#Create csv file with headers
with open('yellowPage.csv', 'w', encoding='utf8', newline='') as f:
    thewriter = writer(f)
    header = ['link','prices','avg']
    thewriter.writerow(header)
    #Loop between sections to scrape data
with requests.Session() as c:

    soup = BeautifulSoup(webpage, 'html5lib')
    lists = soup.find_all('li',class_='s-item s-item__pl-on-bottom')
    prices = []
    for list in lists:
         prices.append(float(list.find('span', class_="s-item__price").text.replace('£','').replace(',','').replace('$','')))
    avg =sum(prices)/len(prices)
    print(avg)
    print(prices)
    print(len(prices))
    info=[link3,prices,avg]
    thewriter.writerow(info)
    

I need help identifying the best way to get all the items’ prices from all the available pages, as well as how to send the scraped data to a CSV file.

Asked By: Dalia Tawfeek


Answers:

You might be running into errors because you are shadowing Python’s built-in `list` in your for-loop. Try changing `for list in lists` to `for item in lists` and updating your loop contents accordingly.
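To see why that matters, here is a small standalone illustration (the values are made up): after the loop, the name `list` is bound to a string, so the built-in is no longer callable in that scope.

```python
items = ["£9.99", "£12.50"]

# Shadowing the built-in name inside the loop...
for list in items:
    pass

# ...leaves `list` bound to the last string, so calling the
# built-in afterwards raises a TypeError.
try:
    list("abc")
except TypeError as err:
    print("list() now fails:", err)
```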

Additionally, your request URL is fixed to a single results page, and you never update the page number or make a new request. You will need to restructure your script so that the page number in the URL is updated inside a loop.
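That restructuring might look like the sketch below. The helper names (`page_prices`, `scrape_all`) and the `max_pages` cap are my own assumptions, and eBay's `_pgn` query parameter is taken as the page selector (it appears in the result-page URLs); eBay can also repeat its last page of results instead of returning an empty one, so a hard page cap is safer than looping until a page comes back empty.

```python
import requests
from bs4 import BeautifulSoup

def page_prices(html):
    """Extract individual item prices from one results page's HTML."""
    prices = []
    soup = BeautifulSoup(html, 'html.parser')
    # class_='s-item' matches any li whose class list contains s-item
    for item in soup.find_all('li', class_='s-item'):
        span = item.find('span', class_='s-item__price')
        if span is None:
            continue
        text = span.text.replace('£', '').replace(',', '')
        try:
            prices.append(float(text))
        except ValueError:
            pass  # skip price ranges such as "£9.99 to £12.00"
    return prices

def scrape_all(url_template, max_pages=9):
    """Fetch pages 1..max_pages, stopping early if a page has no items."""
    all_prices = []
    for page_num in range(1, max_pages + 1):
        html = requests.get(url_template.format(page_num),
                            headers={'User-Agent': 'Mozilla/5.0'}).text
        found = page_prices(html)
        if not found:
            break
        all_prices.extend(found)
    return all_prices
```

Calling, say, `scrape_all('https://www.ebay.co.uk/sch/i.html?_nkw=mario&_pgn={}')` would then collect prices across pages.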

Answered By: Jesse

This should do what you want. I found the last page number, i.e. 9, and then scraped each page up to and including it.

There is, however, an issue with gathering all of the products: there are 9 pages and each page displays 60 products by default, but I was only able to get 265 prices. The discrepancy is likely caused by the product li tags having different class attributes. For example, some of the li tags had only s-item s-item__pl-on-bottom and not s-item--watch-at-corner.

import requests
from bs4 import BeautifulSoup

# getting html of first page to find total number of succeeding pages
page = requests.get('https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw=mario&_sacat=0&LH_TitleDesc=0&_pgn=1').text
soup = BeautifulSoup(page, 'html.parser')

# find last page number
end_page = soup.find('a', href='https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw=mario&_sacat=0&LH_TitleDesc=0&_pgn=9&rt=nc').text

prices = []
page_num = 0

# gets html of each page until the last page is reached
while page_num < int(end_page):
    page_num += 1
    page = requests.get(f'https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw=mario&_sacat=0&LH_TitleDesc=0&_pgn={page_num}').text
    soup = BeautifulSoup(page, 'html.parser')

    # list of all li tags in a page
    lists = soup.find_all('li', class_="s-item s-item__pl-on-bottom s-item--watch-at-corner")

    # iterate over each page's li tags and append each product price to the list
    for item in lists:
        prices.append(float(item.find('span', class_="s-item__price").text.replace('£','').replace(',','')))

# Average price of the scraped product prices
print(sum(prices)/len(prices))

Answered By: Übermensch
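On the question's second point, sending the scraped data to a CSV: in the original code, the `writerow()` for the data runs after the `with open(...)` block has ended and closed the file, which is why only the header appears. Keeping all the writes inside the block fixes that; a minimal sketch, with made-up prices standing in for the scraped values:

```python
import csv

# made-up results standing in for the scraped values
link = 'https://www.ebay.co.uk/sch/i.html?_nkw=mario&_pgn=1'
prices = [12.99, 10.50, 14.25]

# write the header and the data while the file is still open
with open('yellowPage.csv', 'w', encoding='utf8', newline='') as f:
    w = csv.writer(f)
    w.writerow(['link', 'prices', 'avg'])
    w.writerow([link, prices, round(sum(prices) / len(prices), 2)])
```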