Load multiple pages when web scraping with Python

Question:

I wrote Python code to scrape product data from Flipkart.
I need to load multiple pages so that I can import many products, but right now only one product page is fetched.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup


my_url = 'https://www.xxxxxx.com/food-processors/pr?sid=j9e%2Cm38%2Crj3&page=1'

uClient2 = uReq(my_url)
page_html = uClient2.read()
uClient2.close()

page_soup = soup(page_html, "html.parser")

containers11 = page_soup.findAll("div",{"class":"_3O0U0u"}) 

filename = "FoodProcessor.csv"
f = open(filename, "w", encoding='utf-8-sig')
headers = "Product,Price,Description\n"
f.write(headers)

for container in containers11:
    title_container = container.findAll("div",{"class":"_3wU53n"})
    product_name = title_container[0].text

    price_con = container.findAll("div",{"class":"_1vC4OE _2rQ-NK"})
    price = price_con[0].text



    description_container = container.findAll("ul",{"class":"vFw0gD"})
    product_description = description_container[0].text


    print("Product: " + product_name)
    print("Price: " + price)
    print("Description: " + product_description)
    # strip commas from the fields so they don't break the CSV columns
    f.write(product_name + "," + price.replace(",", "") + "," + product_description.replace(",", ";") + "\n")

f.close()
Asked By: Dheeraj Gupta


Answers:

You can first find the number of pages available, then iterate over each page and parse its data. The page is selected by the "page" query parameter in the URL:

  • ‘https://www.flipkart.com/food-processors/pr?sid=j9e%2Cm38%2Crj3&page=1’ points to page 1
  • ‘https://www.flipkart.com/food-processors/pr?sid=j9e%2Cm38%2Crj3&page=2’ points to page 2
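A minimal sketch of this approach, assuming the class name from the question ("_3O0U0u" for a product container) is still current; Flipkart changes these obfuscated class names regularly, so verify them first:

```python
import requests
from bs4 import BeautifulSoup

# the listing URL with the "page" parameter left as a placeholder
BASE_URL = "https://www.flipkart.com/food-processors/pr?sid=j9e%2Cm38%2Crj3&page={}"

def scrape_pages(last_page):
    """Fetch listing pages 1..last_page and collect all product containers."""
    containers = []
    for page in range(1, last_page + 1):
        html = requests.get(BASE_URL.format(page)).text
        page_soup = BeautifulSoup(html, "html.parser")
        # "_3O0U0u" is the product-container class from the question
        containers.extend(page_soup.find_all("div", {"class": "_3O0U0u"}))
    return containers
```

The per-container parsing loop from the question can then run over the combined `containers` list instead of a single page's results.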
Answered By: Mandar Autade

You have to check whether the next-page button exists. If it does, return True, go to that next page, and keep scraping; if not, return False and stop. Check the class name of that button first.

# to check if a pagination (next-page) button exists on the page:

    from selenium.common.exceptions import NoSuchElementException

    def go_next_page():
        try:
            button = driver.find_element_by_xpath('//a[@class="<class name>"]')
            return True, button
        except NoSuchElementException:
            return False, None
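The helper above would be driven from a loop. A sketch of that control flow, with `go_next_page()` stubbed out so it runs without a live Selenium driver (the stub pretends there are three pages):

```python
remaining = {"pages": 3}

def go_next_page():
    # stub standing in for the Selenium version above: reports a
    # "next" button until the last page is reached
    remaining["pages"] -= 1
    if remaining["pages"] > 0:
        return True, "next-button"
    return False, None

pages_scraped = []
while True:
    pages_scraped.append(len(pages_scraped) + 1)  # parse current page here
    has_next, button = go_next_page()
    if not has_next:
        break
    # with a real driver: button.click(), then wait for the page to load

print(pages_scraped)  # → [1, 2, 3]
```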
Answered By: rmb
from selenium.common.exceptions import (ElementClickInterceptedException,
                                        TimeoutException)

while True:
    try:
        try:
            next_btn = driver.find_element_by_xpath("//a//span[text()='Next']")
            next_btn.click()
        except ElementClickInterceptedException:
            # an overlay is blocking the button: hide the topmost one,
            # then retry the click
            classes = "_3ighFh"
            overlay = driver.find_element_by_xpath(
                "(//div[@class='{}'])[last()]".format(classes))
            driver.execute_script(
                "arguments[0].style.visibility = 'hidden'", overlay)
            next_btn = driver.find_element_by_xpath("//a//span[text()='Next']")
            next_btn.click()
        except Exception as e:
            print(str(e))
            break
    except TimeoutException:
        print("Page Timed Out")
        break

driver.quit()

Answered By: Dheeraj Gupta

For me, the easiest way is to add an extra loop with the "page" variable:

# just check the number of the last page on the website
page = 1

while page != 10:  # scrapes pages 1-9; set the limit to last page + 1
    print(f'Scraping page: {page}')
    my_url = f'https://www.xxxxxx.com/food-processors/pr?sid=j9e%2Cm38%2Crj3&page={page}'

    # here add the for loop you already have

    page += 1

This method should work.

Answered By: Farid Mammadaliyev