Pagination with BeautifulSoup in Python

Question

I am working on a web scraping project for this site:
https://yellowpages.com.eg/en/search/fast-food
I managed to scrape the data, but I am struggling with the pagination.
I want to write a loop that finds the Next button on each page and then uses the URL scraped from that button to repeat the same process on the following page.

import requests
from bs4 import BeautifulSoup

url = 'https://yellowpages.com.eg/en/search/fast-food'
while True:
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'lxml')
    pages = soup.find_all('ul', class_='pagination center-pagination')
    for page in pages:
        # Look for the Next link inside the first matching pagination item
        nextpage = page.find('li', class_='waves-effect').find('a', {'aria-label': 'Next'})
        if nextpage:
            uu = nextpage.get('href')
            url = 'http://www.yellowpages.com.eg' + str(uu)
            print(url)
        else:
            break

This code prints the URL of the next page once and then breaks out of the loop.

Asked By: Zuhair Hamza

Answer 1

The problem is that

nextpage = page.find('li', class_='waves-effect').find('a', {'aria-label': 'Next'})

only returns the Next button as long as there is no Previous button. find() returns just the first matching li, and from the second page on that first li holds the Previous button, so the inner find() returns None and the loop breaks as soon as you leave the first page.

Instead, page.find_all('li', class_='waves-effect') returns both the Next and the Previous buttons.
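To make the difference concrete, here is a small sketch run against a hand-written pagination snippet. The HTML in SAMPLE is an assumption about what the site's markup roughly looks like on a middle page (the hrefs are made up), not a copy of the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for a middle page: Previous comes first, Next comes last.
SAMPLE = """
<ul class="pagination center-pagination">
  <li class="waves-effect"><a aria-label="Previous" href="/en/search/fast-food/p1">prev</a></li>
  <li class="active"><a href="/en/search/fast-food/p2">2</a></li>
  <li class="waves-effect"><a aria-label="Next" href="/en/search/fast-food/p3">next</a></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE, 'html.parser')
page = soup.find('ul', class_='pagination center-pagination')

# find() stops at the first matching li, which here wraps the Previous button.
first_li = page.find('li', class_='waves-effect')
first_hit = first_li.find('a', {'aria-label': 'Next'})

# find_all() returns every matching li; the last one wraps the Next button.
last_li = page.find_all('li', class_='waves-effect')[-1]
last_hit = last_li.find('a', {'aria-label': 'Next'})

print(first_hit)             # None
print(last_hit.get('href'))  # /en/search/fast-food/p3
```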

To get the Next button (maybe) more robustly, change your line to

nextpage = page.find_all('li', class_='waves-effect')[-1].find('a', {'aria-label': 'Next'})
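Put together, the whole loop might look something like the sketch below. It is not tested against the live site. The helper names next_page_url and iter_page_urls, the fetch parameter and the max_pages cap are all my own inventions for illustration. It also uses urljoin instead of the hard-coded 'http://www.yellowpages.com.eg' prefix, on the assumption that the href is a relative path.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def next_page_url(html, current_url):
    """Return the absolute URL behind the Next button, or None if there is none."""
    soup = BeautifulSoup(html, 'html.parser')
    pager = soup.find('ul', class_='pagination center-pagination')
    if pager is None:
        return None
    items = pager.find_all('li', class_='waves-effect')
    if not items:
        return None
    # The Next button, when present, is assumed to sit in the last item.
    link = items[-1].find('a', {'aria-label': 'Next'})
    if link is None or not link.get('href'):
        return None
    return urljoin(current_url, link['href'])


def iter_page_urls(start_url, fetch, max_pages=50):
    """Yield (url, html) for each page, following Next links until none is left.

    fetch is any callable that takes a URL and returns the page's HTML.
    """
    url, seen = start_url, set()
    # Stop on a missing Next link, a repeated URL, or the page cap.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        url = next_page_url(html, url)
```

With requests installed, you could pass fetch=lambda u: requests.get(u, timeout=10).text and scrape each html inside a for loop over iter_page_urls(...). Taking fetch as a parameter keeps the pagination logic independent of how pages are downloaded, and the seen set plus max_pages are there so a broken Next link cannot make the loop run forever.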

Answered By: mcsoini
