Playwright auto-scroll to bottom of infinite-scroll page

Question:

I am trying to automate the scraping of a site with "infinite scroll" with Python and Playwright.

The issue is that Playwright doesn’t include, as of yet, a scroll functionnality let alone an infinite auto-scroll functionnality.

From what I found on the net and my personnal testing, I can automate an infinite or finite scroll using the page.evaluate() function and some Javascript code.

For example, this works:

for i in range(20):
    page.evaluate('var div = document.getElementsByClassName("comment-container")[0];div.scrollTop = div.scrollHeight')
    page.wait_for_timeout(500)

The problem with this approach is that it will either work by specifying a number of scrolls or by telling it to keep going forever with a while True loop.

I need to find a way to tell it to keep scrolling until the final content loads.

This is the Javascript that I am currently trying in page.evaluate():

var intervalID = setInterval(function() {
    var scrollingElement = (document.scrollingElement || document.body);
    scrollingElement.scrollTop = scrollingElement.scrollHeight;
    console.log('fail')
}, 1000);
var anotherID = setInterval(function() {
    if ((window.innerHeight + window.scrollY) >= document.body.offsetHeight) {
        clearInterval(intervalID);
    }}, 1000)

This does not work either in my firefox browser or in the Playwright firefox browser. It returns immediately and doesn’t execute the code in intervals.

I would be grateful if someone could tell me how I can, using Playwright, create an auto-scroll function that will detect and stop when it reaches the bottom of a dynamically loading webpage.

Asked By: alex_bits

||

Answers:

So I found a working solution.

What I did was to combine Javascript with python Playwright code.

I start the setInterval with a timer of 200ms to scroll down on the page with page.evaluate() and then I follow it up with a python loop that checks every second whether the total height of the page (scroll included) has changed. If it changes it continues to scroll and if it hasn’t changed than the scroll is over.
This is what it looks like:

page.evaluate(
    """
    var intervalID = setInterval(function () {
        var scrollingElement = (document.scrollingElement || document.body);
        scrollingElement.scrollTop = scrollingElement.scrollHeight;
    }, 200);

    """
)
prev_height = None
while True:
    curr_height = page.evaluate('(window.innerHeight + window.scrollY)')
    if not prev_height:
        prev_height = curr_height
        time.sleep(1)
    elif prev_height == curr_height:
        page.evaluate('clearInterval(intervalID)')
        break
    else:
        prev_height = curr_height
        time.sleep(1)

EDIT

See the below answer using the new mouse.wheel(x, y) feature for an up to date way to scroll using playwright. Combine my answer with his to lessen the need to use JS.

Answered By: alex_bits

The new Playwright version has a scroll function. it’s called mouse.wheel(x, y). In the below code, we’ll be attempting to scroll through youtube.com which has an "infinite scroll":

from playwright.sync_api import Playwright, sync_playwright
import time


def run(playwright: Playwright) -> None:
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context()

    # Open new page
    page = context.new_page()

    page.goto('https://www.youtube.com/')

    # page.mouse.wheel(horizontally, vertically(positive is 
    # scrolling down, negative is scrolling up)
    for i in range(5): #make the range as long as needed
        page.mouse.wheel(0, 15000)
        time.sleep(2)
        i += 1
    
    time.sleep(15)
    # ---------------------
    context.close()
    browser.close()


with sync_playwright() as playwright:
    run(playwright)
Answered By: Mhmd Khalil

This topic old, but new to me. I have been using the playwright wheel scroll but for me it takes control/focus on the mouse.

So if I happen to be typing (which i usually am) and it scrolls, my beautiful words go into the void to never be seen again.

I am going to go ahead and try out the js solution posted above and see if that gets me around the mouse/focus issue.

Answered By: Cliff

The other solutions were a tad bit verbose and "overkill" for me and this is what worked for me.

Here’s a two liner that took me a few migraines to come around to 🙂

Note: you are going to have to put in your own selector. This is just an example…

    while page.locator("span",has_text="End of results").is_visible() is False:
        page.mouse.wheel(0,100)
        #page.keyboard.down(PageDown) also works

Literally just keep scrolling until some sort of unique selector is present. In this case a span tag with the string "End of results" (for the context of my use case) popped up when you scroll to the bottom.

I trust you can translate this logic for you own usage..

Answered By: Alex

the playwright has the page.keyboard.down('End') command, it will scroll to the end of the page.

Answered By: Poker Player