How to scroll down a web page slowly using Selenium in Python?

Question:

I wanted to scroll down a web page using Selenium and found this: How can I scroll a web page using selenium webdriver in python?

I took the code shown there:

import time

SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

It works fine, but it causes a problem in my main code. I want to parse Twitter. If a Twitter account has many tweets, the page’s HTML contains only a few of them, not all the tweets of the account.

Example: when I scroll down the page, its HTML contains only the tweets that are currently visible to me. Because of this I can’t capture all the tweets. The code above scrolls the page too quickly. How can I slow the scrolling down?

I tried to solve it and wrote dumb code:

    last_height = driver.execute_script("return document.body.scrollHeight")
    print(last_height)

    # Scroll down to bottom
    y = 600
    finished = False
    while True:
        for timer in range(0, 100):
            driver.execute_script("window.scrollTo(0, " + str(y) + ")")
            y += 600
            sleep(1)
            new_height = driver.execute_script("return document.body.scrollHeight")
            print(new_height, last_height)

            if new_height == last_height: #on the first iteration new_height equals last_height
                print('stop')
                finished = True
                break
            last_height = new_height
        if finished:
            break

This code doesn’t work: on the first iteration new_height equals last_height. Please help me.
If you can fix my code, please do. If you can write a more elegant solution, please post it.

UPD:

The scrolling has to be open-ended. For example, I scroll down a Facebook account until it has been scrolled fully. That’s why I have the last_height and new_height variables. In my code, last_height being equal to new_height is supposed to mean that the page has been scrolled to the end and we can stop scrolling (exit the loop). But I missed something, and my code doesn’t work.

Asked By: alex-uarent-alex

||

Answers:

I have worked on a Twitter bot. When you scroll down, Twitter updates the page’s HTML and removes some tweets from the top. The algorithm I used is:

  • Create an empty list for tweet URLs.
  • Collect the tweets currently on the page. For each tweet, check whether its URL is already in the list: if not, add it and process the tweet’s content however you need; otherwise skip that tweet.
  • Get the current page height: current_height = DriverWrapper.cd.execute_script("return document.body.scrollHeight")
  • Scroll down the page. If new_height == current_height, stop; otherwise repeat from the second step.
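
The steps above could be sketched roughly as follows. This is only an illustration, not the bot's actual code: collect_while_scrolling and get_visible_urls are hypothetical names, and get_visible_urls stands for whatever function you write to pull tweet URLs out of the current DOM. The sketch also swaps the height comparison for a scroll-position check (window.pageYOffset), because when scrolling in small steps the page height usually stays the same until the bottom is reached.

```python
import time


def collect_while_scrolling(driver, get_visible_urls, step=600, pause=1.0):
    """Scroll down in small steps, recording every tweet URL seen on the way.

    driver           -- a Selenium WebDriver (anything with execute_script)
    get_visible_urls -- hypothetical callable: takes the driver and returns
                        the tweet URLs currently present in the DOM
    """
    seen = []         # URLs in first-seen order
    seen_set = set()  # the same URLs, for fast membership checks

    while True:
        # Harvest what is in the DOM right now, skipping URLs already known.
        for url in get_visible_urls(driver):
            if url not in seen_set:
                seen_set.add(url)
                seen.append(url)

        # Scroll one step and check whether the position actually advanced.
        before = driver.execute_script("return window.pageYOffset;")
        driver.execute_script(f"window.scrollBy(0, {step});")
        time.sleep(pause)  # give lazily loaded tweets time to arrive
        after = driver.execute_script("return window.pageYOffset;")

        if after <= before:
            break  # the position did not move: treat this as the end

    return seen
```

If new tweets take longer than pause seconds to load, the loop may stop early, so the pause may need tuning for a slow connection.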
Answered By: Faizan AlHassan

This code moves the scrollbar in steps and reads its position after each step. It compares the position before and after the step; if the two are the same, the scrollbar has stopped moving (it has reached the end) and the while loop exits.

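import time

from selenium import webdriver

# placeholder: replace with the page you want to scroll
URL = "https://example.com"

driver = webdriver.Chrome()
# wait up to 5 seconds when locating elements (only needs to be set once)
driver.implicitly_wait(5)

# navigate to the website
driver.get(URL)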

# get the position of scroll
scroll_pos_init = driver.execute_script("return window.pageYOffset;")
stepScroll = 300

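while True:
    driver.execute_script(f"window.scrollBy(0, {stepScroll});")
    # pause before reading the position so the scroll and any lazy loading can finish
    time.sleep(0.75)
    scroll_pos_end = driver.execute_script("return window.pageYOffset;")
    # the position did not advance, so the bottom has been reached
    if scroll_pos_init >= scroll_pos_end:
        break
    scroll_pos_init = scroll_pos_end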

# get the raw HTML content
html = driver.page_source

# close the browser
driver.quit()
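
For pages that keep loading content (the infinite-scroll case from the question), a variation on the loop above is to stop only when the viewport is at the bottom and the page height did not grow during the last step. The question's loop stopped on the first iteration because a 600-pixel step does not reach the bottom, so document.body.scrollHeight had no reason to change yet. The following is a minimal sketch under those assumptions; scroll_slowly is a hypothetical helper name, and the one-pixel tolerance is a guess to absorb rounding.

```python
import time


def scroll_slowly(driver, step=600, pause=1.0):
    """Scroll down `step` pixels at a time until the page stops growing.

    Returns the number of scroll steps performed.
    """
    steps = 0
    while True:
        height_before = driver.execute_script(
            "return document.body.scrollHeight"
        )
        driver.execute_script(f"window.scrollBy(0, {step});")
        steps += 1
        time.sleep(pause)  # let newly requested content load

        # True when the bottom edge of the viewport touches the page bottom.
        at_bottom = driver.execute_script(
            "return window.innerHeight + window.pageYOffset"
            " >= document.body.scrollHeight - 1;"
        )
        height_after = driver.execute_script(
            "return document.body.scrollHeight"
        )

        # Stop only if we are at the bottom AND nothing new was appended.
        if at_bottom and height_after == height_before:
            break
    return steps
```

A smaller step or a longer pause makes the scrolling slower; both values here are arbitrary defaults.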
Answered By: richie101