How to scroll down a web page slowly using Selenium in Python?
Question:
I wanted to scroll down a web page using Selenium. I found this: How can I scroll a web page using selenium webdriver in python?
I took the code shown there:
SCROLL_PAUSE_TIME = 0.5
# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
It works fine, but it causes a problem in my main code. I want to parse Twitter. If a Twitter account has a long timeline, the HTML of the page contains only a few tweets, not all the tweets of the account.
Example: as I scroll down the page, its HTML contains only the tweets that are currently visible to me. Because of this I can’t catch all the tweets. The code above scrolls the page quickly. How can I slow down the scrolling?
I tried to solve it and wrote this dumb code:
last_height = driver.execute_script("return document.body.scrollHeight")
print(last_height)
# Scroll down to bottom
y = 600
finished = False
while True:
    for timer in range(0, 100):
        driver.execute_script("window.scrollTo(0, " + str(y) + ")")
        y += 600
        sleep(1)
        new_height = driver.execute_script("return document.body.scrollHeight")
        print(new_height, last_height)
        if new_height == last_height:  # on the first iteration new_height equals last_height
            print('stop')
            finished = True
            break
        last_height = new_height
    if finished:
        break
This code doesn’t work: on the first iteration new_height equals last_height. Please help me.
If you can fix my code, please fix it. If you can write another, more elegant solution, please write it.
UPD:
The scrolling has to be infinite. For example, I scroll down a Facebook account until I have scrolled through it fully. That’s why I have the last_height and new_height variables. In my code, when last_height equals new_height, it means the page has been scrolled to the end and we can stop scrolling (we can exit). But I missed something. My code doesn’t work.
Answers:
I have worked on a Twitter bot. When you scroll down, Twitter updates the page’s HTML and removes some tweets from the top. The algorithm I used is:
- Create an empty list for tweet URLs.
- Collect the tweets currently available. For each tweet, check whether its URL is already in the list. If it is not, add it and process the tweet’s content however you want; otherwise ignore that tweet.
- Get the height of the page:
current_height = DriverWrapper.cd.execute_script("return document.body.scrollHeight")
- Scroll down the page. If new_height == current_height, stop; otherwise repeat from the second step.
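The steps above could be sketched roughly as follows. This is only an illustration, not the answerer's actual bot: collect_unique_urls and get_visible_urls are hypothetical names, and get_visible_urls is assumed to be a callable you supply that returns the tweet URLs currently present in the DOM. The driver only needs Selenium's execute_script method.

```python
import time


def collect_unique_urls(driver, get_visible_urls, pause=1.0):
    """Collect every URL seen while scrolling until the page height stops growing.

    driver           -- any object exposing Selenium's execute_script(script)
    get_visible_urls -- hypothetical callable: driver -> iterable of URLs in the DOM
    """
    seen = []          # URLs in the order they were first seen
    seen_set = set()   # same URLs, for fast membership checks

    def harvest():
        # add only the URLs that have not been recorded yet
        for url in get_visible_urls(driver):
            if url not in seen_set:
                seen_set.add(url)
                seen.append(url)

    while True:
        harvest()
        current_height = driver.execute_script("return document.body.scrollHeight")
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load more items
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == current_height:
            break  # the page did not grow, so assume the end was reached

    harvest()  # one last pass in case the DOM changed without the height changing
    return seen
```

Because the URLs are recorded on every pass, it does not matter that the page drops older tweets from its HTML as you scroll.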
This code moves the scrollbar and reads its position. It compares the position before and after each scroll step; if both are the same, the scrollbar has stopped moving (it reached the end) and the while loop is exited.
import time

from selenium import webdriver

URL = "https://example.com"  # placeholder: the page you want to scroll

driver = webdriver.Chrome()
# the implicit wait only needs to be set once per session
driver.implicitly_wait(5)
# navigate to the website
driver.get(URL)
# get the initial scroll position
scroll_pos_init = driver.execute_script("return window.pageYOffset;")
stepScroll = 300
while True:
    driver.execute_script(f"window.scrollBy(0, {stepScroll});")
    # pause so the page can settle before the position is read
    time.sleep(0.75)
    scroll_pos_end = driver.execute_script("return window.pageYOffset;")
    # stop once the scrollbar no longer moves
    if scroll_pos_init >= scroll_pos_end:
        break
    scroll_pos_init = scroll_pos_end
# get the raw HTML content
html = driver.page_source
# close the browser
driver.quit()
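On pages that load content lazily, the scrollbar may stop for a moment while new items are fetched, so a loop that exits on the first unchanged position can end too early. A possible variation, offered only as a sketch, tolerates a few consecutive stalled steps before giving up. The function name scroll_slowly and the max_stalls parameter are made up for this example; the defaults for step and pause are taken from the code above.

```python
import time


def scroll_slowly(driver, step=300, pause=0.75, max_stalls=3):
    """Scroll down `step` pixels at a time until the position stops changing.

    Gives up only after `max_stalls` consecutive steps that did not move
    the page. Returns the number of steps that actually moved it.
    """
    moved = 0
    stalls = 0
    last_pos = driver.execute_script("return window.pageYOffset;")
    while stalls < max_stalls:
        driver.execute_script(f"window.scrollBy(0, {step});")
        time.sleep(pause)  # give lazily loaded content time to arrive
        pos = driver.execute_script("return window.pageYOffset;")
        if pos > last_pos:
            moved += 1
            stalls = 0   # progress was made, so reset the stall counter
        else:
            stalls += 1  # no movement: maybe the end, maybe still loading
        last_pos = pos
    return moved
```

With a real browser this would be called as scroll_slowly(driver) after driver.get(URL). A larger max_stalls or pause makes the loop more patient on slow connections, at the cost of a longer wait at the true end of the page.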