How to not wait for a page to fully load, selenium python

Question:

I have code that should take around 360 hours to fully complete, and it's all because of the slow servers of the website I'm trying to scrape. But when I watch the website and the Python console at the same time, I can see that the elements I'm trying to use have already loaded, and Selenium is still waiting for useless ads and other things I don't care about. So I was wondering if there is any way to start scraping as soon as the elements I need are loaded.

Another way of doing this would be to just scrape even if the page is not fully loaded, and then use time.sleep to time it by hand. That approach has already been asked about and answered on Stack Overflow, so if it is the only way, you can let me know in the comments. A better way would be to wait only for the elements that need to be scraped, which would make this much easier.

I don't think my code is needed to answer my question, but I'll put it here just in case.

code:

# C:\Users\keibo\PycharmProjects\emergency ahanonline project
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium import webdriver
import pandas as pd
from webdriver_manager.chrome import ChromeDriverManager
import time

t = time.localtime()
current_time = time.strftime("%H:%M:%S", t)
print(f'[{current_time}] Started.')
options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
#options.add_argument("--headless")
output = f'\nState, City, Group, Sub_Group, Address, Website, Description, Views'
browser = webdriver.Chrome(options=options,service=Service(ChromeDriverManager().install()))
def tir():
    global output
    browser.get(
        'https://senf.ir/ListCompany/75483/%D8%A2%D9%87%D9%86-%D8%A2%D9%84%D8%A7%D8%AA-%D9%88-%D8%B6%D8%A7%DB%8C%D8%B9%D8%A7%D8%AA')
    browser.find_element(By.ID, "ContentPlaceHolder2_rptPager_lnkPage_11").click()
    pages = int(browser.find_element(By.ID, "ContentPlaceHolder2_rptPager_lnkPage_9").text)
    print(f'There are {pages} pages of 20 names, which means there are {pages * 20} people to save.')
    for page in range(pages - 1):
        for person in range(19):
            browser.get(
                'https://senf.ir/ListCompany/75483/%D8%A2%D9%87%D9%86-%D8%A2%D9%84%D8%A7%D8%AA-%D9%88-%D8%B6%D8%A7%DB%8C%D8%B9%D8%A7%D8%AA')
            browser.find_element(By.ID, f"ContentPlaceHolder2_grdProduct_HpCompany_{person}").click()
            def grab(xpath):
                # Return the element's text, or None if it is missing or blank.
                try:
                    text = browser.find_element(By.XPATH, xpath).text
                except Exception:
                    return None
                return text if text not in ('', ' ') else None

            state = grab('.//span[@id = "ContentPlaceHolder2_rpParent_lblheaderCheild_0"]')
            city = grab('.//span[@id = "ContentPlaceHolder2_rpParent_lblheaderCheild_1"]')
            group = grab('.//span[@id = "ContentPlaceHolder2_rpParent_lblheaderCheild_2"]')
            sub_group = grab('.//span[@id = "ContentPlaceHolder2_rpParent_lblheaderCheild_3"]')
            Address = grab('.//span[@id = "ContentPlaceHolder2_txtAddress"]')
            ceo = grab('.//span[@id = "ContentPlaceHolder2_LblManager"]')
            # print(grab('.//span[@id = "ContentPlaceHolder2_ImgEmail"]'))
            website = grab('.//a[@id = "ContentPlaceHolder2_hfWebsait"]')
            Description = grab('.//span[@id = "ContentPlaceHolder2_lblDesc"]')
            views = grab('.//span[@id = "ContentPlaceHolder2_lblVisit"]')
            output += f'\n{views}, {Description}, {website}, {Address}, {sub_group}, {group}, {city}, {state}'
            print(output)
            print('--------------------------------------------')
            browser.find_element(By.ID, "ContentPlaceHolder2_rptPager_lnkPage_12").click()

tir()
print("End")
with open('Program Files\CSV pre built.txt', 'w') as f1:
    f1.write(output)
read_file1 = pd.read_csv('Program Files\CSV pre built.txt')
read_file1.to_csv('Output.csv', index=False)
try:
    pass
except Exception as e:
    browser.close()
    print('something went wrong ):')
    sees = input('Press enter to leave or press 1 and then enter to see the error: ')
    if sees=='1':
        input(e)
Asked By: Keyvan Abedini


Answers:

If you want to prioritize locating specific elements over the whole page, try using an explicit wait. If you want to wait for the whole webpage, use an implicit wait.

Explicit Wait

Where element is a desired search param & driver is your WebDriver:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By

WebDriverWait(driver, timeout=5).until(lambda d: d.find_element(By.ID, 'element'))

Rather than wait for the entire webpage to load, this particular function waits for a given amount of time to find an element. For this scenario, the function waits for a maximum of 5 seconds to find the element with an ID of "element." You can assign a variable to this function to store the element it finds (if it is valid and discovered).
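Conceptually, WebDriverWait.until is just a polling loop: it re-runs your condition until it returns something truthy or the clock runs out. Here is a rough pure-Python sketch of that mechanism (the names are illustrative, not Selenium's actual internals):

```python
import time

class TimeoutException(Exception):
    """Raised when the condition never becomes truthy within the timeout."""

def until(condition, timeout=5.0, poll_frequency=0.5):
    # Re-evaluate the condition until it returns a truthy value,
    # or raise once the timeout expires.
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutException(f"condition not met within {timeout}s")
        time.sleep(poll_frequency)
```

This is why an explicit wait returns as soon as the element exists instead of paying the full timeout every time: the timeout is only an upper bound.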

Implicit Wait

You mentioned using time.sleep() to wait for the webpage to load. Selenium offers implicit waits for this. Rather than manually halting the program for a fixed time, an implicit wait lets the driver wait up to an imposed amount of time whenever it looks up an element. The code is shown below, where driver is your WebDriver:

driver.implicitly_wait(5)

It is generally advised not to use time.sleep() as it "defeats the purpose of Automation." A more detailed explanation regarding implicit/static waits can be found in this post.
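To see why a fixed sleep wastes time, compare it with a polling wait against a condition that becomes true early. This is a toy demonstration in plain Python (no Selenium involved; the 50 ms "readiness" delay is fabricated for the example):

```python
import time

def wait_fixed(check, delay):
    # time.sleep() style: always pays the full delay before checking once.
    time.sleep(delay)
    return check()

def wait_polling(check, timeout, poll=0.01):
    # Explicit-wait style: returns as soon as the check passes.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(poll)
    return False

def ready_after(seconds):
    # Fabricated stand-in for "the element has loaded".
    ready_at = time.monotonic() + seconds
    return lambda: time.monotonic() >= ready_at

start = time.monotonic()
assert wait_polling(ready_after(0.05), timeout=2.0)
polled = time.monotonic() - start

start = time.monotonic()
assert wait_fixed(ready_after(0.05), delay=0.5)
slept = time.monotonic() - start

print(f"polling returned in {polled:.2f}s, fixed sleep took {slept:.2f}s")
```

Scaled up to thousands of page loads, that per-page difference is exactly the kind of time the asker's 360-hour estimate is made of.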

Explicit Wait Example

For a more direct answer to your question, we can apply an explicit wait to line 21 of your code snippet.

browser.find_element(By.ID, "ContentPlaceHolder2_rptPager_lnkPage_11").click()

An explicit wait can store the element it finds and be applied directly to element searches. While declaring a variable for an element search is not strictly necessary, I highly recommend doing so for elements that need more than one action applied to them. The above code can be replaced like so:

pagelink = WebDriverWait(browser, timeout=10).until(lambda b: b.find_element(By.ID, "ContentPlaceHolder2_rptPager_lnkPage_11"))
pagelink.click()

This explicit wait allows a grace period of up to 10 seconds to find the element by its ID. For more versatile use, the element is stored in the variable pagelink, and Selenium then performs the click action on it.

Implicit Wait Example

Rather than applying a wait to every single element, an implicit wait is set once on the driver and then applies to every element search that follows. Let's apply this between lines 27 and 28 of your code, where it declares:

browser.get(
    'https://senf.ir/ListCompany/75483/...')
browser.find_element(By.ID, f"ContentPlaceHolder2...").click()

Directly after the get function, we can use an implicit wait for Selenium to wait for when elements load:

browser.get("https://senf.ir/ListCompany/...")
browser.implicitly_wait(10)
browser.find_element(By.ID, f"ContentPlaceHolder2...").click()

The Selenium driver now waits up to 10 seconds when locating elements. Note that an implicit wait applies to element lookups rather than to the page load itself: if the element appears sooner, find_element returns immediately, and if it is still missing when the 10 seconds run out, find_element raises a NoSuchElementException rather than silently moving on.
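That behaviour can be modelled without a browser. The toy class below is entirely made up (it is not Selenium's implementation), but it shows what implicitly_wait changes: the same find_element call fails immediately with the default timeout of zero, and succeeds once a driver-wide timeout gives the element time to appear:

```python
import time

class FakeDriver:
    """Toy stand-in for a WebDriver; the 'element' appears after a delay."""

    def __init__(self, appears_after):
        self._appears_at = time.monotonic() + appears_after
        self._implicit_wait = 0.0  # Selenium's default is also 0

    def implicitly_wait(self, seconds):
        # Set once; applies to every subsequent find_element call.
        self._implicit_wait = seconds

    def find_element(self):
        deadline = time.monotonic() + self._implicit_wait
        while True:
            if time.monotonic() >= self._appears_at:
                return "element"
            if time.monotonic() >= deadline:
                # Selenium raises NoSuchElementException at this point.
                raise LookupError("no such element")
            time.sleep(0.01)
```

Because the timeout is a property of the driver, a single implicitly_wait call near the top of the script covers every find_element in the scraping loop.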

Documentation for Explicit and Implicit Waits can be found here.

Answered By: poiboi