driver doesn't close after not finding an element, python

Question:

Could you find the error in my code? I have been stuck on it for a week now, so I am asking the community.
I am trying to download 14,000 HTML pages into a folder using Selenium. I have a long list of ids that I insert into a page URL. Because the website I am downloading from is protected by a captcha, I am using a proxy (first I scrape free proxies from an online source and try to find a working one; when a proxy fails, I tell my driver to close).
The problem I am facing is the following:

  1. Using a working driver (with working proxy credentials), I get the page for every id in my list. (This works fine.)
  2. I inspect the page for a table. If it is there, I download the page; if driver.get returns a captcha, I want to close the driver. BUT IT DOES NOT CLOSE. Selenium downloads pages without a captcha perfectly well, but when it gets a captcha it just doesn't do anything, as if PyCharm were stuck. The part of the code that handles proxies and finds a working driver is okay; I think the error is in the last lines of my code. Please see the code below:
import codecs
import os

from selenium.webdriver.common.by import By

# Function to find an element. Returns 1 if it is found and 0 if not.

def find_element(driver, test_xpath = 'restab') -> int:
    if driver.find_elements(By.ID, test_xpath):
        var = 1
    else:
        var = 0
    return var

#function to download pages if the element is found and close driver if element is not located

def data_fill(id_list: str, driver) -> int:
    for id in id_list:
        author_page = "https://www.elibrary.ru/author_profile_new_titles.asp?id={}".format(id)
        driver.implicitly_wait(300)
        driver.get(author_page)
        result = find_element(driver)
        if result == 0:
            driver.close()
        else:
            n = os.path.join(f"/Users/dariagerashchenko/PycharmProjects/python_practice/hist/j_profile{id}.html")
            f = codecs.open(n, "w", "utf-8")
            h = driver.page_source
            f.write(h)
    return 1

# calling a function to get the code running
k = 0
while True:

    if k % 5 == 0:
        proxy_list = get_proxies()
    k += 1
    driver = get_best_driver(driver_path = driver_path, proxy_list = proxy_list) # find the working driver
    if driver is None:
        continue
    session_result = data_fill(id_list = id_list, driver=driver)
    if session_result == 1:  # data is collected
        print("Data collected.")

I tried multiple variations of telling the driver to close, but they all failed. I previously worked in R and only recently switched to Python, so maybe it is just my lack of knowledge.

Asked By: Daria

||

Answers:

Closing a Selenium driver is something you have to do manually by calling a method on it.
Check out this page if you need details:
https://www.geeksforgeeks.org/close-driver-method-selenium-python/
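
As a small illustration of calling the method manually: in Selenium, close() only closes the current browser window, while quit() ends the whole WebDriver session and shuts the browser process down. A minimal sketch of making sure that call always happens, assuming hypothetical helper names (with_driver, make_driver and work are not from the question):

```python
def with_driver(make_driver, work):
    """Run work(driver) and always shut the browser down afterwards.

    make_driver and work are hypothetical callables used for illustration:
    make_driver() returns a WebDriver-like object, work(driver) uses it.
    """
    driver = make_driver()
    try:
        return work(driver)
    finally:
        # quit() ends the whole session; close() would only close
        # the current window and can leave the driver process alive.
        driver.quit()
```

With real Selenium this might be called as with_driver(webdriver.Chrome, lambda d: d.get(url)); the finally block runs even when the work raises, so no browser is left open.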

Answered By: Harsha Addanki

Looks like the code enters an infinite loop. The while loop is to blame for this.

while True:
    ## ...
    driver = get_best_driver(driver_path = driver_path, proxy_list = proxy_list) # find the working driver
    if driver is None:
        continue
    ## ...

Notice that as long as get_best_driver returns None, the loop never ends. So set a limit on the maximum number of tries by using a for loop instead of a while loop.

for _ in range(100):
  ## ...

Could you please post the code for the function get_best_driver for further clarity?
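
To make the bounded-retry idea concrete, here is a minimal sketch. The names first_working_driver, make_driver and max_tries are hypothetical; make_driver stands in for a call such as get_best_driver(driver_path=driver_path, proxy_list=proxy_list), whose real behavior is not shown in the question.

```python
def first_working_driver(make_driver, max_tries=100):
    """Call make_driver() up to max_tries times.

    Returns the first result that is not None, or None if every try failed.
    make_driver is a hypothetical zero-argument callable.
    """
    for attempt in range(1, max_tries + 1):
        driver = make_driver()
        if driver is not None:
            return driver
        print(f"attempt {attempt}/{max_tries}: no working driver")
    return None
```

The caller can then treat a None result as "give up" instead of spinning forever when no proxy works.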

Answered By: Nikhil Devadiga

Thanks to Nikhil Devadiga for his ideas,
eventually I found an answer myself. Here it is:

k = 0
while True:
    if k % 5 == 0:
        proxy_list = get_proxies()
    k += 1
    driver = get_best_driver(driver_path = driver_path, proxy_list = proxy_list)
    for id in id_list:
        session_result = data_fill(id_list = id, driver=driver)
        if session_result == 0:
            driver.close()
            break
        continue
    print('done')

But before that, I modified another part of my code:

def data_fill(id_list: str, driver) -> int:
    author_page = "https://www.elibrary.ru/author_profile_new_titles.asp?id={}".format(id_list)
    driver.get(author_page)
    result = find_element(driver)
    if result == 0:
        output = 0
    else:
        n = os.path.join(f"/Users/dariagerashchenko/PycharmProjects/python_practice/hist/j_profile{id_list}.html")
        f = codecs.open(n, "w", "utf-8")
        h = driver.page_source
        f.write(h)
        output = 1
    return output
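
One detail in the modified function may matter more than it looks: the driver.implicitly_wait(300) call is gone. With an implicit wait set, find_elements keeps polling for up to that many seconds before it returns an empty list, so on a captcha page the original code would sit for up to five minutes and look frozen. The sketch below combines that with the approach above under some assumptions: save_page, crawl, make_driver, out_dir, wait_seconds and max_drivers are hypothetical names, and make_driver stands in for get_best_driver. It also keeps track of which ids are still missing, so a replacement driver continues where the previous one stopped.

```python
import os

BY_ID = "id"  # same string value as selenium's By.ID


def save_page(driver, author_id, out_dir, wait_seconds=5):
    """Fetch one profile page; return True if saved, False if the table is missing."""
    # Short implicit wait, so a missing table is reported quickly.
    driver.implicitly_wait(wait_seconds)
    driver.get(f"https://www.elibrary.ru/author_profile_new_titles.asp?id={author_id}")
    if not driver.find_elements(BY_ID, "restab"):
        return False  # no results table, probably a captcha page
    path = os.path.join(out_dir, f"j_profile{author_id}.html")
    with open(path, "w", encoding="utf-8") as f:  # file is closed even on error
        f.write(driver.page_source)
    return True


def crawl(ids, make_driver, out_dir, max_drivers=100):
    """Download every id, replacing the driver whenever a page fails.

    Returns the list of ids that could not be downloaded.
    """
    remaining = list(ids)
    for _ in range(max_drivers):  # bounded, so this cannot loop forever
        if not remaining:
            break
        driver = make_driver()
        if driver is None:
            continue  # no working proxy this time, try again
        try:
            while remaining and save_page(driver, remaining[0], out_dir):
                remaining.pop(0)  # drop an id only once its page is on disk
        finally:
            driver.quit()  # always end the session before getting a new driver
    return remaining
```

Returning the leftover ids lets the caller decide whether to retry later or report them, rather than silently skipping pages.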
Answered By: Daria