Getting list of all the URLs in a Closed Issue page in GitHub using Selenium

Question

I am trying to store the links of all the closed issues from a GitHub project (https://github.com/mlpack/mlpack/issues?q=is%3Aissue+is%3Aclosed) using Selenium. I use the code below:

repo_closed_url = [link.find_element(By.CLASS_NAME,'h4').get_attribute('href') for link in driver.find_elements(By.XPATH,'//div[@aria-label="Issues"]')]

However, the above code returns only the first URL. How can I get all the URLs on that page? I already iterate through all the pages myself, so getting the links from a single page is enough.

Asked By: Mano Haran

Answer 1

Please try this; it should work:

repo_closed_url = [link.get_attribute('href') for link in driver.find_elements(By.XPATH,"//div[@aria-label='Issues']//a[contains(@class,'h4')]")]

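Here the XPath //div[@aria-label='Issues']//a[contains(@class,'h4')] directly locates all the desired title elements on the page. Your original locator most likely matched only the single container div, and find_element() returns just the first matching descendant, which is why you got one URL.
The rest of the line iterates over the list of returned elements and extracts their href attributes, as I explained in the previous question.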
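
The same idea can be wrapped in a small helper. This is a minimal sketch rather than part of the answer above: the names ISSUE_LINK_XPATH, collect_hrefs, and closed_issue_urls are hypothetical, and the XPath assumes GitHub still renders the issue list with the markup the answer relies on. Skipping missing or repeated href values is an added convenience that the one-liner does not have.

```python
from typing import Iterable, List

# XPath from the answer above: every title link inside the issues container.
ISSUE_LINK_XPATH = "//div[@aria-label='Issues']//a[contains(@class,'h4')]"


def collect_hrefs(elements: Iterable) -> List[str]:
    """Return the unique, non-empty href values of `elements`, in page order."""
    seen = set()
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")  # None when the attribute is absent
        if href and href not in seen:
            seen.add(href)
            hrefs.append(href)
    return hrefs


def closed_issue_urls(driver) -> List[str]:
    """Collect issue URLs from the page currently loaded in `driver`."""
    # Imported here so collect_hrefs can be used without selenium installed.
    from selenium.webdriver.common.by import By

    # find_elements (plural) returns every match; find_element returns only the first.
    return collect_hrefs(driver.find_elements(By.XPATH, ISSUE_LINK_XPATH))
```

With a driver that has already loaded the closed-issues page, repo_closed_url = closed_issue_urls(driver) would take the place of the one-liner.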

Answered By: Prophet

Answer 2

Try the below XPath expression:

//div[@aria-label='Issues']//a[contains(@id,'issue')]

This XPath expression matches the title links of all the closed issues on page 1. Call .get_attribute('href') on each matched element to get its URL.

Answered By: Shawn

Answer 3

To extract the links from all the href attributes, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following locator strategies:

Using CSS_SELECTOR:

driver.get("https://github.com/mlpack/mlpack/issues?q=is%3Aissue+is%3Aclosed")
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[id^='issue_'] a[id^='issue']")))])

Using XPATH:

driver.get("https://github.com/mlpack/mlpack/issues?q=is%3Aissue+is%3Aclosed")
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[starts-with(@id, 'issue_')]//a[starts-with(@id, 'issue')]")))])

Console Output:

['https://github.com/mlpack/mlpack/issues/3371', 'https://github.com/mlpack/mlpack/issues/3370', 'https://github.com/mlpack/mlpack/issues/3369', 'https://github.com/mlpack/mlpack/issues/3368', 'https://github.com/mlpack/mlpack/issues/3367', 'https://github.com/mlpack/mlpack/issues/3365', 'https://github.com/mlpack/mlpack/issues/3364', 'https://github.com/mlpack/mlpack/issues/3363', 'https://github.com/mlpack/mlpack/issues/3356', 'https://github.com/mlpack/mlpack/issues/3353', 'https://github.com/mlpack/mlpack/issues/3352', 'https://github.com/mlpack/mlpack/issues/3351', 'https://github.com/mlpack/mlpack/issues/3348', 'https://github.com/mlpack/mlpack/issues/3340', 'https://github.com/mlpack/mlpack/issues/3338', 'https://github.com/mlpack/mlpack/issues/3336', 'https://github.com/mlpack/mlpack/issues/3333', 'https://github.com/mlpack/mlpack/issues/3329', 'https://github.com/mlpack/mlpack/issues/3326', 'https://github.com/mlpack/mlpack/issues/3325', 'https://github.com/mlpack/mlpack/issues/3324', 'https://github.com/mlpack/mlpack/issues/3323', 'https://github.com/mlpack/mlpack/issues/3319', 'https://github.com/mlpack/mlpack/issues/3314', 'https://github.com/mlpack/mlpack/issues/3303']

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
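
Putting the wait and the imports together, a reusable version might look like the sketch below. The function name wait_for_hrefs and the choice to return an empty list on timeout are assumptions made for illustration, not part of this answer; the selector you pass in depends on GitHub's current markup.

```python
def wait_for_hrefs(driver, css_selector, timeout=20):
    """Wait until all elements matching `css_selector` are visible, then return their hrefs.

    Returns an empty list if that does not happen within `timeout` seconds.
    """
    # Imported inside the function so this module loads even without selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    try:
        elements = WebDriverWait(driver, timeout).until(
            EC.visibility_of_all_elements_located((By.CSS_SELECTOR, css_selector))
        )
    except TimeoutException:
        return []  # assumption: treat a timeout as "no links found"
    return [element.get_attribute("href") for element in elements]
```

After driver.get(...) has loaded the issues page, wait_for_hrefs(driver, "div[id^='issue_'] a[id^='issue']") would return the same kind of list as the print statements above.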

Answered By: undetected Selenium
