Python Selenium – get href value

Question:

I am trying to copy the href value from a website, and the html code looks like this:

<p class="sc-eYdvao kvdWiq">
 <a href="https://www.iproperty.com.my/property/setia-eco-park/sale- 
 1653165/">Shah Alam Setia Eco Park, Setia Eco Park
 </a>
</p>

I’ve tried driver.find_elements_by_css_selector(".sc-eYdvao.kvdWiq").get_attribute("href") but it returned 'list' object has no attribute 'get_attribute'. Using driver.find_element_by_css_selector(".sc-eYdvao.kvdWiq").get_attribute("href") returned None. But i cant use xpath because the website has like 20+ href which i need to copy all. Using xpath would only copy one.

If it helps, all the 20+ href are categorised under the same class which is sc-eYdvao kvdWiq.

Ultimately i would want to copy all the 20+ href and export them out to a csv file.

Appreciate any help possible.

Asked By: Eric Choi

||

Answers:

try something like:

elems = driver.find_elements_by_xpath("//p[contains(@class, 'sc-eYdvao') and contains(@class='kvdWiq')]/a")
for elem in elems:
   print elem.get_attribute['href']
Answered By: Robert

The XPATH

//p[@class='sc-eYdvao kvdWiq']/a

return the elements you are looking for.

Writing the data to CSV file is not related to the scraping challenge. Just try to look at examples and you will be able to do it.

Answered By: balderman

You want driver.find_elements if more than one element. This will return a list. For the css selector you want to ensure you are selecting for those classes that have a child href

elems = driver.find_elements_by_css_selector(".sc-eYdvao.kvdWiq [href]")
links = [elem.get_attribute('href') for elem in elems]

You might also need a wait condition for presence of all elements located by css selector.

elems = WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".sc-eYdvao.kvdWiq [href]")))
Answered By: QHarr

As per the given HTML:

<p class="sc-eYdvao kvdWiq">
    <a href="https://www.iproperty.com.my/property/setia-eco-park/sale-1653165/">Shah Alam Setia Eco Park, Setia Eco Park</a>
</p>

As the href attribute is within the <a> tag ideally you need to move deeper till the <a> node. So to extract the value of the href attribute you can use either of the following Locator Strategies:

  • Using css_selector:

    print(driver.find_element_by_css_selector("p.sc-eYdvao.kvdWiq > a").get_attribute('href'))
    
  • Using xpath:

    print(driver.find_element_by_xpath("//p[@class='sc-eYdvao kvdWiq']/a").get_attribute('href'))
    

If you want to extract all the values of the href attribute you need to use find_elements* instead:

  • Using css_selector:

    print([my_elem.get_attribute("href") for my_elem in driver.find_elements_by_css_selector("p.sc-eYdvao.kvdWiq > a")])
    
  • Using xpath:

    print([my_elem.get_attribute("href") for my_elem in driver.find_elements_by_xpath("//p[@class='sc-eYdvao kvdWiq']/a")])
    

Dynamic elements

However, if you observe the values of class attributes i.e. sc-eYdvao and kvdWiq ideally those are dynamic values. So to extract the href attribute you have to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:

  • Using CSS_SELECTOR:

    print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "p.sc-eYdvao.kvdWiq > a"))).get_attribute('href'))
    
  • Using XPATH:

    print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//p[@class='sc-eYdvao kvdWiq']/a"))).get_attribute('href'))
    

If you want to extract all the values of the href attribute you can use visibility_of_all_elements_located() instead:

  • Using CSS_SELECTOR:

    print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "p.sc-eYdvao.kvdWiq > a")))])
    
  • Using XPATH:

    print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//p[@class='sc-eYdvao kvdWiq']/a")))])
    

Note : You have to add the following imports :

from selenium.webdriver.support.ui import WebDriverWait     
from selenium.webdriver.common.by import By     
from selenium.webdriver.support import expected_conditions as EC
Answered By: undetected Selenium

To crawl any hyperlink or Href, proxycrwal API is ideal as it uses pre-built functions for fetching desired information. Just pip install the API and follow the code to get the required output. The second approach to fetch Href links using python selenium is to run the following code.

Source Code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
import time

list = ['https://www.heliosholland.com/Ampullendoos-voor-63-ampullen','https://www.heliosholland.com/lege-testdozen’]
driver = webdriver.Chrome()
wait = WebDriverWait(driver,29)

for i in list: 
  driver.get(i)
  image = wait.until(EC.visibility_of_element_located((By.XPATH,'/html/body/div[1]/div[3]/div[2]/div/div[2]/div/div/form/div[1]/div[1]/div/div/div/div[1]/div/img'))).get_attribute('src')
  print(image)

To scrape the link, use .get_attribute(‘src’).

Answered By: Bilal