Scrape Name & Address from a website using Selenium and Python

Question:

I would like to scrape the "Name" & "Address" from the following site:

https://register.fca.org.uk/s/firm?id=001b000000MfNWNAA3

However, I am struggling to reference the correct field within the page and return the results.

What I need is a working solution that grabs the "name" from the webpage and prints it.

Code:

import string
import pandas as pd
from lxml import html
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
from IPython.core.display import display, HTML

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

Example Reference:

driver = webdriver.Chrome(chrome_options = options, executable_path=r'C:\Downloads\chromedriver.exe')
driver.get("https://register.fca.org.uk/s/firm?id=001b000000MfNWNAA3")
title = driver.find_elements(By.CSS_SELECTOR,'.slds-media__body h1 > a')
print(title.text)

Looking forward to your help!

Asked By: Masond3

||

Answers:

Use WebDriverWait to wait until each element is visible before reading its text.

driver.get("https://register.fca.org.uk/s/firm?id=001b000000MfNWNAA3")
name = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".slds-media__body h1"))).text
print(name)
address = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h4[data-aura-rendered-by] ~p:nth-of-type(1)"))).text
print(address)

You need the following imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
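
To make the waiting idea concrete, here is a minimal pure-Python sketch of the kind of polling loop an explicit wait performs: call a condition repeatedly until it returns something truthy or the timeout expires. It is an illustration only, not Selenium's own implementation; `wait_until`, `fake_lookup`, the firm name, and the poll interval are hypothetical names and values chosen for the example.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition` until it returns a truthy value or `timeout` seconds pass.

    Illustration of explicit-wait polling; not Selenium's own code.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Hypothetical stand-in for a page whose heading appears only on the third look-up.
attempts = {"count": 0}


def fake_lookup():
    attempts["count"] += 1
    return "Example Firm Ltd" if attempts["count"] >= 3 else None


name = wait_until(fake_lookup, timeout=2.0, poll=0.01)
print(name)
```

With Selenium, `WebDriverWait(driver, 10).until(...)` plays the role of `wait_until`, and it raises `TimeoutException` rather than `TimeoutError` when the condition is never met.
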
Answered By: KunduK

To extract the Name and Address, wait for each element with WebDriverWait and visibility_of_element_located(), using the following locator strategies:

  • To get the Name:

    driver.get('https://register.fca.org.uk/s/firm?id=001b000000MfQU0AAN')
    print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//h1"))).text)
    
  • To get the Address:

    driver.get('https://register.fca.org.uk/s/firm?id=001b000000MfQU0AAN')
    print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//h4[.//div[contains(., 'Address')]]//following-sibling::p[1]"))).text)
    
  • Console Output:

    Mason Owen and Partners Ltd
    Unity Building
    20 Chapel Street
    Liverpool
    Merseyside
    L3 9AG
    L 3 9 A G
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
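
The console output above shows the address arriving as several lines of text, with the postcode appearing a second time with spaces between its characters. The sketch below shows one way to tidy that text once it has been read from the element. It assumes the last line is only a spaced-out repeat of the line before it (the thread does not say why it appears); `split_address` is a hypothetical helper, not part of Selenium.

```python
def split_address(raw_text):
    """Split multi-line address text into lines, dropping a spaced-out repeat of the last line."""
    lines = [line.strip() for line in raw_text.splitlines() if line.strip()]
    # Assumption: the page may repeat the postcode with a space between each
    # character ("L 3 9 A G" after "L3 9AG"), as in the console output above.
    if len(lines) >= 2 and lines[-1].replace(" ", "") == lines[-2].replace(" ", ""):
        lines.pop()
    return lines


# Address text copied from the console output above.
raw = """Unity Building
20 Chapel Street
Liverpool
Merseyside
L3 9AG
L 3 9 A G"""

address_lines = split_address(raw)
print(", ".join(address_lines))
# prints: Unity Building, 20 Chapel Street, Liverpool, Merseyside, L3 9AG
```

In a real script, `raw` would be the `.text` value returned by the address wait shown in the list above.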
    

You can find a relevant discussion in "How to retrieve the text of a WebElement using Selenium – Python".

Answered By: undetected Selenium

In addition to using WebDriverWait and visibility_of_element_located as the other answers suggest, it’s sometimes necessary to scroll an item into view.

This small function wraps the JavaScript call that does it:

def scrollto(element):
    driver.execute_script("return arguments[0].scrollIntoView(true);", element)
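
The function above relies on a module-level `driver`. As a variant sketch, the driver can be passed in explicitly, which makes the helper easier to reuse and to try out without a browser. `scroll_to` and `FakeDriver` are hypothetical names for illustration; only `execute_script` and `scrollIntoView` come from the answer.

```python
def scroll_to(driver, element):
    """Scroll `element` into view using the driver's JavaScript executor."""
    driver.execute_script("arguments[0].scrollIntoView(true);", element)
    return element


class FakeDriver:
    """Hypothetical stand-in that records calls, so the sketch runs without a browser."""

    def __init__(self):
        self.calls = []

    def execute_script(self, script, *args):
        self.calls.append((script, args))


fake = FakeDriver()
returned = scroll_to(fake, "heading-element")
print(fake.calls)
```

With a real WebDriver, the second argument would be the element returned by a wait or by `find_element`, for example `scroll_to(driver, driver.find_element(By.XPATH, "//h1"))`.
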
Answered By: natstandridge