Scrape Name & Address from a website using Selenium Python
Question:
I would like to scrape the "Name" & "Address" from the following site:
https://register.fca.org.uk/s/firm?id=001b000000MfNWNAA3
However, I am struggling to reference the correct fields within the page and return the results.
Where I need your help is in providing a working solution that grabs the "Name" from the webpage and prints it.
Code:
import string
import pandas as pd
from lxml import html
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
from IPython.core.display import display, HTML
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
Example Reference:
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Downloads\chromedriver.exe')
driver.get("https://register.fca.org.uk/s/firm?id=001b000000MfNWNAA3")
title = driver.find_elements(By.CSS_SELECTOR,'.slds-media__body h1 > a')
print(title.text)
Looking forward to your help!
Answers:
Use WebDriverWait and wait for visibility_of_element_located().
driver.get("https://register.fca.org.uk/s/firm?id=001b000000MfNWNAA3")
name=WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,".slds-media__body h1"))).text
print(name)
address=WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,"h4[data-aura-rendered-by] ~p:nth-of-type(1)"))).text
print(address)
You need to import the libraries below:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
To extract the Name and the Address, ideally you need to induce WebDriverWait for visibility_of_element_located(), and you can use the following locator strategies:
-
Using Name:
driver.get('https://register.fca.org.uk/s/firm?id=001b000000MfQU0AAN')
print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//h1"))).text)
-
Using Address:
driver.get('https://register.fca.org.uk/s/firm?id=001b000000MfQU0AAN')
print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//h4[.//div[contains(., 'Address')]]//following-sibling::p[1]"))).text)
-
Console Output:
Mason Owen and Partners Ltd
Unity Building
20 Chapel Street
Liverpool
Merseyside
L3 9AG
L 3 9 A G
-
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium – Python
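Since the question already imports lxml, the Address XPath above can be sanity-checked offline against a static HTML fragment. The markup below is an illustrative stand-in, not the FCA page's actual structure:

```python
from lxml import html

# Illustrative stand-in for the firm page's markup -- not the real FCA HTML.
SAMPLE = """
<div>
  <h1>Mason Owen and Partners Ltd</h1>
  <h4><div>Address</div></h4>
  <p>Unity Building</p>
  <p>20 Chapel Street</p>
</div>
"""

tree = html.fromstring(SAMPLE)
# Same locators as in the Selenium answer above, evaluated statically.
name = tree.xpath("//h1")[0].text
address = tree.xpath(
    "//h4[.//div[contains(., 'Address')]]//following-sibling::p[1]"
)[0].text
print(name)     # Mason Owen and Partners Ltd
print(address)  # Unity Building
```

This only verifies the XPath logic; the live page still needs Selenium and the WebDriverWait calls, because the content is rendered by JavaScript.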
In addition to using WebDriverWait and visibility_of_element_located like others are suggesting, it’s sometimes necessary to scroll an item into view.
This is a little function to make it more convenient to execute the JavaScript that does it:
def scrollto(element):
    driver.execute_script("return arguments[0].scrollIntoView(true);", element)
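Once the waits succeed, the address comes back from .text as a single newline-separated string. A minimal sketch of splitting it into parts; the name/address values below are hard-coded stand-ins for the scraped text, and the field names are made up for illustration:

```python
# Stand-ins for the values returned by the WebDriverWait calls above.
name = "Mason Owen and Partners Ltd"
address = "Unity Building\n20 Chapel Street\nLiverpool\nMerseyside\nL3 9AG"

# .text joins the element's visible lines with newlines, so split on them.
lines = address.split("\n")
record = {
    "name": name,
    "street": lines[0],
    "postcode": lines[-1],  # last line of a UK address block
}
print(record)
```

If you scrape several firms, a list of such dicts can be passed straight to pandas.DataFrame (pandas is already imported in the question).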