Using Selenium and Python to scrape the Morningstar website: Selenium doesn't download the full webpage
Question:
Here’s my code:
from selenium import webdriver
import pandas as pd
from lxml import etree
url = 'https://www.morningstar.com/stocks/xbsp/UGPA3/quote'
browser = webdriver.Chrome()
browser.get(url)
htmlpage = browser.page_source
doc = etree.HTML(htmlpage)
cap = doc.xpath(
'/html/body/div[1]/div/div/div[3]/main/div[2]/div/div/div[1]/sal-components/section/div/div/div[1]/div/div[2]/div/div/div/div[2]/ul/li[7]/div/div[2]/text()')
print(cap)
I’m trying to scrape the Market Cap number from the webpage.
I found out after writing the htmlpage variable to a file that the problem is that it’s not downloading the whole page. It downloads 2228 KB, while my browser downloads a 2664 KB .html file plus a folder that’s not necessary. If I manually save the page with my browser and use its contents as an input to etree.HTML() it works, but I want to automate.
Answers:
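Try this. The page most likely fills in the quote data with JavaScript after the initial load, so page_source read right after get() can be missing it. Waiting explicitly for the element before reading its text avoids that: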
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
CHROME_DRIVER_PATH = "/usr/local/bin/chromedriver"
url = 'https://www.morningstar.com/stocks/xbsp/UGPA3/quote'
# Selenium 4 takes the driver path through a Service object
# (on Selenium 3, pass executable_path=CHROME_DRIVER_PATH instead)
browser = webdriver.Chrome(service=Service(CHROME_DRIVER_PATH))
browser.get(url)
# wait up to 10 seconds for the Market Cap element to be rendered and visible;
# the explicit wait makes a fixed time.sleep() unnecessary
cap = WebDriverWait(browser, 10).until(
    EC.visibility_of_element_located((By.XPATH,
        '//*[@id="__layout"]/div/div[3]/main/div[2]/div/div/div[1]/sal-components/section/div/div/div[1]/div/div[2]/div/div/div/div[2]/ul/li[7]/div/div[2]')))
cap_value = cap.text
print(cap_value)
browser.quit()
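In case it helps to see why the explicit wait makes the difference: WebDriverWait(...).until(...) is essentially a polling loop. It calls the condition every half second by default, treats "element not found" as "not ready yet", and gives up with a timeout error once the deadline passes. Below is a rough, standard-library-only sketch of that idea, not Selenium's actual implementation. The names wait_until, WaitTimeout and find_market_cap are made up for illustration, and the dictionary only stands in for a page that finishes rendering late.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5,
               ignored=(LookupError,)):
    """Call condition() repeatedly until it returns a truthy value.

    Exceptions listed in `ignored` are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass  # the thing we are looking for is not there yet
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Stand-in for a page whose Market Cap cell only appears on the third look.
page = {}
looks = {"count": 0}


def find_market_cap():
    looks["count"] += 1
    if looks["count"] == 3:
        page["market_cap"] = "placeholder"  # made-up value, not real data
    return page["market_cap"]  # raises KeyError until the cell exists


value = wait_until(find_market_cap, timeout=2.0, poll_interval=0.01)
print(value, looks["count"])
```

Reading browser.page_source immediately after get() corresponds to taking the first look only, which is why the XPath in the question can come back empty even though the same page saved from a regular browser contains the value.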