How can I extract the text elements using Selenium in Python?

Question

Consider:

I am using Selenium to scrape the contents from the App Store: https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830

I tried to extract the text field "As subject matter experts, our team is very engaging…"

I tried to find elements by class

review_ratings = driver.find_elements_by_class_name('we-truncate we-truncate--multi-line we-truncate--interactive ember-view we-customer-review__body')
review_ratingsList = []
for e in review_ratings:
review_ratingsList.append(e.get_attribute('innerHTML'))
review_ratings

But it returns an empty list [].

Is anything wrong with the code? Or is there a better solution?

Asked By: Arthur Morgan

||

Source

Answer 1

You can use WebDriverWait to wait for visibility of an element and get the text. Please check good Selenium locator.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

#...

wait = WebDriverWait(driver, 5)
review_ratings = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".we-customer-review")))
for review_rating in review_ratings:
    starts = review_rating.find_element_by_css_selector(".we-star-rating").get_attribute("aria-label")
    title = review_rating.find_element_by_css_selector("h3").text
    review = review_rating.find_element_by_css_selector("p").text

Answered By: Sers

Answer 2

Mix Selenium with Beautiful Soup.

Using WebDriver:

from bs4 import BeautifulSoup
from selenium import webdriver

browser = webdriver.Chrome()
url = "https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830"
browser.get(url)
innerHTML = browser.execute_script("return document.body.innerHTML")

bs = BeautifulSoup(innerHTML, 'html.parser')

bs.blockquote.p.text

Output:

Out[22]: 'As subject matter experts, our team is very engaging and focused on our near and long term financial health!'

Answered By: Juan C

Answer 3

Use WebDriverWait and wait for presence_of_all_elements_located and use the following CSS selector.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830")
review_ratings = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.we-customer-review__body p[dir="ltr"]')))
review_ratingsList = []
for e in review_ratings:
    review_ratingsList.append(e.get_attribute('innerHTML'))
print(review_ratingsList)

Output:

['As subject matter experts, our team is very engaging and focused on our near and long term financial health!', 'Very much seems to be an unfinished app. Can’t find secure message alert. Or any alerts for that matter. Most of my client team is missing from the “send to” list. I have other functions very useful, when away from my computer.']

Answered By: KunduK

Answer 4

Using Requests and Beautiful Soup:

import requests
from bs4 import BeautifulSoup

url = 'https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830'

res = requests.get(url)
soup = BeautifulSoup(res.text,'lxml')
item = soup.select_one("blockquote > p").text
print(item)

Output:

As subject matter experts, our team is very engaging and focused on our near and long term financial health!

Answered By: MITHU

How can I extract the text elements using Selenium in Python?

Question:

Answers:

Output: