Trying to extract text from a website but getting an error

Question:

I don’t understand why I am getting an error. I am trying to get the description and price of the first 5 search results on the following web page. Reading the CSV file works, and so does opening the browser, entering the search term, and displaying the search results. It seems to fail when it tries to execute the

title_element = driver.find_element(By.XPATH, ...)

line. I included the XPaths that I copied from the elements on the inspect page in the code comments:

https://www.amazon.co.uk/s?k=30%E2%80%B3+Straight+Long+Ponytail+by+Exposed+Luxury+Hair+%281pc%29+-+1001+-+Light+Blonde%2C+1+Piece&crid=2CHXDZHEV7YLX&sprefix=30+straight+long+ponytail+by+exposed+luxury+hair+1pc+-+1001+-+light+blonde+1+piece%2Caps%2C75&ref=nb_sb_noss

My code is as follows:

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

data = pd.read_csv("data.csv")

i = 1
while i < 6:
    print(data.at[i, "Name"])
    print(data.at[i, "Regular price"])
    i += 1

# create a new Chrome browser instance
driver = webdriver.Chrome()
# navigate to the Amazon website
driver.get("https://www.amazon.co.uk")

time.sleep(5)

# find the search bar element and input a search term
search_bar = driver.find_element(By.ID, "twotabsearchtextbox")
search_term = "30″ Straight Long Ponytail by Exposed Luxury Hair (1pc) - 1001 - Light Blonde, 1 Piece"
search_bar.send_keys(search_term)

# press the return key to initiate the search
search_bar.send_keys(u'\ue007')

# locate the title and price elements for the first 5 results
for i in range(1, 6):
    time.sleep(5)
    title_element = driver.find_element(By.XPATH, f"//*[@id='search']/div[1]/div[1]/div/span[1]/div[1]/div[3]/div/div/div/div/div[3]/div[1]/h2/a/span/text()")
    price_element = driver.find_element(By.XPATH, f"//*[@id='search']/div[1]/div[1]/div/span[1]/div[1]/div[3]/div/div/div/div/div[3]/div[3]/div[1]/a/span[1]/span[1]")

    # Title Xpath - //*[@id="search"]/div[1]/div[1]/div/span[1]/div[1]/div[3]/div/div/div/div/div[3]/div[1]/h2/a/span/text()
    # Price Xpath - //*[@id="search"]/div[1]/div[1]/div/span[1]/div[1]/div[3]/div/div/div/div/div[3]/div[3]/div[1]/a/span[1]/span[1]

    title = title_element.text
    price = price_element.text

    # print the title and price
    print(f"Product {i} - Title: {title}, Price: {price}")

input("Press Enter to close the browser...")
driver.close()

The error I get from the command prompt is:

c:\Users\drwyn\OneDrive\Desktop\AmazonScraper>python scrape2.py
30″ Straight Long Ponytail by Exposed Luxury Hair (1pc) - 1001 - Light Blonde, 1 Piece
£6.95
30″ Straight Long Ponytail by Exposed Luxury Hair (1pc) - 101 - Platinum Blonde, 1 Piece
£6.95
30″ Straight Long Ponytail by Exposed Luxury Hair (1pc) - 10-12 - Medium Blonde Mix, 1 Piece
£6.95
30″ Straight Long Ponytail by Exposed Luxury Hair (1pc) - 10-16 - Natural Blonde Mix, 1 Piece
£6.95
30″ Straight Long Ponytail by Exposed Luxury Hair (1pc) - 10-16-22 - Classic Blonde Mix, 1 Piece
£6.95

DevTools listening on ws://127.0.0.1:58513/devtools/browser/055aa211-329d-4c4b-bab5-6f8705b5c93f
[16672:17304:0119/211326.431:ERROR:device_event_log_impl.cc(215)] [21:13:26.434] USB: usb_device_handle_win.cc:1046 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[16672:17304:0119/211326.438:ERROR:device_event_log_impl.cc(215)] [21:13:26.438] USB: usb_device_handle_win.cc:1046 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
Traceback (most recent call last):
  File "c:\Users\drwyn\OneDrive\Desktop\AmazonScraper\scrape2.py", line 35, in <module>
    title_element = driver.find_element(By.XPATH, f"//*[@id='search']/div[1]/div[1]/div/span[1]/div[1]/div[3]/div/div/div/div/div[3]/div[1]/h2/a/span/text()")
  File "C:\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 861, in find_element
    return self.execute(Command.FIND_ELEMENT, {"using": by, "value": value})["value"]
  File "C:\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 444, in execute
    self.error_handler.check_response(response)
  File "C:\Python310\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 249, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id='search']/div[1]/div[1]/div/span[1]/div[1]/div[3]/div/div/div/div/div[3]/div[1]/h2/a/span/text()"}
  (Session info: chrome=109.0.5414.75)
Stacktrace:
Backtrace:
        (No symbol) [0x008F6643]
         (No symbol) [0x0088BE21]
         (No symbol) [0x0078DA9D]
         (No symbol) [0x007C1342]
         (No symbol) [0x007C147B]
         (No symbol) [0x007F8DC2]
         (No symbol) [0x007DFDC4]
         (No symbol) [0x007F6B09]
         (No symbol) [0x007DFB76]
         (No symbol) [0x007B49C1]
         (No symbol) [0x007B5E5D]
 GetHandleVerifier [0x00B6A142+2497106]
 GetHandleVerifier [0x00B985D3+2686691]
 GetHandleVerifier [0x00B9BB9C+2700460]
 GetHandleVerifier [0x009A3B10+635936]
         (No symbol) [0x00894A1F]
         (No symbol) [0x0089A418]
         (No symbol) [0x0089A505]
         (No symbol) [0x008A508B]
 BaseThreadInitThunk [0x76D87D69+25]
 RtlInitializeExceptionChain [0x77B9BB9B+107]
 RtlClearBits [0x77B9BB1F+191]

The images below show the elements where I extracted the Xpaths from.

Page source 1

Page source 2

Asked By: David Wyness

||

Answers:

There are several problems here.

  1. An XPath ending in /text() selects a text node, not an element, so Selenium cannot use it to locate a web element.
  2. Very long absolute XPaths such as //*[@id='search']/div[1]/div[1]/div/span[1]/div[1]/div[3]/div/div/div/div/div[3]/div[3]/div[1]/a/span[1]/span[1] are extremely brittle: any small change to the page layout breaks them.
  3. You use the same XPath, with no index, on every loop iteration. Even if the XPaths were correct, you would get the same value each time.
  4. To extract the text of a web element, read the element's .text attribute.

To get the title and the price of each result, try this:

for i in range(1, 6):
    # Wrap each XPath in parentheses and index it to pick the i-th search result
    title_locator = "(//div[contains(@class,'s-title-instructions-style')]//span[@class='a-size-base-plus a-color-base a-text-normal'])[{}]".format(i)
    price_locator = "(//div[contains(@class,'a-section')]//span[@class='a-price'])[{}]".format(i)
    # .text returns the visible text of the located element
    title = driver.find_element(By.XPATH, title_locator).text
    price = driver.find_element(By.XPATH, price_locator).text
    print(f"Product {i} - Title: {title}, Price: {price}")
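
The same idea can be wrapped in a small helper so the indexing logic lives in one place. This is only a sketch: the names indexed_xpath and first_results are made up for illustration, the base XPaths are the ones used in the loop above (Amazon's class names may well have changed, so re-check them in DevTools), and the function assumes driver is an already open Selenium WebDriver sitting on the search results page.

```python
# Base locators copied from the loop above. They are assumptions about
# Amazon's markup at the time of writing and may need updating.
TITLE_XPATH = (
    "//div[contains(@class,'s-title-instructions-style')]"
    "//span[@class='a-size-base-plus a-color-base a-text-normal']"
)
PRICE_XPATH = "//div[contains(@class,'a-section')]//span[@class='a-price']"


def indexed_xpath(base_xpath, position):
    """Return an XPath selecting the position-th match (1-based) of base_xpath."""
    if position < 1:
        raise ValueError("XPath positions start at 1")
    # The parentheses make the index apply to the whole result set,
    # not to each parent element separately.
    return f"({base_xpath})[{position}]"


def first_results(driver, count=5):
    """Collect (title, price) text pairs for the first `count` results.

    `driver` is any object exposing Selenium's find_element(by, value).
    """
    results = []
    for position in range(1, count + 1):
        # "xpath" is the string value of selenium's By.XPATH constant.
        title = driver.find_element("xpath", indexed_xpath(TITLE_XPATH, position)).text
        price = driver.find_element("xpath", indexed_xpath(PRICE_XPATH, position)).text
        results.append((title, price))
    return results
```

With a live browser session this could be called as first_results(driver) and the returned pairs printed in a loop. If fewer than count results match, find_element still raises NoSuchElementException, so the caller may want to catch that.
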
Answered By: Prophet