How to grab URL in "View Deal" and price for deal from kayak.com using BeautifulSoup

Question:

I have a list of Kayak URLs and I’d like to grab the price and the link in "View Deal" for the "Best" and "Cheapest" HTML cards. These are essentially the first two results, since I’ve already sorted the results in the URLs (an example URL is in the code below).

I haven’t managed to grab these bits of data using BeautifulSoup and could use some help! Here’s what I’ve tried for pulling the price info, but I end up with an empty prices_list variable. Below the code is a screenshot of exactly what I’d like to pull from the website.

from random import randint
from time import sleep

from selenium import webdriver

url = "https://www.kayak.com/flights/AMS-WMI,nearby/2023-02-15/WMI-SOF,nearby/2023-02-18/SOF-BEG,nearby/2023-02-20/BEG-MIL,nearby/2023-02-23/MIL-AMS,nearby/2023-02-25/?sort=bestflight_a"
requests = 0

# rotate through a few user-agent strings based on the request counter
chrome_options = webdriver.ChromeOptions()
agents = ["Firefox/66.0.3","Chrome/73.0.3683.68","Edge/16.16299"]
print("User agent: " + agents[(requests%len(agents))])
chrome_options.add_argument('--user-agent=' + agents[(requests%len(agents))])
chrome_options.add_experimental_option('useAutomationExtension', False)

# pass the options to the driver, otherwise they are never applied
driver = webdriver.Chrome('/Users/etc./etc.', options=chrome_options)
driver.implicitly_wait(10)
driver.get(url)

# getting the prices
sleep(randint(8,10))
xp_prices = '//a[@class="booking-link"]/span[@class="price option-text"]'
prices = driver.find_elements_by_xpath(xp_prices)
prices_list = [price.text.replace('$','') for price in prices if price.text != '']
prices_list = list(map(int, prices_list))

[Screenshot of the relevant part of the Kayak results page]

Asked By: June Smith

||

Answers:

There are two problems with the XPath locator:

  1. The class attribute of the a element is not exactly booking-link: it is booking-link followed by a trailing space, so an exact @class match fails.
  2. Your locator also matches duplicate, irrelevant (invisible) elements.

The following locator works:
"//div[@class='above-button']//a[contains(@class,'booking-link')]/span[@class='price option-text']"

So, the relevant code line could be:

xp_prices = "//div[@class='above-button']//a[contains(@class,'booking-link')]/span[@class='price option-text']"
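
Once the locator returns elements, the conversion step from the question, int(price.text.replace('$','')), would still raise a ValueError on any price that contains a thousands separator such as "$1,410". A minimal sketch of a more tolerant conversion follows, assuming whole-dollar prices written with a leading currency symbol; parse_price and parse_prices are hypothetical helper names, not part of Selenium.

```python
import re


def parse_price(text):
    """Return the integer amount in a label such as '$1,410', or None if no digits."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return int(match.group().replace(",", ""))


def parse_prices(texts):
    """Convert price labels to ints, skipping labels with no digits (e.g. empty text)."""
    parsed = (parse_price(t) for t in texts)
    return [p for p in parsed if p is not None]


print(parse_prices(["$807", "", "$1,410"]))  # [807, 1410]
```

With the question's variables this could be used as prices_list = parse_prices([price.text for price in prices]).
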
Answered By: Prophet

To extract the prices from View Deal for the Best and Cheapest sections of the website, you have to induce WebDriverWait for visibility_of_element_located(), and you can use the following locator strategies:

  • From the Best section:

    driver.get("https://www.kayak.com/flights/AMS-WMI,nearby/2023-02-15/WMI-SOF,nearby/2023-02-18/SOF-BEG,nearby/2023-02-20/BEG-MIL,nearby/2023-02-23/MIL-AMS,nearby/2023-02-25/?sort=bestflight_a")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[text()='Best']//following::div[contains(@class, 'bottom-booking')]//a//div[contains(@class, 'price-text')]"))).text)
    
  • Console output:

    $807
    
  • From the Cheapest section:

    driver.get("https://www.kayak.com/flights/AMS-WMI,nearby/2023-02-15/WMI-SOF,nearby/2023-02-18/SOF-BEG,nearby/2023-02-20/BEG-MIL,nearby/2023-02-23/MIL-AMS,nearby/2023-02-25/?sort=bestflight_a")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[text()='Cheapest']//following::div[contains(@class, 'bottom-booking')]//a//div[contains(@class, 'price-text')]"))).text)
    
  • Console output:

    $410
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
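
The question also asks for the link behind "View Deal", which the snippets above do not print. A possible sketch follows. It assumes the price div is nested inside the "View Deal" anchor, as the XPath above suggests, and that Kayak's markup still matches these locators, which may no longer be the case. price_xpath and get_deal are hypothetical helper names.

```python
def price_xpath(label):
    """Build the price locator used above for a section tab such as 'Best'."""
    return (
        f"//div[text()='{label}']"
        "//following::div[contains(@class, 'bottom-booking')]"
        "//a//div[contains(@class, 'price-text')]"
    )


def get_deal(driver, label, timeout=20):
    """Return (price_text, href) for the first deal after the given section tab."""
    # Imported here so price_xpath() can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    price_el = wait.until(
        EC.visibility_of_element_located((By.XPATH, price_xpath(label)))
    )
    # Assumption: the nearest enclosing <a> of the price is the "View Deal" link.
    link_el = price_el.find_element(By.XPATH, "./ancestor::a[1]")
    return price_el.text, link_el.get_attribute("href")
```

After driver.get(url), calling get_deal(driver, "Best") and get_deal(driver, "Cheapest") would return one (price, link) pair per section, provided the locators still match.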
    
Answered By: undetected Selenium