Scraping Data from Airbnb using Selenium

Question:

Hi guys, I am trying to scrape some data from Airbnb to create a small data analysis project for my portfolio.
I tried several tutorials with BeautifulSoup, but none of them work today, even when I use the very same link as the tutorials.

Because of this I turned to Selenium. I managed to load the site, and as a first step I am trying to extract the listing names. After that I would like to extract all the other information (price, reviews, rating, amenities, etc.).

My code is below, but I am getting an empty list.
Can anyone help me get the names of the apartments?

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

website = 'https://www.airbnb.com/s/Thessaloniki--Greece/homes?tab_id=home_tab&flexible_trip_lengths%5B%5D=one_week&refinement_paths%5B%5D=%2Fhomes&place_id=ChIJ7eAoFPQ4qBQRqXTVuBXnugk&query=Thessaloniki%2C%20Greece&date_picker_type=calendar&search_type=unknown'
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(website)
titles = driver.find_elements("class name", "n1v28t5c s1cjsi4j dir dir-ltr")

Thanks.

Asked By: Lefteris Kyprianou


Answers:

To extract the names of the properties, you need WebDriverWait to wait for visibility_of_all_elements_located(), and you can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    driver.get('https://www.airbnb.com/s/Thessaloniki--Greece/homes?tab_id=home_tab&flexible_trip_lengths%5B%5D=one_week&refinement_paths%5B%5D=%2Fhomes&place_id=ChIJ7eAoFPQ4qBQRqXTVuBXnugk&query=Thessaloniki%2C%20Greece&date_picker_type=calendar&search_type=unknown')
    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[id^='title']")))])
    
  • Using XPATH:

    driver.get('https://www.airbnb.com/s/Thessaloniki--Greece/homes?tab_id=home_tab&flexible_trip_lengths%5B%5D=one_week&refinement_paths%5B%5D=%2Fhomes&place_id=ChIJ7eAoFPQ4qBQRqXTVuBXnugk&query=Thessaloniki%2C%20Greece&date_picker_type=calendar&search_type=unknown')
    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[starts-with(@id, 'title') and text()]")))])
    
  • Console Output:

    ['Flat in Thessaloniki', 'Apartment in Thessaloniki', 'Flat in Thessaloniki', 'Apartment in Thessaloniki', 'Apartment in Thessaloniki', 'Loft in Thessaloniki', 'Flat in Thessaloniki', 'Flat in Thessaloniki', 'Apartment in Thessaloniki', 'Apartment in Thessaloniki', 'Flat in Thessaloniki', 'Flat in Thessaloniki', 'Apartment in Thessaloniki', 'Flat in Thessaloniki', 'Apartment in Thessaloniki', 'Apartment in Thessaloniki', 'Flat in Thessaloniki', 'Flat in Thessaloniki', 'Flat in Thessaloniki', 'Apartment in Agios Pavlos']
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
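
To see why the explicit wait helps, here is a simplified sketch of the polling idea behind WebDriverWait(...).until(...): it keeps calling a condition until that condition returns something truthy or the timeout expires. Selenium's real implementation polls every 0.5 s by default, ignores NoSuchElementException while waiting, passes the driver to the condition, and raises TimeoutException. The wait_until function and WaitTimeout exception below are hypothetical stand-ins, not part of Selenium, and a plain function takes the place of the browser so the example runs without a driver.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(condition, timeout=20.0, poll_frequency=0.5):
    """Call condition() until it returns a truthy value or `timeout` seconds pass.

    Simplified illustration of the polling idea behind WebDriverWait.until;
    not Selenium's actual implementation.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Like until(), hand the truthy value (e.g. a list of elements) back
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Simulate listings that only "render" on the third poll, as JavaScript content might.
calls = {"count": 0}


def titles_visible():
    calls["count"] += 1
    return ["Flat in Thessaloniki"] if calls["count"] >= 3 else []


print(wait_until(titles_visible, timeout=5, poll_frequency=0.01))
# ['Flat in Thessaloniki']
```

Calling find_elements directly returns whatever matches at that instant (with the default implicit wait of zero), so an explicit wait is the safer pattern for pages that render their results with JavaScript.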
    
Answered By: undetected Selenium

driver.find_elements("class name", "n1v28t5c s1cjsi4j dir dir-ltr")

This returns 0 elements, because By.CLASS_NAME (the "class name" locator) can only match on a single class.

("n1v28t5c s1cjsi4j dir dir-ltr" is actually four separate classes of the element you're trying to locate.) You can locate elements with multiple classes using, for example, XPath selectors.

driver.find_elements(By.XPATH, '//div[@class="n1v28t5c s1cjsi4j dir dir-ltr"]')

This finds all 20 elements on the page. I strongly encourage you to learn more about XPath; it's simple to understand and very powerful.
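
An alternative to the exact-match XPath is to build the selector from the class string itself. @class="..." compares the whole attribute value, so it stops matching if the classes are reordered or the spacing changes. The two helpers below, classes_to_css and classes_to_xpath (hypothetical names), are a sketch that assumes every class is a valid CSS identifier, so no escaping is done. Both selectors also match elements that carry extra classes beyond the ones listed.

```python
def classes_to_css(tag, class_string):
    """Build a CSS selector matching `tag` elements that carry every class in `class_string`."""
    classes = class_string.split()  # split on any whitespace, ignoring extra spaces
    return tag + "".join("." + c for c in classes)


def classes_to_xpath(tag, class_string):
    """Build an XPath requiring each class token, in any order, without exact-string matching."""
    checks = [
        # Pad with spaces so ' dir ' does not accidentally match inside 'dir-ltr'
        f"contains(concat(' ', normalize-space(@class), ' '), ' {c} ')"
        for c in class_string.split()
    ]
    return f"//{tag}[" + " and ".join(checks) + "]"


print(classes_to_css("div", "n1v28t5c s1cjsi4j dir dir-ltr"))
# div.n1v28t5c.s1cjsi4j.dir.dir-ltr
```

With Selenium you would then call, for example, driver.find_elements(By.CSS_SELECTOR, classes_to_css("div", "n1v28t5c s1cjsi4j dir dir-ltr")).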

Answered By: Maciej Miecznik

Combining Selenium with bs4 works fine: Selenium renders the page and BeautifulSoup parses driver.page_source, which returns the right data. Just run the code.

Example:

import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

url = 'https://www.airbnb.com/s/Thessaloniki--Greece/homes?tab_id=home_tab&flexible_trip_lengths%5B%5D=one_week&refinement_paths%5B%5D=%2Fhomes&place_id=ChIJ7eAoFPQ4qBQRqXTVuBXnugk&query=Thessaloniki%2C%20Greece&date_picker_type=calendar&search_type=unknown'
# Start a single Chrome session
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(url)

driver.maximize_window()
time.sleep(5)  # give the JavaScript-rendered listings time to load

# Parse the rendered HTML, then close the browser
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()

# These class names look auto-generated by Airbnb and may change at any time
for card in soup.select('div[class="c4mnd7m dir dir-ltr"]'):
    title = card.select_one('div[class="t1jojoys dir dir-ltr"]').text
    price = card.select_one('span[class="a8jt5op dir dir-ltr"]').text
    link = 'https://www.airbnb.com' + card.select_one('a[class="ln2bl2p dir dir-ltr"]').get('href')
    print(title, price)

Output:

Condo in Thessaloniki $50 per night
Apartment in Thessaloniki $38 per night
Condo in Thessaloniki $80 per night
Apartment in Thessaloniki $66 per night
Condo in Thessaloniki $23 per night
Apartment in Thessaloniki $74 per night
Condo in Thessaloniki $37 per night
Apartment in Thessaloniki $45 per night
Apartment in Thessaloniki $39 per night
Condo in Thessaloniki $27 per night
Apartment in Thessaloniki $28 per night
Condo in Thessaloniki $43 per night
Apartment in Thessaloniki $94 per night
Apartment in Thessaloniki $24 per night
Condo in Thessaloniki $86 per night
Loft in Thessaloniki $23 per night
Apartment in Thessaloníki $45 per night
Apartment in Thessaloniki $44 per night
Condo in Thessaloniki $50 per night
Condo in Thessaloniki $51 per night
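
Since the goal is a data analysis project, the printed lines can be turned into numeric rows. The sketch below assumes the "<type> in <place> $<price> per night" format shown in the output above, with whole-dollar or decimal prices and no thousands separator; parse_listing is a hypothetical helper, and lines that don't match are skipped rather than guessed.

```python
import re

# Matches lines such as "Condo in Thessaloniki $50 per night" (format assumed from the output above)
LINE_RE = re.compile(r"^(?P<kind>.+?) in (?P<place>.+?) \$(?P<price>\d+(?:\.\d+)?) per night$")


def parse_listing(line):
    """Split one printed line into (property type, place, nightly price); None if it doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return m["kind"], m["place"], float(m["price"])


sample = [
    "Condo in Thessaloniki $50 per night",
    "Loft in Thessaloniki $23 per night",
]
# Keep only lines that parsed successfully
rows = [r for r in (parse_listing(s) for s in sample) if r is not None]
print(rows)
# [('Condo', 'Thessaloniki', 50.0), ('Loft', 'Thessaloniki', 23.0)]

avg = sum(price for _, _, price in rows) / len(rows)
print(avg)
# 36.5
```

The resulting tuples can be passed straight to pandas.DataFrame with column names of your choice for further analysis.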
Answered By: F.Hoque