Neither selenium nor bs4 can find div in page
Question:
I am trying to scrape a Craigslist results page, and neither bs4 nor selenium can find the elements in the page even though I can see them on inspection using dev tools. The results are in list items with class cl-search-result, but it seems the soup returned has none of the results.
This is my script so far. It looks like even the soup that is returned is not the same as the HTML I see when I inspect with dev tools. I am expecting this script to return 42 items, which is the number of search results.
Here is the script:
import time
import datetime
from collections import namedtuple
import selenium.webdriver as webdriver
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.support.ui import Select
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import ElementNotInteractableException
from bs4 import BeautifulSoup
import pandas as pd
import os
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/109.0'
firefox_driver_path = os.path.join(os.getcwd(), 'geckodriver.exe')
firefox_service = Service(firefox_driver_path)
firefox_option = Options()
firefox_option.set_preference('general.useragent.override', user_agent)
browser = webdriver.Firefox(service=firefox_service, options=firefox_option)
browser.implicitly_wait(7)
url = 'https://baltimore.craigslist.org/search/sss#search=1~list~0~0'
browser.get(url)
soup = BeautifulSoup(browser.page_source, 'html.parser')
print(soup)
posts_html = soup.find_all('li', {'class': 'cl-search-result'})
print('Collected {0} listings'.format(len(posts_html)))
Answers:
The following code worked for me. It printed: Collected 120 listings
from time import sleep

from bs4 import BeautifulSoup
from selenium import webdriver

url = 'https://baltimore.craigslist.org/search/sss#search=1~list~0~0'
browser = webdriver.Chrome()
browser.get(url)
sleep(3)  # crude fixed wait for the AJAX-rendered results to load
soup = BeautifulSoup(browser.page_source, 'html.parser')
posts_html = soup.find_all('li', {'class': 'cl-search-result'})
print('Collected {0} listings'.format(len(posts_html)))
Edit 1: The get method's wait flaw
As per the Selenium documentation, the webdriver get method "will wait until the page has fully loaded (that is, the 'onload' event has fired)", but "if your page uses a lot of AJAX on load then WebDriver may not know when it has completely loaded". Because of this, it's generally recommended to use either time.sleep() or the WebDriverWait class to give enough time for all of the asynchronous requests to complete.