Web scraping a p tag without a class using Bs4 and Selenium

Question:

I’m trying to scrape a page with the following structure.

The HTML has a div tag with a class. Inside this div there is another div, and inside that there is a p tag with no class. My goal is to get that lone p tag without a class and extract its text.

So far this is my code ->

I did not include some imports and other parts of my code.

html = driver.page_source
time.sleep(.1)
soup = bs.BeautifulSoup(html, 'lxml')
time.sleep(.1)


Class_Details = soup.find_all("div", {"class":"row-fluid data_row primary-row class-info class-not-checked"})

for class_detail in Class_Details:
    Class_status = class_detail.find_all("div", {"class":"statusColumn"})
    Status = Class_status[0].text

    class_date = class_detail.find_all("p",{"class":"hide-above-small beforeCollapseShow"})
    class_time = class_date[0].text

The four lines above can be ignored; they work and accomplish their tasks. The lines below, however, do not, and they are what I am asking about.

    cla = class_detail.find_all("p",{"class":"timeColumn"})
    print(cla)

The output of print(cla) is:
[]
[]
[]
[]
[]
[]
[]

The good thing is that there are 7 empty lists, which matches the number of entries on the website, so the loop is definitely finding the part I am scraping. However, I need the output to be text.

I hope I have been clear in my question and thank you for your time.

Asked By: Ivan Pupo

||

Answers:

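print(cla) prints the result list itself, not the text of the elements in it, and here each list is empty because find_all("p", {"class": "timeColumn"}) matches nothing: timeColumn appears to be the class of a div, not of a p. Find the div first, then print the text of the p tags inside it:

for column in class_detail.find_all("div", {"class":"timeColumn"}):
    for item in column.find_all("p"):
        print(item.text)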

I know you are using BeautifulSoup, but I will also provide a solution using Selenium / XPath in case you do not find a BS implementation to your liking:

# On Selenium 4+, use driver.find_elements(By.XPATH, ...) instead.
elements_list = driver.find_elements_by_xpath("//div[@class='timeColumn']/p")

for element in elements_list:
    print(element.text)
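
To see why the original find_all call printed empty lists, here is a minimal sketch against hypothetical markup. The sample HTML is an assumption modeled on the class names mentioned in this thread, not the real page. It suggests that searching for a p with class timeColumn matches nothing, while searching for the div and then the p tags inside it returns the text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an assumption for illustration, not the real page.
SAMPLE_HTML = """
<div class="class-info">
  <div class="statusColumn">Open</div>
  <div class="timeColumn">
    <div>
      <p class="hide-above-small beforeCollapseShow">F</p>
      <p>7:45am-10:50am</p>
    </div>
  </div>
</div>
"""

# html.parser is used here only to avoid the extra lxml dependency.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
class_detail = soup.find("div", {"class": "class-info"})

# No <p> carries the class "timeColumn", so the result is an empty list.
no_match = list(class_detail.find_all("p", {"class": "timeColumn"}))
print(no_match)

# Locate the <div class="timeColumn"> first, then collect the text of its <p> tags.
time_texts = [
    p.get_text(strip=True)
    for column in class_detail.find_all("div", {"class": "timeColumn"})
    for p in column.find_all("p")
]
print(time_texts)
```

Note that this collects every p inside the column, with or without a class, so a further filter is needed if only the class-less one is wanted.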

Answered By: CEH

To get a p tag without a class, use a CSS selector for p combined with the negation pseudo-class :not().

Here, the CSS selector could be .timeColumn p:not([class]):

# select_one to get first one
p_no_class = class_detail.select_one(".timeColumn p:not([class])").text
print(p_no_class)

# select to get all
all_p_no_class = class_detail.select(".timeColumn p:not([class])")
for p in all_p_no_class:
    print(p.text)

See also CSS selector for not having classes.
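
As a self-contained check of the selector, here is a minimal sketch run against hypothetical markup. The sample HTML is an assumption for illustration, not the real page. The second row deliberately has no class-less p, to show that select_one returns None in that case, so calling .text on the result directly would raise an error.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an assumption for illustration, not the real page.
SAMPLE_HTML = """
<div class="class-info">
  <div class="timeColumn">
    <p class="hide-above-small beforeCollapseShow">F</p>
    <p>7:45am-10:50am</p>
  </div>
</div>
<div class="class-info">
  <div class="timeColumn">
    <p class="hide-above-small beforeCollapseShow">TBA</p>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

times = []
for class_detail in soup.select("div.class-info"):
    # p:not([class]) matches only <p> tags that have no class attribute at all.
    p_no_class = class_detail.select_one(".timeColumn p:not([class])")
    # select_one returns None when nothing matches, so guard before reading text.
    times.append(p_no_class.get_text(strip=True) if p_no_class is not None else None)

print(times)
```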

Answered By: Sers

The desired element is rendered by JavaScript, so to extract the text 7:45am-10:50am from it you have to induce WebDriverWait for visibility_of_element_located(). You can use the following Locator Strategy:

  • Using XPATH:

    print(WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='timeColumn']/div[contains(@id, 'days_data')]/p/a[@class='popover-bottom' and text()='F']//following::p[1]"))).text)
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
Answered By: undetected Selenium