Get multiple elements by tag with Python and Selenium
Question:
My code opens a website and scrapes rows of information (title and time).
However, there is one tag (‘p’) that I am not sure how to get with the find_element_by_* methods.
On the website, it is the information under each title.
Here is my code so far:
import time
from selenium import webdriver
from bs4 import BeautifulSoup
import requests

driver = webdriver.Chrome()
driver.get('https://www.nutritioncare.org/ASPEN21Schedule/#tab03_19')
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
eachRow = driver.find_elements_by_class_name('timeline__item')
time.sleep(1)
for item in eachRow:
    time.sleep(1)
    title = item.find_element_by_class_name('timeline__item-title')
    tim = item.find_element_by_class_name('timeline__item-time')
    tex = item.find_element_by_tag_name('p')  # this is the part I don't know how to scrape
    print(title.text, tim.text, tex.text)
Answers:
Maybe try a different find_elements_by_class… method. I don’t use Python much, but try this if you haven’t already.
Since the webpage has several p tags, it would be better to get tex with .find_elements_by_tag_name('p'), which returns a list of every match instead of only the first one. Replace the print call in the code with the following:
    print(title.text, tim.text)
    for t in tex:
        if t.text == '':
            continue
        print(t.text)
I checked the page and there are several p tags. I suggest using find_elements_by_tag_name instead of find_element_by_tag_name, so that you get all the p tags, including the one you want. Then iterate over the p elements, join their text content, and strip the result.
from selenium import webdriver
from bs4 import BeautifulSoup
import time
import requests

driver = webdriver.Chrome()
driver.get('https://www.nutritioncare.org/ASPEN21Schedule/#tab03_19')
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
eachRow = driver.find_elements_by_class_name('timeline__item')
time.sleep(1)
for item in eachRow:
    time.sleep(1)
    title = item.find_element_by_class_name('timeline__item-title')
    tim = item.find_element_by_class_name('timeline__item-time')
    tex = item.find_elements_by_tag_name('p')
    text = " ".join([i.text for i in tex]).strip()
    print(title.text, tim.text, text)
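Note that the find_element_by_* and find_elements_by_* helpers were removed in Selenium 4.3, so on a current install the same idea has to be written with By locators. Below is a minimal sketch of that, assuming the page still uses the class names from the question. The names join_texts and scrape_rows are hypothetical helpers made up for this example, not part of Selenium.

```python
def join_texts(texts):
    """Join non-empty strings with single spaces, dropping blank entries."""
    cleaned = (t.strip() for t in texts)
    return " ".join(t for t in cleaned if t)


def scrape_rows(driver):
    """Return (title, time, description) tuples, one per timeline row.

    Assumes `driver` has already loaded the schedule page and that the
    class names below (taken from the question) are still valid.
    """
    # Imported lazily so join_texts can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    rows = []
    for item in driver.find_elements(By.CLASS_NAME, "timeline__item"):
        title = item.find_element(By.CLASS_NAME, "timeline__item-title").text
        when = item.find_element(By.CLASS_NAME, "timeline__item-time").text
        # find_elements (plural) returns a list of every <p> inside this row.
        paragraphs = item.find_elements(By.TAG_NAME, "p")
        rows.append((title, when, join_texts(p.text for p in paragraphs)))
    return rows
```

With a driver created by webdriver.Chrome() and pointed at the schedule URL, something like `for row in scrape_rows(driver): print(*row)` should print one line per row. Dropping the blank paragraphs before joining avoids the stray double spaces that a plain " ".join(...) would leave behind.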