Iterating through a group of elements and collecting the child elements
Question:
I have the following sample HTML:
<div class="person">
<div class="title">
<a href="http://www.url.com/name/">John Smith</a>
</div>
<div class="company">
<a href="http://www.url.com/company/">SalesForce</a>
</div>
</div>
<div class="person">
<div class="title">
<a href="http://www.url.com/name/">Phil Collins</a>
</div>
<div class="company">
<a href="http://www.url.com/company/">TaskForce</a>
</div>
</div>
<div class="person">
<div class="title">
<a href="http://www.url.com/name/">Tracy Beaker</a>
</div>
<div class="company">
<a href="http://www.url.com/company/">Accounting</a>
</div>
</div>
I am trying to iterate through the list to get the following results:
John Smith, SalesForce
Phil Collins, TaskForce
Tracy Beaker, Accounting
I am using the following code:
persons = []
for person in driver.find_elements_by_class_name('person'):
    title = person.find_element_by_xpath('.//div[@class="title"]/a').text
    company = person.find_element_by_xpath('.//div[@class="company"]/a').text
    persons.append({'title': title, 'company': company})
However, the above code only iterates through the first person and not through all the people. Any help is appreciated.
Answers:
One of the correct ways to do it in Selenium would be:
person_divs = WebDriverWait(browser, 20).until(
    EC.presence_of_all_elements_located((By.CLASS_NAME, "person")))
for x in person_divs:
    name = x.find_element(By.CLASS_NAME, "title")
    department = x.find_element(By.CLASS_NAME, "company")
    print(name.text + ',', department.text)
Do not forget the imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
Another way using BeautifulSoup would be:
from bs4 import BeautifulSoup
html = '''
<div class="person">
<div class="title">
<a href="http://www.url.com/johnsmith/">John Smith</a>
</div>
<div class="company">
<a href="http://www.url.com/company/">SalesForce</a>
</div>
</div>
<div class="person">
<div class="title">
<a href="http://www.url.com/johnsmith/">Phil Collins</a>
</div>
<div class="company">
<a href="http://www.url.com/company/">TaskForce</a>
</div>
</div>
<div class="person">
<div class="title">
<a href="http://www.url.com/johnsmith/">Tracy Beaker</a>
</div>
<div class="company">
<a href="http://www.url.com/company/">Accounting</a>
</div>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
for x in soup.select('div.person'):
    p_name = x.select_one('div.title').text.strip()
    p_company = x.select_one('div.company').text.strip()
    print(p_name + ',', p_company)
This would print out:
John Smith, SalesForce
Phil Collins, TaskForce
Tracy Beaker, Accounting
BeautifulSoup (bs4) actually has great, easy-to-understand documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
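As a side note (not from the original answers): if BeautifulSoup is not available, the same extraction can be done with nothing but Python's built-in html.parser module. A minimal sketch, assuming the same .person / .title / .company structure as the sample HTML:

```python
from html.parser import HTMLParser

class PersonParser(HTMLParser):
    """Collect {'title': ..., 'company': ...} dicts from the sample markup."""

    def __init__(self):
        super().__init__()
        self.persons = []    # completed person dicts
        self.current = None  # dict for the <div class="person"> being parsed
        self.field = None    # 'title' or 'company' while inside that div

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get('class', '')
        if tag == 'div' and cls == 'person':
            self.current = {}
        elif tag == 'div' and cls in ('title', 'company'):
            self.field = cls

    def handle_data(self, data):
        # Record the link text for the field we are currently inside.
        if self.current is not None and self.field and data.strip():
            self.current[self.field] = data.strip()
            if len(self.current) == 2:  # both title and company collected
                self.persons.append(self.current)
                self.current = None
            self.field = None

html_doc = '''
<div class="person">
  <div class="title"><a href="http://www.url.com/name/">John Smith</a></div>
  <div class="company"><a href="http://www.url.com/company/">SalesForce</a></div>
</div>
'''

parser = PersonParser()
parser.feed(html_doc)
print(parser.persons)  # [{'title': 'John Smith', 'company': 'SalesForce'}]
```

This is only worth it when adding a dependency is a problem; for anything beyond this fixed structure, bs4 is the easier tool.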
The bs4 example below shows that all the .person elements iterate just fine. On the Selenium side, however, you are using the find_element(s)_by_* locator methods, which are deprecated. I think a more robust way is to use WebDriverWait.
from bs4 import BeautifulSoup
html='''
<div class="person">
<div class="title">
<a href="http://www.url.com/johnsmith/">John Smith</a>
</div>
<div class="company">
<a href="http://www.url.com/company/">SalesForce</a>
</div>
</div>
<div class="person">
<div class="title">
<a href="http://www.url.com/johnsmith/">Phil Collins</a>
</div>
<div class="company">
<a href="http://www.url.com/company/">TaskForce</a>
</div>
</div>
<div class="person">
<div class="title">
<a href="http://www.url.com/johnsmith/">Tracy Beaker</a>
</div>
<div class="company">
<a href="http://www.url.com/company/">Accounting</a>
</div>
</div>
'''
soup = BeautifulSoup(html, 'lxml')
for person in soup.select('.person'):
    title = person.select_one('.title a').text
    print(title)
Output:
John Smith
Phil Collins
Tracy Beaker
Example for Selenium:
persons = []
for person in WebDriverWait(driver, 20).until(
        EC.visibility_of_all_elements_located((By.XPATH, '//*[@class="person"]'))):
    title = person.find_element(By.XPATH, './/div[@class="title"]/a').text
    company = person.find_element(By.XPATH, './/div[@class="company"]/a').text
    persons.append({'title': title, 'company': company})
print(persons)
#imports
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
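A further option (not in the original answers): once Selenium has rendered the page, the whole DOM can be handed to BeautifulSoup in one call via driver.page_source, which avoids one WebDriver round-trip per element. A sketch, with a literal string standing in for driver.page_source:

```python
from bs4 import BeautifulSoup

# In real code this would be: page_source = driver.page_source
page_source = '''
<div class="person">
  <div class="title"><a href="http://www.url.com/name/">John Smith</a></div>
  <div class="company"><a href="http://www.url.com/company/">SalesForce</a></div>
</div>
<div class="person">
  <div class="title"><a href="http://www.url.com/name/">Phil Collins</a></div>
  <div class="company"><a href="http://www.url.com/company/">TaskForce</a></div>
</div>
'''

soup = BeautifulSoup(page_source, 'html.parser')
persons = [
    {'title': p.select_one('.title a').text,
     'company': p.select_one('.company a').text}
    for p in soup.select('.person')
]
print(persons)
# [{'title': 'John Smith', 'company': 'SalesForce'},
#  {'title': 'Phil Collins', 'company': 'TaskForce'}]
```

This trades live element handles for a static snapshot, so it only suits pure scraping, not clicking or typing into the elements afterwards.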
Since you are able to extract the first person's details, your logic is sound. To collect all the persons, however, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use the following locator strategy:
persons = []
for person in WebDriverWait(driver, 20).until(
        EC.visibility_of_all_elements_located((By.CLASS_NAME, "person"))):
    title = person.find_element(By.XPATH, './/div[@class="title"]/a').text
    company = person.find_element(By.XPATH, './/div[@class="company"]/a').text
    persons.append({'title': title, 'company': company})
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC