Iterating through a group of elements and collecting the child elements

Question:

I have the following sample HTML:

<div class="person">
    <div class="title">
        <a href="http://www.url.com/name/">John Smith</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">SalesForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/name/">Phil Collins</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">TaskForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/name/">Tracy Beaker</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">Accounting</a>
    </div>
</div>

I am trying to iterate through the list to get the following results:

John Smith, SalesForce
Phil Collins, TaskForce
Tracy Beaker, Accounting

I am using the following code:

persons = []
for person in driver.find_elements_by_class_name('person'):
    title = person.find_element_by_xpath('.//div[@class="title"]/a').text
    company = person.find_element_by_xpath('.//div[@class="company"]/a').text

    persons.append({'title': title, 'company': company})

However, the above code only iterates through the first person and not through all the people. Any help is appreciated.

Asked By: Aslan

||

Answers:

One of the correct ways to do it in Selenium would be:

person_divs = WebDriverWait(browser, 20).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "person")))
for x in person_divs:
    name = x.find_element(By.CLASS_NAME, "title")
    department = x.find_element(By.CLASS_NAME, "company")
    print(name.text + ',', department.text)

Do not forget to import

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
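
If you also want the results collected into a list of dictionaries, as in the original code, a minimal sketch reusing the imports above (assuming browser is your WebDriver instance) would be:

persons = []
person_divs = WebDriverWait(browser, 20).until(
    EC.presence_of_all_elements_located((By.CLASS_NAME, "person")))
for x in person_divs:
    # store each name/company pair instead of printing it
    persons.append({
        'title': x.find_element(By.CLASS_NAME, "title").text,
        'company': x.find_element(By.CLASS_NAME, "company").text,
    })
print(persons)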

Another way using BeautifulSoup would be:

from bs4 import BeautifulSoup

html = '''
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">John Smith</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">SalesForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">Phil Collins</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">TaskForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">Tracy Beaker</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">Accounting</a>
    </div>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
for x in soup.select('div.person'):
    p_name = x.select_one('div.title').text.strip()
    p_company = x.select_one('div.company').text.strip()
    print(p_name + ',', p_company)

This would print out:

John Smith, SalesForce
Phil Collins, TaskForce
Tracy Beaker, Accounting
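
When the page is already open in Selenium, the same parsing can be run on the live HTML instead of a hard-coded string; a minimal sketch, assuming driver is the WebDriver instance from the question:

from bs4 import BeautifulSoup

# feed the rendered page source from Selenium into BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser')
for x in soup.select('div.person'):
    p_name = x.select_one('div.title').text.strip()
    p_company = x.select_one('div.company').text.strip()
    print(p_name + ',', p_company)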

BeautifulSoup (bs4) actually has great, easy-to-understand documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Answered By: platipus_on_fire

The bs4 example below shows that all the .person elements can be iterated over without any problem. For the Selenium element selection, however, you are using the find_element_by_xpath locator strategy, which is deprecated. I think it would be more robust to use WebDriverWait.

from bs4 import BeautifulSoup

html='''
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">John Smith</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">SalesForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">Phil Collins</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">TaskForce</a>
    </div>
</div>
<div class="person">
    <div class="title">
        <a href="http://www.url.com/johnsmith/">Tracy Beaker</a>
    </div>
    <div class="company">
        <a href="http://www.url.com/company/">Accounting</a>
    </div>
</div>
'''
soup = BeautifulSoup(html, 'lxml')

for person in soup.select('.person'):
    title = person.select_one('.title a').text
    print(title)

Output:

John Smith
Phil Collins
Tracy Beaker
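
The same loop can also pick up the company if you need the pairs from the expected output; a small extension of the snippet above (same soup object assumed):

for person in soup.select('.person'):
    title = person.select_one('.title a').text
    company = person.select_one('.company a').text
    # print the pair in the "Name, Company" format from the question
    print(title + ',', company)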

Example for selenium:

persons = []
for person in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//*[@class="person"]'))):
    title = person.find_element(By.XPATH,'.//div[@class="title"]/a').text
    company = person.find_element(By.XPATH,'.//div[@class="company"]/a').text

    persons.append({'title': title, 'company': company})
print(persons)


#imports

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Answered By: F.Hoque

As you are able to iterate through the first person's details, your logic is fine, but to cover all the persons you have to induce WebDriverWait for visibility_of_all_elements_located() and you can use the following locator strategy:

persons = []
for person in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "person"))):
    title = person.find_element_by_xpath('.//div[@class="title"]/a').text
    company = person.find_element_by_xpath('.//div[@class="company"]/a').text
    persons.append({'title': title, 'company': company})

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Answered By: undetected Selenium