Get text from an element containing <br> tags, using Python Selenium

Question

I’m pulling contact information (text) from a website and I can currently pull all the class data, using the following XPath syntax:

//*[@id="nomapdata"]/div/div/div/div[2]/div[1]

Using this XPath expression for the element, I get the following text as the result:

Name
Title
Company Website
Phone Number

I want to pull each of these pieces individually, but the problem is that the data is separated by <br> tags, and I haven’t had any success isolating each piece.

Below is an example of the HTML structure:

<div class="col-sm-d">
                  Name
<br>
                              Title
<br>
<a href="www.website.com" target="_blank">http://www.website.com</a>
<br>

Phone: (555) 555-5555
<br>

The only element I am able to isolate is the website.

How can I isolate each piece of data in this scenario?

Asked By: ls101


Answer 1

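Selenium’s .text returns the rendered text of the element, in which each <br> becomes a line break, so you can get the list of lines like this:

from selenium.webdriver.common.by import By

driver.find_element(By.XPATH, '//*[@id="nomapdata"]/div/div/div/div[2]/div[1]').text.split("\n")

If there are more lines after the phone number that you don’t need, keep only the first four:

driver.find_element(By.XPATH, '//*[@id="nomapdata"]/div/div/div/div[2]/div[1]').text.split("\n")[:4]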
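A minimal sketch of the same idea that can be tried without a browser, assuming the string passed in is what .text returned for the div (one line per <br>). The helper name contact_lines is hypothetical; unlike a bare split("\n"), it also trims each line and drops blank ones before taking the first four.

```python
from typing import List


def contact_lines(text: str, count: int = 4) -> List[str]:
    """Split an element's rendered .text into trimmed, non-empty lines.

    Assumes `text` is the value of WebElement.text, where each <br>
    has already been rendered as a line break.
    """
    lines = [line.strip() for line in text.split("\n")]
    # Drop blank lines, then keep only the first `count` entries.
    return [line for line in lines if line][:count]


sample = "Name\nTitle\nhttp://www.website.com\nPhone: (555) 555-5555\nExtra line"
print(contact_lines(sample))
# ['Name', 'Title', 'http://www.website.com', 'Phone: (555) 555-5555']
```

With a live driver you would pass in the element’s .text instead of sample.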

Answered By: Andersson

Answer 2

You can use the same locator but get the innerHTML instead of .text. This will get you all the HTML between the opening and closing <div> tags. Then you can split the resulting string on <br> and you will have all the desired pieces. From your sample HTML, it looks like you will probably want to strip() each piece to remove the surrounding whitespace, and you will have to process/parse the link portion however you need.

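from selenium.webdriver.common.by import By

s = driver.find_element(By.XPATH, "//*[@id='nomapdata']/div/div/div/div[2]/div[1]").get_attribute("innerHTML")
# strip() each piece and skip the empty piece left after the trailing <br>
data = [item.strip() for item in s.split("<br>") if item.strip()]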

data will now be a list of strings, e.g.

['Name', 'Title', '<a href="www.website.com" target="_blank">http://www.website.com</a>', 'Phone: (555) 555-5555']

You can then process each piece further as needed.
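The link portion can be handled with the standard library. The sketch below assumes inner_html is the string returned by get_attribute("innerHTML"); the names split_inner_html, parse_link and LinkExtractor are hypothetical. It splits on <br> with a case-insensitive regex (so <br/> and <br /> are also accepted, but not a <br> carrying attributes), drops empty pieces, and uses html.parser.HTMLParser to read the href and the visible link text.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkExtractor(HTMLParser):
    """Record the href and visible text of the first <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None
        self.text = ""
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        # Only the first <a> is captured; later links are ignored.
        if tag == "a" and self.href is None:
            self.href = dict(attrs).get("href")
            self._in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        if self._in_link:
            self.text += data


def split_inner_html(inner_html: str) -> List[str]:
    # Split on <br>, <br/> or <br /> (any case), trim each piece, drop empty ones.
    pieces = re.split(r"<br\s*/?>", inner_html, flags=re.IGNORECASE)
    return [piece.strip() for piece in pieces if piece.strip()]


def parse_link(piece: str) -> Tuple[Optional[str], str]:
    # Return (href, link text) for a fragment such as '<a href="...">...</a>'.
    parser = LinkExtractor()
    parser.feed(piece)
    parser.close()
    return parser.href, parser.text.strip()


inner = ('\n    Name\n<br>\n    Title\n<br>\n'
         '<a href="www.website.com" target="_blank">http://www.website.com</a>\n<br>\n'
         'Phone: (555) 555-5555\n<br>\n')
data = split_inner_html(inner)
print(data)
# ['Name', 'Title', '<a href="www.website.com" target="_blank">http://www.website.com</a>', 'Phone: (555) 555-5555']
print(parse_link(data[2]))
# ('www.website.com', 'http://www.website.com')
```

Note that text taken from innerHTML is still HTML-escaped (for example &amp;), so pieces containing entities may also need html.unescape.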

Answered By: JeffC

Answer 3

First, get the elements (this example uses C# with Selenium WebDriver):

var elements = _webDriver.FindElements(By.XPath(@"//*[@id='nomapdata']/div/div/div/div[2]/div[1]"));

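Second, split each element’s text into lines and map the lines to your own class:

foreach (var element in elements)
{
    // element.Text is the rendered text; each <br> becomes a line break.
    var temp = element.Text.Split('\n');

    // YourClass and yourList are placeholders for your own type and list.
    YourClass yourClass = new YourClass
    {
        Name = temp[0],
        Title = temp[1],
        CompanyWebsite = temp[2],
        PhoneNumber = temp[3],
    };

    yourList.Add(yourClass);
}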
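Since the question is about Python, here is a rough Python counterpart of the C# loop above. It assumes you have already collected each element’s .text into a list (for example with [e.text for e in driver.find_elements(By.XPATH, xpath)]); the Contact dataclass and the to_contacts function are hypothetical stand-ins for YourClass and the loop. Unlike the C# version, it skips blocks with fewer than four non-empty lines instead of failing with an index error.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Contact:
    # Hypothetical stand-in for the answer's YourClass.
    name: str
    title: str
    company_website: str
    phone_number: str


def to_contacts(texts: List[str]) -> List[Contact]:
    """Map each element's .text (one field per line) to a Contact."""
    contacts = []
    for text in texts:
        lines = [line.strip() for line in text.split("\n") if line.strip()]
        if len(lines) < 4:
            # Not all four fields are present; skip rather than raise IndexError.
            continue
        contacts.append(Contact(*lines[:4]))
    return contacts


contacts = to_contacts(["Name\nTitle\nhttp://www.website.com\nPhone: (555) 555-5555"])
print(contacts[0].phone_number)  # Phone: (555) 555-5555
```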

Answered By: Caner G.
