How to locate next element after text found in b tag – Selenium Python

Question:

I’m trying to extract the text value that follows a b tag containing specific text. I’m using Selenium WebDriver with Python 3.

The HTML inspected for the value I’m trying to return (11,847) is here:

[Image: HTML of desired XPath]

Its XPath is below (I’m not using this XPath directly to find the element, because the table structure changes across the examples I plan to iterate through):

/html/body/form[1]/div[2]/table[2]/tbody/tr[3]/td[2]/text()

As an example, when I print the result of the code below, it returns Att:, i.e. the text of the element located by my search for ‘Att’ within the b tags.

att=driver.find_element("xpath",".//b[contains(text(), 'Att')]").text

print(att)

Is there a way I can return the value following <b>Att:</b> by searching for ‘Att:’? (Conversely, I’d also like to return the value following <b>Ref:</b>.)

Thanks in advance.

Asked By: pippo_T


Answers:

You can use find_element() to locate the b element that contains the text ‘Att:’ and then read the text node that follows it with JavaScript. (The find_element_by_xpath() helpers were deprecated and later removed in Selenium 4, and an XPath ending in text() cannot be passed to find_element().) Here is an example of how you can do this:

att_element = driver.find_element("xpath", "//b[contains(text(), 'Att:')]")
att_value = driver.execute_script("return arguments[0].nextSibling.textContent;", att_element).strip()
print(att_value)

This locates the b element that contains the text ‘Att:’, then reads the text content of the node that immediately follows it.

Similarly, you can use the same approach for ‘Ref:’; just change the text to ‘Ref:’:

ref_element = driver.find_element("xpath", "//b[contains(text(), 'Ref:')]")
ref_value = driver.execute_script("return arguments[0].nextSibling.textContent;", ref_element).strip()
print(ref_value)

Note that this only works if the value you’re trying to extract is in a text node immediately following the b element that contains ‘Att:’ or ‘Ref:’.
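
Because find_element() hands back elements rather than text nodes, one way to reach the node after the b tag is to read it on the JavaScript side. The sketch below wraps that idea in a helper; the name text_after_label is hypothetical, driver is assumed to be a Selenium WebDriver, and the label is assumed to contain no single quote:

```python
def text_after_label(driver, label):
    """Return the trimmed text of the node right after the <b> containing `label`.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver.
    """
    # Assumes `label` has no single quote, since it is embedded in the XPath literal.
    b_element = driver.find_element("xpath", f"//b[contains(text(), '{label}')]")
    # nextSibling is a DOM node (expected here to be a text node), which
    # find_element() cannot return, so it is read via JavaScript instead.
    script = (
        "var node = arguments[0].nextSibling;"
        "return node ? node.textContent : null;"
    )
    text = driver.execute_script(script, b_element)
    # None means the <b> element had no following sibling node.
    return text.strip() if text is not None else None
```

With a live session this would be called as text_after_label(driver, "Att:") or text_after_label(driver, "Ref:").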

The following xpath would result in an error:

/html/body/form[1]/div[2]/table[2]/tbody/tr[3]/td[2]/text()

because it selects a text node, and Selenium’s find_element() returns only WebElements, not text nodes.
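
To illustrate why the value is not an element, here is a small offline sketch using only the Python standard library. The markup in CELL is a hypothetical reconstruction of the cell, not the real page; the point is that the td has a single element child (the b tag) and the number lives in the text that trails it:

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of the cell; the real page markup may differ.
CELL = "<td align='right'>\n  <b>Att:</b> 11,847\n</td>"

td = ET.fromstring(CELL)

# Iterating an element yields only its element children, never text nodes.
element_children = [child.tag for child in td]

# ElementTree exposes the text that follows </b> as the `tail` of <b>.
b = td.find("b")
value = (b.tail or "").strip()

print(element_children)
print(value)
```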


Solution

The text 11,847 is within a text node that is a child of the <td> node (childNodes[2] in the code below). So to print the text, wait for the <td> to become visible using WebDriverWait with visibility_of_element_located(), then use either of the following locator strategies:

  • Using XPATH and childNodes[n]:

    print(driver.execute_script('return arguments[0].childNodes[2].textContent;', WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//tr[@class='initial']//td[@align='right']")))).strip())
    
  • Using XPATH and splitlines():

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//tr[@class='initial']//td[@align='right']"))).get_attribute("innerHTML").splitlines()[2])
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
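
The splitlines() variant depends on the value sitting on a particular line of the cell’s innerHTML. As a rough offline sketch, assuming a hypothetical innerHTML string (the real layout may differ), the index-based read can be compared with a layout-independent alternative that strips the label first:

```python
import re

# Hypothetical innerHTML of the cell; the real layout may differ.
inner_html = "\n    <b>Att:</b>\n    11,847\n"

# Index-based: works only while the value stays on the third line.
by_line = inner_html.splitlines()[2].strip()

# Alternative: drop the <b>...</b> label wherever it is, then trim.
by_regex = re.sub(r"<b>.*?</b>", "", inner_html, flags=re.DOTALL).strip()

print(by_line)
print(by_regex)
```

If the page ever puts the label and the value on the same line, only the second form would still return the number.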
    
Answered By: undetected Selenium

The 11,847 text content belongs to the td node.
You can locate this td element by the text content of its child b element.
Then you can retrieve the entire text content of that td node.
It will contain Att:, extra spaces, and the desired 11,847 string.
Now you need to remove the Att: and the extra spaces so that only 11,847 remains, as follows:

from selenium.webdriver.common.by import By

# get the entire text content of the td that contains the label
entire_text = driver.find_element(By.XPATH, "//td[.//b[contains(text(), 'Att')]]").text
# get the text content of the child b element (the label)
child_text = driver.find_element(By.XPATH, "//b[contains(text(), 'Att')]").text
# remove the label text from the entire text content
goal_text = entire_text.replace(child_text, '')
# trim white spaces
goal_text = goal_text.strip()
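
The string-cleanup step can be tried without a browser. The sketch below is a hypothetical helper (value_after_label is not part of Selenium) that takes the two .text values as plain strings. It uses str.partition() instead of replace(), so only the first occurrence of the label is cut away:

```python
def value_after_label(entire_text, label_text):
    """Return whatever follows the first occurrence of label_text, trimmed."""
    before, found, after = entire_text.partition(label_text)
    # If the label is absent, fall back to the trimmed full text.
    return after.strip() if found else entire_text.strip()


# Sample strings standing in for the td text and the b text.
print(value_after_label("Att: 11,847", "Att:"))
```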
Answered By: Prophet