XPath: how to select all text between two tags?

Question:

Here is the HTML source code:

<div class="text">
 <a name="dst100030"></a>
 <pre id="p73" class="P">
 <span class="blk">│Лабораторные методы исследования                                         │</span>
 </pre>
 <pre id="p74" class="P">
 <span class="blk">├────────────┬───────────────────────────┬─────────────────┬──────────────┤</span></pre>
 <a name="dst100031"></a>

I need to get all the text between the <a name="dst100030"> and <a name="dst100031"> tags. Here is what I tried:

response.xpath('//pre//text()[preceding-sibling::a[@name="dst100030"] and following-sibling::a[@name="dst100031"]]')

But it returns an empty list. What am I doing wrong?

Asked By: Billy Jhon


Answers:

Here is a solution to what you asked, using re:

Note: as others have mentioned in the comments, this may not be the best solution; you are better off using a proper HTML parser.

import re

source_code = '<div class="text"><a name="dst100030"></a><pre id="p73" class="P"><span class="blk">│Лабораторные методы исследования│</span></pre><pre id="p74" class="P"><span class="blk">├────────────┬───────────────────────────┬─────────────────┬──────────────┤</span></pre><a name="dst100031"></a>'
# Capture everything between the two anchors: ".*?" is non-greedy, and re.DOTALL
# lets "." match newlines too, since real HTML usually spans several lines
text = re.findall(r'<a name="dst100030"></a>(.*?)<a name="dst100031"></a>', source_code, re.DOTALL)
print(text)
Answered By: Chris
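As the note in the answer above suggests, a real HTML parser is more robust than a regex. Below is a minimal sketch using only the standard library's html.parser module. The class TextBetweenAnchors and the function text_between are hypothetical helpers written for illustration, and the sketch assumes the two markers are <a name="..."> elements exactly as in the question.

```python
from html.parser import HTMLParser


class TextBetweenAnchors(HTMLParser):
    """Collect non-blank text found after <a name=start> and before <a name=end>."""

    def __init__(self, start, end):
        super().__init__()
        self.start = start
        self.end = end
        self.inside = False  # True while we are between the two anchors
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        name = dict(attrs).get("name")
        if name == self.start:
            self.inside = True
        elif name == self.end:
            self.inside = False

    def handle_data(self, data):
        # Skip whitespace-only text such as the indentation between tags
        if self.inside and data.strip():
            self.chunks.append(data.strip())


def text_between(source, start="dst100030", end="dst100031"):
    parser = TextBetweenAnchors(start, end)
    parser.feed(source)
    parser.close()
    return parser.chunks


SOURCE = """<div class="text">
 <a name="dst100030"></a>
 <pre id="p73" class="P">
 <span class="blk">│Лабораторные методы исследования│</span>
 </pre>
 <pre id="p74" class="P">
 <span class="blk">├────────────┬───────────────────────────┬─────────────────┬──────────────┤</span></pre>
 <a name="dst100031"></a>
</div>"""

print(text_between(SOURCE))
```

Unlike the regex, this does not care about attribute order, extra whitespace, or line breaks inside the markup, and it returns only the text rather than the raw HTML between the anchors.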

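<a> is a sibling of <pre>, not of the text() nodes inside it, so preceding-sibling::a never matches from a text node. Use the preceding::a axis instead, and likewise following::a in place of following-sibling::a.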

Answered By: choroba
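The axis fix from the answer above can be checked outside Scrapy with lxml, which evaluates XPath 1.0 expressions the same way. This is a minimal sketch, assuming lxml is installed (pip install lxml); text_between_anchors is a hypothetical helper, and it drops whitespace-only text nodes, which the question did not explicitly ask for.

```python
from lxml import html  # third-party: pip install lxml

SOURCE = """<div class="text">
 <a name="dst100030"></a>
 <pre id="p73" class="P">
 <span class="blk">│Лабораторные методы исследования│</span>
 </pre>
 <pre id="p74" class="P">
 <span class="blk">├────────────┬───────────────────────────┬─────────────────┬──────────────┤</span></pre>
 <a name="dst100031"></a>
</div>"""

# The expression from the question: the anchors are not siblings of the
# text nodes, so this matches nothing
QUESTION_XPATH = (
    '//pre//text()'
    '[preceding-sibling::a[@name="dst100030"] and following-sibling::a[@name="dst100031"]]'
)

# preceding:: and following:: follow document order across the whole tree,
# so the anchors outside <pre> are found
FIXED_XPATH = (
    '//pre//text()'
    '[preceding::a[@name="dst100030"] and following::a[@name="dst100031"]]'
)


def text_between_anchors(source):
    tree = html.fromstring(source)
    # Keep only text nodes that contain something other than whitespace
    return [t.strip() for t in tree.xpath(FIXED_XPATH) if t.strip()]


print(html.fromstring(SOURCE).xpath(QUESTION_XPATH))  # empty list
print(text_between_anchors(SOURCE))
```

In a Scrapy spider the same expression can be passed directly, for example response.xpath(FIXED_XPATH).getall(). Note that preceding:: and following:: scan the whole document, so on a large page with many anchors it is worth keeping the //pre prefix (or a tighter one) to limit which text nodes are tested.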