XPath: How to select all text between two tags?
Question:
Here is the HTML source code
<div class="text">
<a name="dst100030"></a>
<pre id="p73" class="P">
<span class="blk">│Лабораторные методы исследования │</span>
</pre>
<pre id="p74" class="P">
<span class="blk">├────────────┬───────────────────────────┬─────────────────┬──────────────┤</span></pre>
<a name="dst100031"></a>
I need to get all the text between the <a name="dst100030"> and <a name="dst100031"> tags. Here's what I tried:
response.xpath('//pre//text()[preceding-sibling::a[@name="dst100030"] and following-sibling::a[@name="dst100031"]]')
But it returns an empty list. Where am I going wrong?
Answers:
A solution to what you asked, using re:
Note: as others have mentioned in the comments, this may not be the best solution; you are better off using a proper parser.
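import re
# Fixed: the end marker is the opening tag <a name="dst100031">, not </a name="dst100031">.
source_code = '<div class="text"><a name="dst100030"></a><pre id="p73" class="P"><span class="blk">│Лабораторные методы исследования│</span></pre><pre id="p74" class="P"><span class="blk">├────────────┬───────────────────────────┬─────────────────┬──────────────┤</span></pre><a name="dst100031"></a>'
# Non-greedy capture of the markup between the two anchors; re.DOTALL lets "." match newlines.
between = re.search(r'<a name="dst100030"></a>(.*?)<a name="dst100031">', source_code, re.DOTALL).group(1)
# Keep only the text between tags, dropping the remaining markup.
text = re.findall(r'>([^<]+)<', between)
print(text)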
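As for the "proper parser" the note mentions, here is a minimal sketch using only the standard library's html.parser. The class name BetweenAnchors and its attributes are made up for illustration, and it assumes the two anchors appear in document order:

```python
from html.parser import HTMLParser


class BetweenAnchors(HTMLParser):
    """Collect text that appears between two <a name="..."> markers."""

    def __init__(self, start_name, end_name):
        super().__init__()
        self.start_name = start_name
        self.end_name = end_name
        self.collecting = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        name = dict(attrs).get("name")
        if name == self.start_name:
            self.collecting = True   # start marker seen
        elif name == self.end_name:
            self.collecting = False  # end marker seen

    def handle_data(self, data):
        # Keep non-blank text only while we are between the markers.
        if self.collecting and data.strip():
            self.chunks.append(data.strip())


source_code = (
    '<div class="text"><a name="dst100030"></a>'
    '<pre id="p73" class="P"><span class="blk">│Лабораторные методы исследования│</span></pre>'
    '<pre id="p74" class="P"><span class="blk">├────────────┬───────────────────────────┬─────────────────┬──────────────┤</span></pre>'
    '<a name="dst100031"></a>'
)

parser = BetweenAnchors("dst100030", "dst100031")
parser.feed(source_code)
parser.close()
print(parser.chunks)
```

Unlike the regex, this does not depend on the exact attribute order or spacing inside the tags.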
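<a> is a sibling of <pre>, not of the text() nodes, so preceding-sibling::a and following-sibling::a match nothing when evaluated from a text node. You can use preceding::a instead (and similarly following::a).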
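A minimal sketch of that fix, assuming lxml is installed (Scrapy's selectors are built on it, so the same expression should also work with response.xpath). The closing </div> and the variable names are added here for illustration:

```python
from lxml import html

SOURCE = '''<div class="text">
<a name="dst100030"></a>
<pre id="p73" class="P">
<span class="blk">│Лабораторные методы исследования │</span>
</pre>
<pre id="p74" class="P">
<span class="blk">├────────────┬───────────────────────────┬─────────────────┬──────────────┤</span></pre>
<a name="dst100031"></a>
</div>'''

tree = html.fromstring(SOURCE)

# preceding:: / following:: look at the whole document before/after the
# text node, not just its siblings, so the <a> markers are found.
expr = ('//pre//text()[preceding::a[@name="dst100030"] '
        'and following::a[@name="dst100031"]]')

# Drop whitespace-only text nodes (the newlines inside <pre>).
texts = [t.strip() for t in tree.xpath(expr) if t.strip()]
print(texts)
```

In Scrapy, the equivalent would be response.xpath(expr).getall(), followed by the same whitespace filtering.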