Can't get texts out of a few dd tags that started after a certain dt tag
Question:
I'm trying to get the text out of the dd tags that sit between two dt tags. Specifically, I want the text of every dd tag that follows the dt tag containing Bransje, up to (but not including) the next dt tag.

In this example the next dt tag contains Stillingsfunksjon, but that won't always be the case: the next dt tag may contain anything.
from bs4 import BeautifulSoup
html = """
<section class="panel">
<dl class="definition-list definition-list--inline">
<dt>Sektor</dt>
<dd>Privat</dd>
<dt>Sted</dt>
<dd>Bratsbergveien 5, 7037 Trondheim</dd>
<dt>Bransje</dt>
<dd>Industri og produksjon,</dd>
<dd>Maritim og offshore,</dd>
<dd>Olje og gass</dd>
<dt>Stillingsfunksjon</dt>
<dd>Ingeniør</dd>
</dl>
</section>
"""
soup = BeautifulSoup(html,"lxml")
for i in soup.select("dt:-soup-contains('Bransje') ~ dd"):
    print(i.text)
Current output:
Industri og produksjon,
Maritim og offshore,
Olje og gass
Ingeniør
Expected output:
Industri og produksjon,
Maritim og offshore,
Olje og gass
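The extra Ingeniør line appears because the general sibling combinator `~` matches every later dd sibling, not only the ones that come before the next dt. A minimal sketch of a selector-only alternative, assuming soupsieve (the selector engine behind `select()`) accepts a complex selector inside `:not()`, and using the stdlib `html.parser` instead of lxml: keep the dd tags after Bransje, but exclude any dd that also follows a dt which itself comes after Bransje.

```python
from bs4 import BeautifulSoup

html = """
<dl>
<dt>Sektor</dt><dd>Privat</dd>
<dt>Bransje</dt>
<dd>Industri og produksjon,</dd>
<dd>Maritim og offshore,</dd>
<dd>Olje og gass</dd>
<dt>Stillingsfunksjon</dt><dd>Ingeniør</dd>
</dl>
"""

soup = BeautifulSoup(html, "html.parser")

# dd siblings after the Bransje dt, minus any dd that sits after a later dt
selector = (
    "dt:-soup-contains('Bransje') ~ dd"
    ":not(dt:-soup-contains('Bransje') ~ dt ~ dd)"
)
matches = soup.select(selector)
for dd in matches:
    print(dd.text)
```

The second half of the selector does not care what the next dt says, so it keeps working when Stillingsfunksjon is replaced by some other label.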
Answers:
One way to get there:
for i in soup.select("dt:-soup-contains('Bransje') ~ *"):
    if i.name == "dt":
        break
    else:
        print(i.text)
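This prints exactly your expected output: `~ *` returns every following sibling in document order, and the loop stops at the first dt after Bransje, whatever that dt contains.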
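The same idea can be wrapped in a reusable function that walks the siblings directly instead of going through a CSS selector. This is a minimal sketch; `dd_texts_after` is a hypothetical helper name, it assumes the dt label matches exactly (`find("dt", string=label)` compares the whole string), and it uses the stdlib `html.parser`.

```python
from bs4 import BeautifulSoup

html = """
<dl>
<dt>Sted</dt><dd>Bratsbergveien 5, 7037 Trondheim</dd>
<dt>Bransje</dt>
<dd>Industri og produksjon,</dd>
<dd>Maritim og offshore,</dd>
<dd>Olje og gass</dd>
<dt>Stillingsfunksjon</dt><dd>Ingeniør</dd>
</dl>
"""


def dd_texts_after(soup, label):
    """Hypothetical helper: texts of the dd tags between the dt whose
    text equals `label` and the next dt (or the end of the list)."""
    start = soup.find("dt", string=label)
    if start is None:
        return []
    texts = []
    # find_next_siblings() with no arguments yields only Tag siblings
    for sibling in start.find_next_siblings():
        if sibling.name == "dt":
            break  # reached the next definition term, stop collecting
        if sibling.name == "dd":
            texts.append(sibling.get_text(strip=True))
    return texts


soup = BeautifulSoup(html, "html.parser")
for text in dd_texts_after(soup, "Bransje"):
    print(text)
```

Because the function stops at any dt, it also works for the last group in the list, e.g. `dd_texts_after(soup, "Stillingsfunksjon")`.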