Beautiful Soup: get just the value inside the tag
Question:
The following command:
volume = soup.findAll("span", {"id": "volume"})[0]
gives:
<span class="gr_text1" id="volume">16,103.3</span>
when I issue a print(volume).
How do I get just the number?
Answers:
Extract the string from the element:
volume = soup.findAll("span", {"id": "volume"})[0].string
Using a CSS selector:
>>> soup.select('span#volume')[0].text
u'16,103.3'
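To put the two answers above side by side, here is a minimal sketch using the span from the question. The html string and the variable names are made up for illustration, and the final conversion assumes the comma is a thousands separator, which the question does not confirm.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; in practice this comes from the page.
html = '<span class="gr_text1" id="volume">16,103.3</span>'
soup = BeautifulSoup(html, "html.parser")

span = soup.find("span", id="volume")

# .string returns the single text child of the tag (a NavigableString).
text_via_string = span.string

# get_text() returns all text inside the tag as a plain string.
text_via_get_text = span.get_text()

# The same lookup through a CSS selector.
text_via_select = soup.select_one("span#volume").get_text()

# Assumption: "," is a thousands separator, so it is dropped before parsing.
volume_number = float(text_via_get_text.replace(",", ""))

print(text_via_get_text)
print(volume_number)
```

All three lookups give the same text here because the span holds exactly one piece of text and nothing else.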
Just to add: I also found that .string doesn't work well when there is a <br> in the text.
For example:
<div class = "Lines">
<span> First Line <br> Second Line <br> Third Line </span>
</div>
If we do soup.find("div", attrs={"class": "Lines"}).span.string
we get None.
But with soup.find("div", attrs={"class": "Lines"}).span.text
we get:
First Line
Second Line
Third Line
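.string returns None here because the span has more than one child (the pieces of text plus the <br> tags), so there is no single string to return.
I think .string gives a NavigableString object and .text gives a unicode object (a plain str in Python 3).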
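A small sketch of the <br> case, in case it helps to see it run. The markup is the example from this answer; the variable names are illustrative. It also uses stripped_strings and get_text(separator=..., strip=True), which are one way (not the only one) to get the lines back separately.

```python
from bs4 import BeautifulSoup

html = """
<div class="Lines">
<span> First Line <br> Second Line <br> Third Line </span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
span = soup.find("div", attrs={"class": "Lines"}).span

# The span has several children (text, <br>, text, ...), so .string is None.
string_value = span.string

# stripped_strings yields each piece of text with surrounding whitespace removed.
lines = list(span.stripped_strings)

# get_text can join the pieces with a separator of your choice.
joined = span.get_text(separator="\n", strip=True)

print(string_value)
print(lines)
print(joined)
```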
Each tag also has a contents attribute, a list of its child nodes; the first child is tag.contents[0].
Try this:
volumes = soup('span')
for volume in volumes:
    print(volume.contents[0])
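One thing to watch with contents[0]: the first child is not always text, and an empty tag has no children at all. The sketch below uses made-up markup (the second and third spans are not from the question) to show both cases, with a guard against the empty list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the first span comes from the question.
html = (
    '<span id="volume">16,103.3</span>'
    '<span><b>bold</b> tail</span>'
    '<span></span>'
)
soup = BeautifulSoup(html, "html.parser")

first_children = []
for span in soup("span"):
    # contents is a list of child nodes; indexing an empty list would raise IndexError.
    first = span.contents[0] if span.contents else None
    first_children.append(first)

# get_text() always returns a string, whatever the children are.
texts = [span.get_text() for span in soup("span")]

print(first_children)
print(texts)
```

For the span in the question contents[0] is the text itself, but for the second span it is the <b> tag, so get_text() may be the safer choice when the markup varies.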