How do I get the href value instead of the text from a ResultSet?
Question:
I’m using
print(d.contents)
My print loop prints the following ResultSets:
[<a href="Property.aspx?pi=cd1a0b90-07aa-ec11-aa4c-246e960cbc4d" title="3 Bedroom House For Sale In Anthoupoli, Nicosia">3 Bedroom House For Sale In Anthoupoli, Nicosia</a>]
[<a href="Property.aspx?pi=c42dc379-5f67-e811-bb4e-a4badb3ceace" title="2 Storey Modern 3 Bedroom House For Sale In Dali Area">2 Storey Modern 3 Bedroom House For Sale In Dali Area</a>]
[<a href="Property.aspx?pi=6370763d-61ab-e811-b319-a4badb3ceacd" title="Very Nice And Spacious 3 Bedroom Detached House For Sale In Lakatamia">Very Nice And Spacious 3 Bedroom Detached House For Sale In Lakatamia</a>]
[<a href="Property.aspx?pi=7da50193-266b-e811-bb4e-a4badb3ceace" title="3 Bedroom Under Construction Detached 4 Houses For Sale In Tseri Area">3 Bedroom Under Construction Detached 4 Houses For Sale In Tseri Area</a>]
[<a href="Property.aspx?pi=96d0fb0a-89bd-ec11-aa4e-246e960cbc4d" title="3 Bedroom House For Sale In Agios Dometios, Nicosia">3 Bedroom House For Sale In Agios Dometios, Nicosia</a>]
[<a href="Property.aspx?pi=1881b78e-52d7-ec11-aa4e-246e960cbc4d" title="4 Bedroom House For Sale In Archangelos, Nicosia">4 Bedroom House For Sale In Archangelos, Nicosia</a>]
[<a href="Property.aspx?pi=eaa2b630-e685-ec11-aa4c-246e960cbc4d" title="In Excellent Location 3 Bedrooms House In Archangelos Nicosia">In Excellent Location 3 Bedrooms House In Archangelos Nicosia</a>]
[<a href="Property.aspx?pi=2ad40da2-a190-ec11-aa4c-246e960cbc4d" title="Incomplete residential development in Politiko, Nicosia">Incomplete residential development in Politiko, Nicosia</a>]
[<a href="Property.aspx?pi=a19ad42e-cd59-e911-8b16-a4badb3ceacd" title="3 Bedroom House For Sale In Spilia With Great View">3 Bedroom House For Sale In Spilia With Great View</a>]
How can I print only the values of the href attribute?
I noticed that using
print(d.text)
gives me only the titles, but I want the URLs instead.
Answers:
Instead of d.contents, which always returns a list of the element's children, select the one and only <a> in your element directly and extract its href:
d.a.get('href')
or
d.find('a').get('href')
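To see why each accessor gives a different result, here is a small sketch. It assumes `d` is an `<h3>` wrapping a single link, as in the example markup further down; the question does not show how `d` was obtained, so that part is an assumption. `.contents` is a list of children, `.text` is only the link text, and `.get('href')` is the attribute value. Since `d.a` (like `d.find('a')`) is `None` when there is no `<a>`, the hypothetical helper `href_or_none` guards against an `AttributeError`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question's output: one <h3> wrapping a single link
html = '<h3><a href="Property.aspx?pi=abc" title="3 Bedroom House">3 Bedroom House</a></h3>'
d = BeautifulSoup(html, 'html.parser').h3

print(d.contents)        # a list of children: [<a href="Property.aspx?pi=abc" ...>...</a>]
print(d.text)            # only the link text: 3 Bedroom House
print(d.a.get('href'))   # the attribute value: Property.aspx?pi=abc


def href_or_none(element):
    """Return the href of the first <a> inside element, or None if it has no link."""
    link = element.find('a')
    return link.get('href') if link is not None else None


print(href_or_none(d))                                                    # Property.aspx?pi=abc
print(href_or_none(BeautifulSoup('<h3>No link</h3>', 'html.parser').h3))  # None
```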
In addition, you could select your elements more specifically with a CSS selector:
for e in soup.select('#properties h3 a'):
    print(e.get('href'))
Example
For your next question, try to create a minimal example like this one; it makes it easier for others to understand your situation and the issue.
from bs4 import BeautifulSoup
html = '''
<div id="properties">
<div class="item">
<div class="item-header clearfix">
<h3><a href="Property.aspx?pi=92700a36-fd11-ec11-aa4a-246e960cbc4d" title="3 Bedroom House For Sale In Akaki">3 Bedroom House For Sale In Akaki</a></h3>
</div>
</div>
<div class="item">
<div class="item-header clearfix">
<h3><a href="Property.aspx?pi=dc3f03fe-6140-ea11-a1df-a4badb3ceacd" title="Shop For Sale in Nicosia Center">Shop For Sale in Nicosia Center</a></h3>
</div>
</div>
<div class="item">
<div class="item-header clearfix">
<h3><a href="Property.aspx?pi=7f72737d-72e2-ec11-aa4e-246e960cbc4d" title="1 Bedroom Apartment For Sale In Strovolos, Nicosia">1 Bedroom Apartment For Sale In Strovolos, Nicosia</a></h3>
</div>
</div>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
for e in soup.select('#properties h3 a'):
    print(e.get('href'))
Output
Property.aspx?pi=92700a36-fd11-ec11-aa4a-246e960cbc4d
Property.aspx?pi=dc3f03fe-6140-ea11-a1df-a4badb3ceacd
Property.aspx?pi=7f72737d-72e2-ec11-aa4e-246e960cbc4d
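The href values are relative (Property.aspx?pi=...). If you need absolute URLs, one option is the standard library's `urllib.parse.urljoin`, which resolves each href against the address of the page you scraped. `BASE_URL` below is a placeholder assumption, not the real site; substitute the page you actually fetched. The `a[href]` selector only matches anchors that carry an href, so an `<h3>` without a link is skipped.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Placeholder assumption: replace with the URL of the page the HTML was fetched from
BASE_URL = 'https://www.example.com/listings/Search.aspx'

html = '''
<div id="properties">
  <h3><a href="Property.aspx?pi=92700a36-fd11-ec11-aa4a-246e960cbc4d">3 Bedroom House For Sale In Akaki</a></h3>
  <h3><a href="Property.aspx?pi=dc3f03fe-6140-ea11-a1df-a4badb3ceacd">Shop For Sale in Nicosia Center</a></h3>
  <h3>No link here</h3>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

# a[href] matches only anchors that actually have an href attribute;
# urljoin resolves each relative href against the directory of BASE_URL
urls = [urljoin(BASE_URL, a['href']) for a in soup.select('#properties h3 a[href]')]

for url in urls:
    print(url)
```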