Beautiful Soup doesn't find element
Question:
A website I am trying to scrape has this HTML tag (I believe it's an A/B test, which is why I have two BS4 searches going at the same time).
<h2 data-testid="price">39 777 kr</h2>
I am trying to scrape the text inside this h2 tag, but it doesn’t seem to work. I have tried find_all, select and find, but to no avail.
This is the full implementation:
from bs4 import BeautifulSoup

# `response` is the requests.Response for the ad page
soup = BeautifulSoup(response.text, 'html.parser')

# Variant A: price in a <span class="u-t3">
total_price = soup.body.find('span', class_='u-t3')
# Variant B: price in an <h2 data-testid="price">
total_price_alternative = soup.body.find('h2', attrs={'data-testid': 'price'})

if total_price is not None:
    main_price_info = {
        'title': 'Total price',
        # Replace non-breaking spaces with regular spaces
        'value': total_price.text.replace('\xa0', ' ')
    }
elif total_price_alternative is not None:
    main_price_info = {
        'title': 'Total price',
        'value': total_price_alternative.text.replace('\xa0', ' ')
    }
else:
    main_price_info = {
        'title': 'Total price',
        'value': 'Could not find price'
    }
URL to the site (It’s in Norwegian): https://www.finn.no/car/used/ad.html?finnkode=297865903
Answers:
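The price is stored as JSON inside a <script> element rather than in the <h2> tag, which is presumably rendered by JavaScript in the browser, so Beautiful Soup does not find the tag in the HTML that requests downloads. You can locate the script element and use json.loads to parse its contents: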
import json

import requests
from bs4 import BeautifulSoup

url = 'https://www.finn.no/car/used/ad.html?finnkode=297865903'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# The page embeds its data as JSON in the element with id="horseshoe-config"
data = json.loads(soup.select_one('#horseshoe-config').text)
price = data['xandr']['feed']['pris']
print(price)
Prints:
39777
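If the script element or one of the keys is missing (for example after a site change), the snippet above raises an exception. Below is a minimal sketch of a more defensive version, assuming the same element id and key path as above. The function name extract_price and the sample_html string are hypothetical and exist only so the example runs offline; the real page's JSON is much larger, and the type of the price value there may differ.

```python
import json

from bs4 import BeautifulSoup


def extract_price(html, script_id="horseshoe-config"):
    """Return the price from the embedded JSON, or None if it cannot be found.

    `extract_price` is a hypothetical helper; the element id and the key path
    are the ones used in the answer above and may change if the site changes.
    """
    soup = BeautifulSoup(html, "html.parser")
    script = soup.find("script", id=script_id)
    if script is None or not script.string:
        return None
    try:
        data = json.loads(script.string)
    except json.JSONDecodeError:
        return None
    try:
        return data["xandr"]["feed"]["pris"]
    except (KeyError, TypeError):
        # A key is missing or an intermediate value is not a dict
        return None


# Hypothetical, trimmed-down page used only to exercise the function offline.
sample_html = """
<html><body>
<script id="horseshoe-config" type="application/json">
{"xandr": {"feed": {"pris": 39777}}}
</script>
</body></html>
"""

print(extract_price(sample_html))                   # 39777
print(extract_price("<html><body></body></html>"))  # None
```

With a live page you would pass requests.get(url).text as the html argument and fall back to your 'Could not find price' value when the function returns None.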