How do I properly use the find function from BeautifulSoup4 in Python 3?

Question

I’m following a YouTube tutorial on how to scrape an Amazon product page. First I’m trying to get the product title. Later I want to get the Amazon price and the second-hand price. For this I’m using requests and bs4. Here is the code so far:

import requests
from bs4 import BeautifulSoup

URL = 'https://www.amazon.de/Teenage-Engineering-Synthesizer-FM-Radio-AMOLED-Display/dp/B00CXSJUZS/ref=sr_1_1_sspa?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=op-1&qid=1594672884&sr=8-1-spons&psc=1&smid=A1GQGGPCGF8PV9&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUFEMUZSUjhQMUM3NTkmZW5jcnlwdGVkSWQ9QTAwMzMwODkyQkpTNUJUUE9QUFVFJmVuY3J5cHRlZEFkSWQ9QTA4MzM4NDgxV1Y3UzVVN1lXTUZKJndpZGdldE5hbWU9c3BfYXRmJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ=='

headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'}

page = requests.get(URL,headers=headers)
soup = BeautifulSoup(page.content,'html.parser')


title = soup.find('span',{'id' : "productTitle"})
print(title)

My title is None, so the find function doesn’t find the element with the id "productTitle". But inspecting the soup shows that there is an element with that id.

So what’s wrong with my code?
I also tried:

title = soup.find(id = "productTitle")

Asked By: MuFFiiN


Answer 1

Try this:

import requests
from bs4 import BeautifulSoup

URL = 'https://www.amazon.de/Teenage-Engineering-Synthesizer-FM-Radio-AMOLED-Display/dp/B00CXSJUZS/ref=sr_1_1_sspa?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=op-1&qid=1594672884&sr=8-1-spons&psc=1&smid=A1GQGGPCGF8PV9&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUFEMUZSUjhQMUM3NTkmZW5jcnlwdGVkSWQ9QTAwMzMwODkyQkpTNUJUUE9QUFVFJmVuY3J5cHRlZEFkSWQ9QTA4MzM4NDgxV1Y3UzVVN1lXTUZKJndpZGdldE5hbWU9c3BfYXRmJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ=='

headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'}

page = requests.get(URL,headers=headers)
soup = BeautifulSoup(page.content,'lxml')


title = soup.find('span',{'id' : "productTitle"})
print(title.text.strip())

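Your find call is correct, but the parser is the problem: the only change to the parsing step is 'lxml' in place of 'html.parser'. The Beautiful Soup documentation has a section on the differences between parsers. I prefer lxml but sometimes also use html5lib. I also added

.text.strip()

to the print call so that only the title text is printed, without the surrounding tag and whitespace.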
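To see what .text.strip() does without hitting the network, here is a minimal sketch on a small inline HTML string. The HTML snippet, the product name in it, and the helper tag_text are made up for illustration; the built-in html.parser is used only so the example runs without extra packages. It also guards against find returning None, which is what would otherwise raise an AttributeError on .text.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for a product page (not real Amazon markup)
html = '<div><span id="productTitle">\n   Example Product Name   \n</span></div>'
soup = BeautifulSoup(html, "html.parser")

# Same lookup as in the answer: a <span> with id="productTitle"
title = soup.find("span", {"id": "productTitle"})
# A lookup that matches nothing returns None instead of raising
missing = soup.find("span", {"id": "doesNotExist"})


def tag_text(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.text.strip() if tag is not None else None


print(repr(title.text))   # raw text, including the surrounding whitespace
print(tag_text(title))    # text with leading/trailing whitespace removed
print(tag_text(missing))  # None, no AttributeError
```

Checking for None before calling .text is a small design choice that makes a failed lookup visible as None rather than as a crash in the print line.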

Note: lxml is a separate package, so you have to install it first (for example with pip install lxml)!
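If you are not sure whether lxml is installed, something like the following sketch can pick a parser name at runtime. The function pick_parser and the preference order are assumptions for illustration, not part of Beautiful Soup; it only checks whether the packages can be imported and falls back to the built-in html.parser, which may again fail to find the element, as described above.

```python
import importlib.util

# Preference order is an assumption for this sketch: the two third-party
# parsers mentioned in the answer, tried in that order.
PREFERRED_PARSERS = ("lxml", "html5lib")


def pick_parser():
    """Return the name of the first installed preferred parser.

    Falls back to Python's built-in "html.parser" if neither is installed.
    """
    for name in PREFERRED_PARSERS:
        # find_spec returns None when a top-level package is not installed
        if importlib.util.find_spec(name) is not None:
            return name
    return "html.parser"


parser = pick_parser()
print(parser)
```

The returned string can then be passed as the second argument, as in BeautifulSoup(page.content, parser).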

Answered By: UWTD TV
