Beautiful Soup fails to acquire prices from Amazon at times
Question:
While running a Beautiful Soup script to scrape prices from Amazon, I’ve run into a problem: the script often fails to find the price, seemingly at random, and prints empty lists instead.
import bs4
import requests

def getAmazonPrice(productUrl):
    # Browser-like User-Agent so the server treats the request as a web browser and not a bot
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36'}
    elems = []
    while not elems:  # keep requesting until the selector matches something
        res = requests.get(productUrl, headers=headers)
        res.raise_for_status()
        soup = bs4.BeautifulSoup(res.text, 'lxml')
        elems = soup.select('#mediaNoAccordion > div.a-row > div.a-column.a-span4.a-text-right.a-span-last > span.a-size-medium.a-color-price.header-price')
        print(elems)
    return elems[0].text.strip()

price = getAmazonPrice('https://www.amazon.com/Automate-Boring-Stuff-Python-2nd-ebook/dp/B07VSXS4NK/ref=sr_1_1?crid=30NW5VCV06ZMP&dchild=1&keywords=automate+the+boring+stuff+with+python&qid=1586810720&sprefix=automate+the+bo%2Caps%2C288&sr=8-1')
print('The price is ' + price)
Output:
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[<span class="a-size-medium a-color-price header-price">
$26.58
</span>]
The price is $26.58
I suspect the issue stems from Amazon blocking my scraping tool.
How would you use Beautiful Soup to scrape Amazon?
Answers:
Just save res.text to an HTML file and you’ll see that you’re getting blocked by a CAPTCHA.
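As a rough sketch of that check, the helpers below write the response body to a file and apply a simple text heuristic to it. The names save_html and looks_like_captcha, the output file name, and the marker strings in CAPTCHA_HINTS are all hypothetical, chosen for illustration; open the saved file in a browser to see what the blocked response actually contains before relying on any marker.

```python
from pathlib import Path

# Assumed marker strings, not confirmed against a real blocked response.
# Adjust them after inspecting the saved HTML.
CAPTCHA_HINTS = ("captcha", "robot check")


def save_html(text, path="amazon_response.html"):
    """Write the raw response body to disk so it can be opened in a browser."""
    out = Path(path)
    out.write_text(text, encoding="utf-8")
    return out


def looks_like_captcha(html):
    """Heuristic only: True if any assumed marker appears anywhere in the HTML."""
    lowered = html.lower()
    return any(hint in lowered for hint in CAPTCHA_HINTS)
```

Inside the question's loop, calling save_html(res.text) right after requests.get(...) keeps a copy of the last response, and checking looks_like_captcha(res.text) before retrying would let the script stop instead of hammering the server with more requests. A real product page may also contain the word "captcha" somewhere in its scripts, so treat a match as a hint to look at the saved file, not as proof.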