Is there a way to scrape an Amazon product listing page using Python?

Question:

I’m trying to scrape product listing pages that display the vendors and prices of particular products, but urllib.urlopen isn’t working. It works on all other pages on Amazon, so I’m wondering whether Amazon blocks bots from scraping product listing pages. Can anyone verify this? Using Chrome I can still view the page source…

Here’s an example of a product listing page I would want to scrape: http://www.amazon.com/gp/offer-listing/B007E84H96/ref=dp_olp_new?ie=UTF8&condition=new

Answers:

Have you heard of BeautifulSoup? You might get some mileage out of that…

http://www.crummy.com/software/BeautifulSoup/


More details: BeautifulSoup Grab Visible Webpage Text

Answered By: BenDundee

Trying curl -I (which sends a HEAD request) on that URL returns 405 MethodNotAllowed:

$ curl -I 'http://www.amazon.com/gp/offer-listing/B007E84H96/ref=dp_olp_new?ie=UTF8&condition=new' 
HTTP/1.1 405 MethodNotAllowed
Date: Wed, 13 Feb 2013 16:41:08 GMT
Server: Server
x-amz-id-1: 1WKZG9N0SE87E3KFG6YV
allow: POST, GET
x-amz-id-2: Apluv2QBzzrmXlRWjlClRGsQQ1TbwsxObe2hxfdrGhO/OQziI/aIT3vkVjCPn+qz
Vary: Accept-Encoding,User-Agent
Content-Type: text/html; charset=ISO-8859-1

and adding a User-Agent string with the -A switch didn’t affect that return value.
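
Since the response advertises `allow: POST, GET`, the 405 may simply be the HEAD request that `curl -I` sends being rejected, which says little about what a GET would return. Here is a minimal sketch for comparing the two methods from Python 3 (`urllib.request`, not the Python 2 `urllib.urlopen` from the question). The helper names `build_request` and `probe` are made up for illustration, and I haven't confirmed what Amazon returns for either method today.

```python
import urllib.error
import urllib.request

# The offer-listing URL from the question.
URL = ("http://www.amazon.com/gp/offer-listing/B007E84H96/"
       "ref=dp_olp_new?ie=UTF8&condition=new")


def build_request(url, method):
    """Build a request with an explicit HTTP method ("HEAD" mirrors curl -I)."""
    return urllib.request.Request(url, method=method)


def probe(url, method, timeout=10):
    """Send one request and return (status code, Allow header or None).

    HTTP error statuses such as 405 are returned rather than raised;
    network-level failures (urllib.error.URLError) still propagate.
    """
    req = build_request(url, method)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Allow")
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Allow")
```

Calling `probe(URL, "HEAD")` and then `probe(URL, "GET")` shows whether only HEAD is refused or whether GET is turned away as well.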

You might experiment with different HTTP headers to see if you can find something that passes. But it’s pretty obvious that Amazon wouldn’t want you to screen-scrape prices from their product pages. And a little googling brings up this page:

http://www.distil.it/amazon-cracks-down-on-price-scraping/#.URvBFo4ry0s

With no fanfare or warning, Amazon in June began enforcing a
long-standing policy prohibiting screen-scraping tools from harvesting
listing information directly from its marketplace, a favorite tool for
providers of repricing services for merchants, according to a
third-party developer.
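
If you still want to run the header experiment mentioned above, something like the following sketch would do it in Python 3. The header sets in `CANDIDATE_HEADERS` and the helper names are assumptions for illustration only; none of them is known to get a successful response from Amazon, and the policy quoted above suggests they may not.

```python
import urllib.error
import urllib.request

# Hypothetical header sets to try, from bare to more browser-like.
CANDIDATE_HEADERS = [
    {},
    {"User-Agent": "Mozilla/5.0"},
    {
        "User-Agent": "Mozilla/5.0",
        "Accept": "text/html",
        "Accept-Language": "en-US,en;q=0.8",
    },
]


def build_request(url, headers):
    """Build a GET request carrying the given headers."""
    return urllib.request.Request(url, headers=headers, method="GET")


def try_headers(url, header_sets, timeout=10):
    """Return a list of (headers, status code) pairs, one per header set.

    HTTP error statuses are recorded rather than raised; network-level
    failures (urllib.error.URLError) still propagate.
    """
    results = []
    for headers in header_sets:
        req = build_request(url, headers)
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                status = resp.status
        except urllib.error.HTTPError as err:
            status = err.code
        results.append((headers, status))
    return results
```

Running `try_headers(url, CANDIDATE_HEADERS)` against the listing URL gives one status code per header set, so you can see at a glance whether any combination changes the response.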

Note also that Amazon has an API for its affiliates; there are some related questions about using that API from Python in the "Related" question links in the right column.

Answered By: Steven D. Majewski