How do you access the 101st page of an Amazon category list?
Question:
I would like to access all of the items in a given category on Amazon, but it seems that the category pages are generated via search. Bumping the page parameter in the search URL will only take you as far as the 100th page. Is there any way to get past that? Here’s a sample URL for books
Answers:
The content is loaded dynamically via an AJAX (XHR) call.
Long story short:
- open browser dev tools
- open network tab
- click on a page link on Amazon
- see that the XHR request goes to http://www.amazon.com/mn/search/ajax/ref=sr_pg_3... (this is the URL to call from your Scrapy spider; it returns JSON)
So, basically, you should just call this XHR request 100 times (or find out if you can get them all in one).
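As a rough sketch of "call it once per page", the snippet below builds one URL per page from an XHR URL copied out of the network tab. It uses only the standard library. The helper name `page_urls`, the query parameter name `page`, and the `example.com` URL are all placeholders assumed for illustration; check the real request in dev tools for the actual parameter name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

MAX_PAGES = 100  # the page cap mentioned in this answer


def page_urls(xhr_url, param="page", last_page=MAX_PAGES):
    """Yield one URL per result page by rewriting a single query parameter.

    `param` is an assumed name; use whatever the real XHR request uses.
    """
    parts = urlsplit(xhr_url)
    # Keep every other query parameter exactly as the browser sent it.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    for page in range(1, last_page + 1):
        new_query = urlencode(query + [(param, str(page))])
        yield urlunsplit(parts._replace(query=new_query))


# Placeholder URL: paste the real XHR URL copied from the network tab here.
urls = list(page_urls("http://example.com/search/ajax?keywords=books&page=3"))
```

In a Scrapy spider, each of these URLs could be handed to `scrapy.Request`, with the callback decoding the JSON body.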
Useful links:
- Can scrapy be used to scrape dynamic content from websites that are using AJAX?
- Pagination using scrapy
Notes:
- Amazon limits search results to 100 pages
- you can try the Amazon API instead of scraping the website directly. See "Amazon API library for Python?".
Hope that helps.