Scraping O’Reilly search results with Python returns empty results

Question:

I’m using Python to search O’Reilly’s search bar for certain strings but it’s returning empty results.

For example, I’m trying to get O’Reilly’s search to give me a list of the books it sells related to science, by appending the string "Science" to the search URL "https://www.oreilly.com/search/?query=" and requesting the resulting URL, "https://www.oreilly.com/search/?query=Science", with Python’s requests library.

import requests

myurl = "https://www.oreilly.com/search/?query=Science"
page = requests.get(myurl).text

When I look for search results in the resulting HTML, there are no books in it.
The book results should appear under a tag that looks something like this:

<section class="Results–amUWr…

After investigating a bit further, I found that when I print page, the HTML result has the following tag (data-search-results="false"):

<section class="Results–amUWr" data-search-results="false">

However, when I paste the same URL (https://www.oreilly.com/search/?query=Science) into a web browser, the same tag looks like this:

<section class="Results–amUWr" data-search-results="true">

and the search results appear both in the browser and in the HTML.

I’m unable to understand why opening the URL through Python returns a different result than opening it in a web browser like Chrome. Please help me out with this.

Thanks.

Asked By: Madhur Gupta

||

Answers:

The web browser runs a script that calls an API endpoint. You can see the requests in the web browser by opening the developer tools (Ctrl+Shift+I or F12) and looking at the Fetch/XHR requests in the Network tab. The API endpoint in this case is https://www.oreilly.com/api/v2/search/?query=Science, plus some other parameters that are irrelevant here. If you query this URL, you get JSON output and do not need BeautifulSoup at all.
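As a rough sketch of the idea above, the endpoint can be requested directly and the JSON decoded without any HTML parsing. The helper names (build_search_url, extract_titles, fetch_titles) are made up for illustration. The page parameter, the user-agent header and the 'results' and 'title' keys are assumptions based on what appears elsewhere on this page; the endpoint is not a documented public API and may change.

```python
from urllib.parse import urlencode

API_URL = "https://www.oreilly.com/api/v2/search/"


def build_search_url(query, page=0):
    """Return the search API URL for a query string and a page number."""
    return f"{API_URL}?{urlencode({'query': query, 'page': page})}"


def extract_titles(payload):
    """Pull the titles out of a decoded JSON payload.

    Assumption: hits are listed under 'results', each with a 'title' key.
    """
    return [item.get("title") for item in payload.get("results", [])]


def fetch_titles(query, page=0, timeout=10):
    """Request one page of results from the API and return the titles."""
    import requests  # imported here so the helpers above work without it

    response = requests.get(
        build_search_url(query, page),
        headers={"user-agent": "Mozilla/5.0"},  # assumption: browser-like UA
        timeout=timeout,
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return extract_titles(response.json())


# Usage (performs a network request):
#     for title in fetch_titles("Science"):
#         print(title)
```

Keeping the URL building and the JSON handling in separate functions makes it easy to try them on a saved payload before touching the network.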

Answered By: Christian Pojoni

First of all, always check the response to your request and/or your soup:

  • is the request successful?
  • are the expected elements (tags, strings, classes,…) included?

In this case some of them are not: the content is provided dynamically, based on data from an extra POST request. requests only fetches the static content and does not render or manipulate the result the way a browser does, so you will not be able to process the results with BeautifulSoup, because find_all() cannot find what is not there.
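The two checks above can be sketched as a small helper. This is only an illustration: the function name and the sample HTML string are made up, while the data-search-results attribute is the one reported in the question.

```python
def check_response(status_code, html, marker='data-search-results="true"'):
    """Report whether a request succeeded and whether the expected marker is present."""
    return {
        "request_ok": 200 <= status_code < 300,
        "marker_found": marker in html,
    }


# Hypothetical static HTML, shaped like what the question reports for requests.get()
static_html = '<section class="Results" data-search-results="false"></section>'
report = check_response(200, static_html)
print(report)  # {'request_ok': True, 'marker_found': False}
```

With a real response you would pass response.status_code and response.text. A successful request with a missing marker is a strong hint that the content is filled in later by a script.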

You could simulate browser behavior with selenium and process the rendered driver.page_source with BeautifulSoup for practice, but this is not necessary.

Instead, simply use requests to iterate over the result pages and take the data you need from the structured JSON response:

for i in range(0,10):
    url = f'https://www.oreilly.com/api/v2/search/?query=Science&formats=book&sort=relevance&page={i}'
    books.extend(requests.get(url, headers=headers).json()['results'])

Example

import requests

# Browser-like User-Agent header sent with every request
headers = {'user-agent': 'Mozilla/5.0'}
books = []

# Collect the 'results' list from the first ten pages of the JSON search API
for i in range(0,10):
    url = f'https://www.oreilly.com/api/v2/search/?query=Science&formats=book&sort=relevance&page={i}'
    books.extend(requests.get(url, headers=headers).json()['results'])

books

Output

[{'id': 'https://www.safaribooksonline.com/api/v1/book/9781681987774/', 'archive_id': '9781681987774', 'ourn': 'urn:orm:book:9781681987774', 'isbn': '9781681987774', 'last_modified_time': '2022-09-06T21:33:31.184Z', 'issued': '2021-05-28T00:00:00Z', 'format': 'book', 'content_format': 'book', 'authors': ['Brent Eviston'], 'publishers': ['Rocky Nook'], 'academic_excluded': False, 'language': 'en', 'title': 'The Art and Science of Drawing', 'description': '<span><p><br/>Drawing is not a talent. It’s a skill anyone can learn. This is the philosophy of drawing instructor Brent Eviston based on his more than twenty years of teaching. He has tested numerous types of drawing instruction from centuries old classical techniques to contemporary practices and designed an approach that combines tried and true techniques with innovative methods of his own. Now, he shares his secrets with this book that provides the most accessible, streamlined, and effective methods for learning to draw.</p><p>Taking the reader through the entire process, beginning with the most basic skills to more advanced such as volumetric drawing, shading, and figure sketching, this book contains numerous projects and guidance on what and how to practice. 
It also features instructional images and diagrams as well as finished drawings that showcase Brent’s creative work.xa0With this book and a dedication to practice, anyone can learn to draw!<br/></p></span>', 'url': 'https://www.oreilly.com/api/v1/book/9781681987774/', 'web_url': '/library/view/the-art-and/9781681987774/', 'source': 'application/epub+zip', 'content_type': 'book', 'virtual_pages': 338, 'duration_seconds': -1, 'has_assessment': False, 'timestamp': '2022-09-06T21:47:43.102Z', 'average_rating': 5000, 'number_of_followers': 0, 'number_of_items': 0, 'number_of_reviews': 1, 'popularity': 949602920, 'report_score': 5000, 'cover_url': 'https://www.oreilly.com/library/cover/9781681987774/', 'date_added': '2021-05-29T06:16:10.080Z', 'topics': ['de5b3d7f-eca6-4d41-91cf-e49a0512653e'], 'topics_payload': [{'uuid': 'de5b3d7f-eca6-4d41-91cf-e49a0512653e', 'slug': 'science', 'name': 'Science', 'score': None}]}, {'id': 'https://www.safaribooksonline.com/api/v1/book/9781000486810/', 'archive_id': '9781000486810', 'ourn': 'urn:orm:book:9781000486810', 'isbn': '9781000486810', 'last_modified_time': '2022-08-15T09:09:04.765Z', 'issued': '2021-11-29T00:00:00Z', 'format': 'book', 'content_format': 'book', 'authors': ['Paul Swuste', 'Jop Groeneweg', 'Frank W. Guldenmund', 'Coen van Gulijk', 'Saul Lemkowitz', 'Yvette Oostendorp', 'Walter Zwaard'], 'publishers': ['Routledge'], 'academic_excluded': False, 'language': 'en', 'title': 'From Safety to Safety Science', 'description': '<span><p>How do accidents and disasters occur? How has knowledge of accident processes evolved? A significant improvement in safety has occurred during the past century, with the number of accidents falling spectacularly within industry, aviation and road traffic. This progress has been gradual in the context of a changing society. The improvements are partly due to a better understanding of the accident processes that ultimately lead to damage. 
This book shows how contemporary crises instigated the development of safety knowledge and how the safety sciences pieced their theories together by research, by experience and by taking ideas from other domains.</p><p><em>From Safety to Safety Science</em> details 150 years of knowledge development in the safety sciences. The authors have rigorously extracted the essence of safety knowledge development from more than 2,500 articles to provide a unique overview and insight into the background and usability of safety theories, as well as modelling how they developed and how they are used today. Extensive appendices and references provide an additional dimension to support further scholarly work in this field.</p><p>The book is divided into clear time periods to make it an accessible piece of science history that will be invaluable to both new and experienced safety researchers, to safety courses and education, and to learned practitioners.</p></span>', 'url': 'https://www.oreilly.com/api/v1/book/9781000486810/', 'web_url': '/library/view/from-safety-to/9781000486810/', 'source': 'application/epub+zip', 'content_type': 'book', 'virtual_pages': 726, 'duration_seconds': -1, 'has_assessment': False, 'timestamp': '2022-08-15T09:09:28.329Z', 'average_rating': 0, 'number_of_followers': 0, 'number_of_items': 0, 'number_of_reviews': 0, 'popularity': 751744009, 'report_score': 0, 'cover_url': 'https://www.oreilly.com/library/cover/9781000486810/', 'date_added': '2022-08-15T09:07:54.808Z', 'topics': ['de5b3d7f-eca6-4d41-91cf-e49a0512653e'], 'topics_payload': [{'uuid': 'de5b3d7f-eca6-4d41-91cf-e49a0512653e', 'slug': 'science', 'name': 'Science', 'score': None}]}, {'id': 'https://www.safaribooksonline.com/api/v1/book/9780760375686/', 'archive_id': '9780760375686', 'ourn': 'urn:orm:book:9780760375686', 'isbn': '9780760375686', 'last_modified_time': '2022-09-07T16:30:10.903Z', 'issued': '2022-07-19T00:00:00Z', 'format': 'book', 'content_format': 'book', 'authors': ['Liz Lee 
Heinecke'], 'publishers': ['Quarry Books'], 'academic_excluded': False, 'language': 'en', 'title': 'Sheet Pan Science', 'description': '<span><i>Sheet Pan Science</i> features 25 awesome, bubbling, colorful, fizzing, oozing science experiments that all fit on a standard sheet pan.<br/> xa0n</span>', 'url': 'https://www.oreilly.com/api/v1/book/9780760375686/', 'web_url': '/library/view/sheet-pan-science/9780760375686/', 'source': 'application/epub+zip', 'content_type': 'book', 'virtual_pages': 299, 'duration_seconds': -1, 'has_assessment': False, 'timestamp': '2022-09-07T16:54:11.398Z', 'average_rating': 0, 'number_of_followers': 0, 'number_of_items': 0, 'number_of_reviews': 0, 'popularity': 655387231, 'report_score': 0, 'cover_url': 'https://www.oreilly.com/library/cover/9780760375686/', 'date_added': '2022-07-12T16:45:53.980Z', 'topics': ['de5b3d7f-eca6-4d41-91cf-e49a0512653e'], 'topics_payload': [{'uuid': 'de5b3d7f-eca6-4d41-91cf-e49a0512653e', 'slug': 'science', 'name': 'Science', 'score': None}]}, {'id': 'https://www.safaribooksonline.com/api/v1/book/9780323884983/', 'archive_id': '9780323884983', 'ourn': 'urn:orm:book:9780323884983', 'isbn': '9780323884983', 'last_modified_time': '2022-09-07T22:48:35.379Z', 'issued': '2021-04-16T00:00:00Z', 'format': 'book', 'content_format': 'book', 'authors': ['Zhongzhi Shi'], 'publishers': ['Elsevier'], 'academic_excluded': False, 'language': 'en', 'title': 'Intelligence Science', 'description': '<span><p><i>Intelligence Science: Leading the Age of Intelligence</i> covers the emerging scientific research on the theory and technology of intelligence, bringing together disciplines such as neuroscience, cognitive science, and artificial intelligence to study the nature of intelligence, the functional simulation of intelligent behavior, and the development of new intelligent technologies. 
The book presents this complex, interdisciplinary area of study in an accessible volume, introducing foundational concepts and methods, and presenting the latest trends and developments. Chapters cover the Foundations of neurophysiology, Neural computing, Mind models, Perceptual intelligence, Language cognition, Learning, Memory, Thought, Intellectual development and cognitive structure, Emotion and affect, and more.xa0 </p><p>This volume synthesizes a very rich and complex area of research, with an aim of stimulating new lines of enquiry.</p><ul><li>Presents a complex, interdisciplinary area in an accessible way, including the latest trends and developments</li><li>Brings together disciplines such as neuroscience, cognitive science and artificial intelligence</li><li>Gives the latest methods and theories in the development of new intelligent technologies</li><li>Reflects upon the most important achievements in the study of natural and artificial intelligence</li><li>Contextualizes intelligence research within the history and progress of twenty-first century science</li></ul></span>', 'url': 'https://www.oreilly.com/api/v1/book/9780323884983/', 'web_url': '/library/view/intelligence-science/9780323884983/', 'source': 'application/epub+zip', 'content_type': 'book', 'virtual_pages': 995, 'duration_seconds': -1, 'has_assessment': False, 'timestamp': '2022-09-07T22:49:16.627Z', 'average_rating': 0, 'number_of_followers': 0, 'number_of_items': 0, 'number_of_reviews': 0, 'popularity': 713711368, 'report_score': 0, 'cover_url': 'https://www.oreilly.com/library/cover/9780323884983/', 'date_added': '2022-07-06T06:46:42.830Z', 'topics': ['de5b3d7f-eca6-4d41-91cf-e49a0512653e'], 'topics_payload': [{'uuid': 'de5b3d7f-eca6-4d41-91cf-e49a0512653e', 'slug': 'science', 'name': 'Science', 'score': None}]}, {'id': 'https://www.safaribooksonline.com/api/v1/book/9781119734147/', 'archive_id': '9781119734147', 'ourn': 'urn:orm:book:9781119734147', 'isbn': '9781119734147', 
'last_modified_time': '2022-03-31T20:05:56.600Z', 'issued': '2022-03-29T00:00:00Z', 'format': 'book', 'content_format': 'book', 'authors': ['Sami W. Asmar'], 'publishers': ['Wiley'], 'academic_excluded': False, 'language': 'en', 'title': 'Radio Science Techniques for Deep Space Exploration', 'description': '<span><p><b>Explore the development and state-of-the-art in deep space exploration using radio science techniques</b></p><p>In <i>Radio Science Techniques for Deep Space Exploration</i>, accomplished NASA/JPL researcher and manager Sami Asmar delivers a multi-disciplinary exploration of the science, technology, engineering, mission operations, and signal processing relevant to deep space radio science. The book discusses basic principles before moving on to more advanced topics that include a wide variety of graphical illustrations and useful references to publications by experts in their respective fields.</p><p>Complete explanations of changes in the characteristics of electromagnetic waves and the instrumentation and technology used in scientific experiments are examined.</p><p><i>Radio Science Techniques for Deep Space Exploration</i> offers answers to the question of how to explore the solar system with radio links and better understand the interior structures, atmospheres, rings, and surfaces of other planets. 
The author also includes:</p><ul><li>Thorough introductions to radio science techniques and systems needed to investigate planetary atmospheres, rings, and surfaces</li><li>Comprehensive explorations of planetary gravity and interior structures, as well as relativistic and solar studies</li><li>Practical discussions of instrumentation, technologies, and future directions in radio science techniques</li></ul><p>Perfect for students and professors of physics, astronomy, planetary science, aerospace engineering, and communications engineering, <i>Radio Science Techniques for Deep Space Exploration</i> will also earn a place in the libraries of engineers and scientists in the aerospace industry.</p></span>', 'url': 'https://www.oreilly.com/api/v1/book/9781119734147/', 'web_url': '/library/view/radio-science-techniques/9781119734147/', 'source': 'application/epub+zip', 'content_type': 'book', 'virtual_pages': 653, 'duration_seconds': -1, 'has_assessment': False, 'timestamp': '2022-03-31T20:07:37.510Z', 'average_rating': 0, 'number_of_followers': 0, 'number_of_items': 0, 'number_of_reviews': 0, 'popularity': 663760100, 'report_score': 0, 'cover_url': 'https://www.oreilly.com/library/cover/9781119734147/', 'date_added': '2022-03-31T20:03:37.328Z', 'topics': ['de5b3d7f-eca6-4d41-91cf-e49a0512653e'], 'topics_payload': [{'uuid': 'de5b3d7f-eca6-4d41-91cf-e49a0512653e', 'slug': 'science', 'name': 'Science', 'score': None}]}]
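
Each record carries many fields, so it may help to reduce them to the few you actually need. Below is a minimal sketch, assuming every record has the 'title', 'authors' and 'isbn' keys seen in the output above; the summarize helper is hypothetical and the two sample records are trimmed copies of entries from that output.

```python
def summarize(records):
    """Keep only title, authors and ISBN from each search result record."""
    return [
        {
            "title": record.get("title"),
            "authors": ", ".join(record.get("authors", [])),
            "isbn": record.get("isbn"),
        }
        for record in records
    ]


# Two records trimmed down from the output above
sample = [
    {"title": "The Art and Science of Drawing", "authors": ["Brent Eviston"],
     "isbn": "9781681987774", "format": "book"},
    {"title": "Intelligence Science", "authors": ["Zhongzhi Shi"],
     "isbn": "9780323884983", "format": "book"},
]

for row in summarize(sample):
    print(f"{row['isbn']}  {row['title']} ({row['authors']})")
```

In the example above you would call summarize(books) on the full list instead of the sample.
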
Answered By: HedgeHog