How to wait for a page to load before getting data with requests.get in Python, without using an API

Question:

I am using Python and the requests library for web scraping. I have a problem with a page that loads its content after the initial response: I would like requests.get() to wait before returning the result.

I have seen people resolve the same "problem" with Selenium, but I don’t want to use another API. I am wondering whether it is possible using only urllib, urllib2 or requests.

I have tried putting time.sleep() around the get call, but it didn’t work.
It seems that I need to find where the website gets the data before showing it, but I can’t find it.

import requests

def search():
    url = 'https://academic.microsoft.com/search?q=machine%20learning'
    mySession = requests.Session()
    response = mySession.get(url)
    myResponse = response.text
    return myResponse

The response is the HTML of the loading page (you can see it if you open the link in the code), with the loading blocks, but I need the search results.

Asked By: Vernon Sullivan


Answers:

requests does not load elements that are supposed to be loaded dynamically via Ajax requests. See this definition of what Ajax lets a page do, from w3schools.com:

"Read data from a web server – after a web page has loaded"

The only thing requests does is download the HTML content; it does not interpret the JavaScript inside the web page that issues the Ajax requests. So it does not load elements that are normally loaded via Ajax in a web browser (or when using Selenium).
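To make this concrete, here is a minimal sketch that serves a small page from a throwaway local server and fetches it twice with a pause in between. The page, `Handler` and `fetch_twice` are made up for illustration, and the fetch uses the standard-library urllib.request so the sketch runs without extra installs; a plain HTTP client such as requests.get behaves the same way here, because it never runs the script either.

```python
import http.server
import threading
import time
import urllib.request

# Hypothetical page: a placeholder plus a script that a browser would run.
PAGE = b"""<html><body>
<div id="results">Loading...</div>
<script>
  // A browser would run this and replace the placeholder.
  document.getElementById("results").textContent = "Loaded " + "results";
</script>
</body></html>"""


class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_twice(delay=0.2):
    """Fetch the page, sleep, fetch again; return both HTML strings."""
    server = http.server.HTTPServer(("127.0.0.1", 0), Handler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_port
    # Ignore any proxy settings so the request really goes to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        first = opener.open(url, timeout=5).read().decode()
        time.sleep(delay)  # waiting changes nothing: no script is running anywhere
        second = opener.open(url, timeout=5).read().decode()
    finally:
        server.shutdown()
        server.server_close()
    return first, second


first, second = fetch_twice()
print(first == second)           # both fetches return the same raw HTML
print("Loading..." in first)     # the placeholder is still there
```

Both responses still contain the "Loading..." placeholder and the script as inert text, which is why adding time.sleep() around the request has no effect.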

Answered By: hayj

This site makes a second request and uses JavaScript to render the result. You cannot execute JavaScript with requests. That’s why some people use Selenium.

https://academic.microsoft.com/search?q=machine%20learning is not meant to be used without a browser.

If you want data specifically from academic.microsoft.com, use their API.

import requests

url = 'https://academic.microsoft.com/api/search'

data = {"query": "machine learning",
        "queryExpression": "",
        "filters": [],
        "orderBy": None,
        "skip": 0,
        "sortAscending": True,
        "take": 10}

r = requests.post(url=url, json=data)

result = r.json()

You will get the data in a clean, easy-to-use format.
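If you need more than the first ten results, the skip and take fields in the payload look like ordinary offset/limit paging. That reading is an assumption based on their names, not on documented behaviour, and the helpers `build_payload` and `fetch_pages` below are hypothetical. A rough sketch, which also adds a timeout and an HTTP status check:

```python
API_URL = "https://academic.microsoft.com/api/search"  # endpoint from the snippet above


def build_payload(query, skip=0, take=10):
    """Return the same JSON body as above, with the paging fields filled in."""
    return {
        "query": query,
        "queryExpression": "",
        "filters": [],
        "orderBy": None,
        "skip": skip,   # assumed: number of results to skip
        "sortAscending": True,
        "take": take,   # assumed: number of results per page
    }


def fetch_pages(session, query, pages=3, take=10, timeout=10):
    """Yield one decoded JSON response per page.

    `session` is anything with a requests-style .post(), e.g. requests.Session().
    """
    for page in range(pages):
        payload = build_payload(query, skip=page * take, take=take)
        response = session.post(API_URL, json=payload, timeout=timeout)
        response.raise_for_status()  # fail loudly on HTTP errors
        yield response.json()
```

With requests installed, you would call it as `for page in fetch_pages(requests.Session(), "machine learning"): ...`. The structure of each returned page is whatever the server sends back; nothing here assumes particular keys.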

Answered By: Adrian Krupa