Unable to use the auto-generated value of sha256Hash to scrape property links from a webpage

Question:

After visiting this website, when I fill out the input box with Sydney CBD, NSW and hit the search button, I can see the required results displayed on that site.

I want to scrape the property links using the requests module. With the following attempt, I can get the property links from the first page.

The problem here is that I hardcoded the value of sha256Hash within params, which is not what I want to do. I don’t know whether the ID retrieved by issuing a GET request to the suggestion URL needs to be converted to sha256Hash.

However, when I do that using the function get_hashed_string(), the value it produces differs from the hardcoded one in params. As a result, the script throws a KeyError on the line starting with container = res.json().

import requests
import hashlib
from pprint import pprint
from bs4 import BeautifulSoup

url = 'https://suggest.realestate.com.au/consumer-suggest/suggestions'
link = 'https://lexa.realestate.com.au/graphql'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}
payload = {
    'max': '7',
    'type': 'suburb,region,precinct,state,postcode',
    'src': 'homepage-web',
    'query': 'Sydney CBD, NSW'
}
params = {"operationName":"searchByQuery","variables":{"query":'{"channel":"buy","page":1,"pageSize":25,"filters":{"surroundingSuburbs":true,"excludeNoSalePrice":false,"ex-under-contract":false,"ex-deposit-taken":false,"excludeAuctions":false,"excludePrivateSales":false,"furnished":false,"petsAllowed":false,"hasScheduledAuction":false},"localities":[{"searchLocation":"sydney cbd, nsw"}]}',"testListings":False,"nullifyOptionals":False},"extensions":{"persistedQuery":{"version":1,"sha256Hash":"ef58e42a4bd826a761f2092d573ee0fb1dac5a70cd0ce71abfffbf349b5b89c1"}}}

def get_hashed_string(keyword):
    hashed_str = hashlib.sha256(keyword.encode('utf-8')).hexdigest()
    return hashed_str

with requests.Session() as s:
    s.headers.update(headers)
    r = s.get(url,params=payload)
    hashed_id = r.json()['_embedded']['suggestions'][0]['id']

    # params['extensions']['persistedQuery']['sha256Hash'] = get_hashed_string(hashed_id)
    
    res = s.post(link,json=params)
    container = res.json()['data']['buySearch']['results']['exact']['items']
    for item in container:
        print(item['listing']['_links']['canonical']['href'])

If I run the script as is, it works beautifully. When I uncomment the line that starts with params['extensions']['persistedQuery'] and run the script again, the script breaks.

How can I generate the value of sha256Hash and use the same within the script above?

Asked By: robots.txt

||

Answers:

This is not how GraphQL works. The sha256Hash value identifies the persisted query itself, so it stays the same across all requests and is not derived from the suggestion ID. What you’re missing is a valid GraphQL query.

You have to reconstruct that first and then just use the API pagination. That’s the key.
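
As a rough illustration of why hashing the suggestion ID cannot reproduce that value: under the persisted-query convention this request shape appears to follow (an assumption, since the site's internals are not documented here), sha256Hash is the SHA-256 of the query document text, so it is identical for every search and every page. QUERY_DOCUMENT below is a made-up placeholder, not the site's real query, and persisted_query_hash and build_payload are hypothetical helper names.

```python
import hashlib
import json

# Hypothetical query document, for illustration only.
# The real query text used by the site is not known here.
QUERY_DOCUMENT = "query searchByQuery($query: String!) { buySearch(query: $query) { __typename } }"


def persisted_query_hash(query_document: str) -> str:
    """Return the hex SHA-256 of the query document text."""
    return hashlib.sha256(query_document.encode("utf-8")).hexdigest()


def build_payload(page: int) -> dict:
    """Build a persisted-query style payload; only the variables depend on the page."""
    variables_query = json.dumps({"channel": "buy", "page": page, "pageSize": 25})
    return {
        "operationName": "searchByQuery",
        "variables": {"query": variables_query},
        "extensions": {
            "persistedQuery": {
                "version": 1,
                "sha256Hash": persisted_query_hash(QUERY_DOCUMENT),
            }
        },
    }


page_1 = build_payload(1)
page_2 = build_payload(2)

# The hash is tied to the query document, so it does not change between pages.
same_hash = (
    page_1["extensions"]["persistedQuery"]["sha256Hash"]
    == page_2["extensions"]["persistedQuery"]["sha256Hash"]
)
print(same_hash)  # True
```

Only variables changes from page to page. The hash would change only if the query document itself changed, which is consistent with the hardcoded value working across requests.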

Here’s how:

import json

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/109.0",
    "Accept": "application/graphql+json, application/json",
    "Content-Type": "application/json",
    "Host": "lexa.realestate.com.au",
    "Referer": "https://www.realestate.com.au/",
}

endpoint = "https://lexa.realestate.com.au/graphql"
# page_number is a placeholder that gets replaced with the real page number for each request.
graph_query = (
    '{"channel":"buy","page":page_number,"pageSize":25,"filters":{"surroundingSuburbs":true,'
    '"excludeNoSalePrice":false,"ex-under-contract":false,"ex-deposit-taken":false,'
    '"excludeAuctions":false,"excludePrivateSales":false,"furnished":false,"petsAllowed":false,'
    '"hasScheduledAuction":false},"localities":[{"searchLocation":"sydney cbd, nsw"}]}'
)

graph_json = {
  "operationName": "searchByQuery",
  "variables": {
    "query": "",
    "testListings": False,
    "nullifyOptionals": False
  },
  "extensions": {
    "persistedQuery": {
      "version": 1,
      "sha256Hash": "ef58e42a4bd826a761f2092d573ee0fb1dac5a70cd0ce71abfffbf349b5b89c1"
    }
  }
}

if __name__ == '__main__':
    with requests.Session() as s:
        for page in range(1, 3):
            graph_json['variables']['query'] = graph_query.replace('page_number', str(page))
            r = s.post(endpoint, headers=headers, data=json.dumps(graph_json))
            listing = r.json()['data']['buySearch']['results']['exact']['items']
            for item in listing:
                print(item['listing']['_links']['canonical']['href'])

This should give you:

https://www.realestate.com.au/property-apartment-nsw-sydney-140558991
https://www.realestate.com.au/property-apartment-nsw-sydney-141380404
https://www.realestate.com.au/property-apartment-nsw-sydney-140310979
https://www.realestate.com.au/property-apartment-nsw-sydney-141259592
https://www.realestate.com.au/property-apartment-nsw-barangaroo-140555291
https://www.realestate.com.au/property-apartment-nsw-sydney-140554403
https://www.realestate.com.au/property-apartment-nsw-millers+point-141245584
https://www.realestate.com.au/property-apartment-nsw-haymarket-139205259
https://www.realestate.com.au/project/hyde-metropolitan-by-deicorp-sydney-600036803
https://www.realestate.com.au/property-apartment-nsw-haymarket-140807411
https://www.realestate.com.au/property-apartment-nsw-sydney-141370756
https://www.realestate.com.au/property-apartment-nsw-sydney-141370364
https://www.realestate.com.au/property-apartment-nsw-haymarket-140425111
https://www.realestate.com.au/project/greenland-centre-sydney-600028910
https://www.realestate.com.au/property-apartment-nsw-sydney-141364136
https://www.realestate.com.au/property-apartment-nsw-sydney-139367203
https://www.realestate.com.au/property-apartment-nsw-sydney-141156696
https://www.realestate.com.au/property-apartment-nsw-sydney-141362880
https://www.realestate.com.au/property-studio-nsw-sydney-141311384
https://www.realestate.com.au/property-apartment-nsw-haymarket-141354876
https://www.realestate.com.au/property-apartment-nsw-the+rocks-140413283
https://www.realestate.com.au/property-apartment-nsw-sydney-141350552
https://www.realestate.com.au/property-apartment-nsw-sydney-140657935
https://www.realestate.com.au/property-apartment-nsw-barangaroo-139149039
https://www.realestate.com.au/property-apartment-nsw-haymarket-141034784
https://www.realestate.com.au/property-apartment-nsw-sydney-141230640
https://www.realestate.com.au/property-apartment-nsw-barangaroo-141340768
https://www.realestate.com.au/property-apartment-nsw-haymarket-141337684
https://www.realestate.com.au/property-unitblock-nsw-millers+point-141337528
https://www.realestate.com.au/property-apartment-nsw-sydney-141028828
https://www.realestate.com.au/property-apartment-nsw-sydney-141223160
https://www.realestate.com.au/property-apartment-nsw-sydney-140643067
https://www.realestate.com.au/property-apartment-nsw-sydney-140768179
https://www.realestate.com.au/property-apartment-nsw-haymarket-139406051
https://www.realestate.com.au/property-apartment-nsw-haymarket-139406047
https://www.realestate.com.au/property-apartment-nsw-sydney-139652067
https://www.realestate.com.au/property-apartment-nsw-sydney-140032667
https://www.realestate.com.au/property-apartment-nsw-sydney-127711002
https://www.realestate.com.au/property-apartment-nsw-sydney-140903924
https://www.realestate.com.au/property-apartment-nsw-walsh+bay-139130519
https://www.realestate.com.au/property-apartment-nsw-sydney-140285823
https://www.realestate.com.au/property-apartment-nsw-sydney-140761223
https://www.realestate.com.au/project/111-castlereagh-sydney-600031082
https://www.realestate.com.au/property-apartment-nsw-sydney-140633099
https://www.realestate.com.au/property-apartment-nsw-haymarket-141102892
https://www.realestate.com.au/property-apartment-nsw-sydney-139522379
https://www.realestate.com.au/property-apartment-nsw-sydney-139521259
https://www.realestate.com.au/property-apartment-nsw-sydney-139521219
https://www.realestate.com.au/property-apartment-nsw-haymarket-140007279
https://www.realestate.com.au/property-apartment-nsw-haymarket-139156515
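
A possible variation on the script above, sketched under assumptions: instead of replacing a placeholder inside a hand-written string, the query can be built as a dict and serialized with json.dumps, and the links can be pulled out defensively so that a rejected request returns an empty list rather than raising a KeyError. The helper names build_search_query and extract_links are hypothetical, and the response shape is assumed to match the one used in the script.

```python
import json


def build_search_query(location: str, page: int, page_size: int = 25) -> str:
    """Serialize the search parameters into the JSON string expected in variables['query']."""
    query = {
        "channel": "buy",
        "page": page,
        "pageSize": page_size,
        "filters": {
            "surroundingSuburbs": True,
            "excludeNoSalePrice": False,
            "ex-under-contract": False,
            "ex-deposit-taken": False,
            "excludeAuctions": False,
            "excludePrivateSales": False,
            "furnished": False,
            "petsAllowed": False,
            "hasScheduledAuction": False,
        },
        "localities": [{"searchLocation": location}],
    }
    # Compact separators mirror the whitespace-free string used in the script above.
    return json.dumps(query, separators=(",", ":"))


def extract_links(response_json: dict) -> list:
    """Return canonical listing URLs, or an empty list if the expected keys are absent."""
    try:
        items = response_json["data"]["buySearch"]["results"]["exact"]["items"]
    except (KeyError, TypeError):
        # Reached when the expected structure is missing, e.g. 'data' is absent or None.
        return []
    return [item["listing"]["_links"]["canonical"]["href"] for item in items]


# Example with a hand-made response in the assumed shape (no network access needed).
sample_response = {
    "data": {"buySearch": {"results": {"exact": {"items": [
        {"listing": {"_links": {"canonical": {"href": "https://example.com/listing-1"}}}}
    ]}}}}
}
links = extract_links(sample_response)
```

In the loop above, graph_json['variables']['query'] could then be set to build_search_query('sydney cbd, nsw', page) and the printing loop could iterate over extract_links(r.json()).
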
Answered By: baduker