Unable to use the auto-generated value of sha256Hash to scrape property links from a webpage
Question:
After visiting this website, when I fill out the input box with Sydney CBD, NSW and hit the search button, I can see the required results displayed on that site.
I wish to scrape the property links using the requests module. With the following attempt, I can get the property links from the first page.
The problem is that I hardcoded the value of sha256Hash within params, which is not what I want to do. I don't know whether the ID retrieved by issuing a GET request to the suggestion URL needs to be converted to sha256Hash.
However, when I do that using the function get_hashed_string(), the value it produces is different from the hardcoded one in params. As a result, the script throws a KeyError on the line beginning container = res.json().
import hashlib

import requests

url = 'https://suggest.realestate.com.au/consumer-suggest/suggestions'
link = 'https://lexa.realestate.com.au/graphql'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}

payload = {
    'max': '7',
    'type': 'suburb,region,precinct,state,postcode',
    'src': 'homepage-web',
    'query': 'Sydney CBD, NSW'
}

# variables.query is a JSON-encoded string, so its inner double quotes
# are kept inside single-quoted Python literals.
params = {
    "operationName": "searchByQuery",
    "variables": {
        "query": (
            '{"channel":"buy","page":1,"pageSize":25,"filters":{"surroundingSuburbs":true,'
            '"excludeNoSalePrice":false,"ex-under-contract":false,"ex-deposit-taken":false,'
            '"excludeAuctions":false,"excludePrivateSales":false,"furnished":false,"petsAllowed":false,'
            '"hasScheduledAuction":false},"localities":[{"searchLocation":"sydney cbd, nsw"}]}'
        ),
        "testListings": False,
        "nullifyOptionals": False,
    },
    "extensions": {
        "persistedQuery": {
            "version": 1,
            "sha256Hash": "ef58e42a4bd826a761f2092d573ee0fb1dac5a70cd0ce71abfffbf349b5b89c1",
        }
    },
}

def get_hashed_string(keyword):
    hashed_str = hashlib.sha256(keyword.encode('utf-8')).hexdigest()
    return hashed_str

with requests.Session() as s:
    s.headers.update(headers)
    r = s.get(url, params=payload)
    hashed_id = r.json()['_embedded']['suggestions'][0]['id']
    # params['extensions']['persistedQuery']['sha256Hash'] = get_hashed_string(hashed_id)
    res = s.post(link, json=params)
    container = res.json()['data']['buySearch']['results']['exact']['items']
    for item in container:
        print(item['listing']['_links']['canonical']['href'])
If I run the script as is, it works beautifully. When I uncomment the line that sets params['extensions']['persistedQuery']['sha256Hash'] and run the script again, the script breaks.
How can I generate the value of sha256Hash and use it within the script above?
Answers:
This is not how GraphQL works. The sha256Hash value stays the same across all requests because it identifies the persisted query itself rather than the search term. What you need is a valid GraphQL query string in variables.
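To illustrate why hashing the suggestion ID cannot reproduce the hardcoded value, here is a minimal sketch. It assumes the endpoint follows the Apollo-style persisted-query convention, which the extensions.persistedQuery shape of the request suggests but which is not confirmed here. Under that convention the hash is the SHA-256 of the query document text, so it does not change with the search term. QUERY_DOCUMENT and persisted_query_hash are hypothetical names, and the query text is made up because the site's real query document is not known.

```python
import hashlib


def persisted_query_hash(query_document: str) -> str:
    """Return the hex SHA-256 digest of a query document's text."""
    return hashlib.sha256(query_document.encode("utf-8")).hexdigest()


# Hypothetical query document: the real one used by the site is not known.
QUERY_DOCUMENT = (
    "query searchByQuery($query: String!) "
    "{ buySearch(query: $query) { __typename } }"
)

# The digest depends only on the document text, not on any variables,
# so it is identical for every search and every page.
hash_a = persisted_query_hash(QUERY_DOCUMENT)
hash_b = persisted_query_hash(QUERY_DOCUMENT)

# Hashing an unrelated string (standing in for a suggestion ID) gives a
# different digest, which the server would not recognise.
id_hash = persisted_query_hash("some-suggestion-id")

print(hash_a == hash_b)   # True
print(hash_a == id_hash)  # False
```

The hash can therefore stay hardcoded; the part that varies per search is the query string in variables.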
You have to reconstruct that query first and then use the API's pagination; that's the key.
Here's how:
import json

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/109.0",
    "Accept": "application/graphql+json, application/json",
    "Content-Type": "application/json",
    "Host": "lexa.realestate.com.au",
    "Referer": "https://www.realestate.com.au/",
}

endpoint = "https://lexa.realestate.com.au/graphql"

# JSON-encoded search query; page_number is a placeholder replaced per request.
graph_query = (
    '{"channel":"buy","page":page_number,"pageSize":25,"filters":{"surroundingSuburbs":true,'
    '"excludeNoSalePrice":false,"ex-under-contract":false,"ex-deposit-taken":false,'
    '"excludeAuctions":false,"excludePrivateSales":false,"furnished":false,"petsAllowed":false,'
    '"hasScheduledAuction":false},"localities":[{"searchLocation":"sydney cbd, nsw"}]}'
)

graph_json = {
    "operationName": "searchByQuery",
    "variables": {
        "query": "",
        "testListings": False,
        "nullifyOptionals": False
    },
    "extensions": {
        "persistedQuery": {
            "version": 1,
            "sha256Hash": "ef58e42a4bd826a761f2092d573ee0fb1dac5a70cd0ce71abfffbf349b5b89c1"
        }
    }
}

if __name__ == '__main__':
    with requests.Session() as s:
        # Pages 1 and 2, 25 listings each.
        for page in range(1, 3):
            graph_json['variables']['query'] = graph_query.replace('page_number', str(page))
            r = s.post(endpoint, headers=headers, data=json.dumps(graph_json))
            listing = r.json()['data']['buySearch']['results']['exact']['items']
            for item in listing:
                print(item['listing']['_links']['canonical']['href'])
This should give you:
https://www.realestate.com.au/property-apartment-nsw-sydney-140558991
https://www.realestate.com.au/property-apartment-nsw-sydney-141380404
https://www.realestate.com.au/property-apartment-nsw-sydney-140310979
https://www.realestate.com.au/property-apartment-nsw-sydney-141259592
https://www.realestate.com.au/property-apartment-nsw-barangaroo-140555291
https://www.realestate.com.au/property-apartment-nsw-sydney-140554403
https://www.realestate.com.au/property-apartment-nsw-millers+point-141245584
https://www.realestate.com.au/property-apartment-nsw-haymarket-139205259
https://www.realestate.com.au/project/hyde-metropolitan-by-deicorp-sydney-600036803
https://www.realestate.com.au/property-apartment-nsw-haymarket-140807411
https://www.realestate.com.au/property-apartment-nsw-sydney-141370756
https://www.realestate.com.au/property-apartment-nsw-sydney-141370364
https://www.realestate.com.au/property-apartment-nsw-haymarket-140425111
https://www.realestate.com.au/project/greenland-centre-sydney-600028910
https://www.realestate.com.au/property-apartment-nsw-sydney-141364136
https://www.realestate.com.au/property-apartment-nsw-sydney-139367203
https://www.realestate.com.au/property-apartment-nsw-sydney-141156696
https://www.realestate.com.au/property-apartment-nsw-sydney-141362880
https://www.realestate.com.au/property-studio-nsw-sydney-141311384
https://www.realestate.com.au/property-apartment-nsw-haymarket-141354876
https://www.realestate.com.au/property-apartment-nsw-the+rocks-140413283
https://www.realestate.com.au/property-apartment-nsw-sydney-141350552
https://www.realestate.com.au/property-apartment-nsw-sydney-140657935
https://www.realestate.com.au/property-apartment-nsw-barangaroo-139149039
https://www.realestate.com.au/property-apartment-nsw-haymarket-141034784
https://www.realestate.com.au/property-apartment-nsw-sydney-141230640
https://www.realestate.com.au/property-apartment-nsw-barangaroo-141340768
https://www.realestate.com.au/property-apartment-nsw-haymarket-141337684
https://www.realestate.com.au/property-unitblock-nsw-millers+point-141337528
https://www.realestate.com.au/property-apartment-nsw-sydney-141028828
https://www.realestate.com.au/property-apartment-nsw-sydney-141223160
https://www.realestate.com.au/property-apartment-nsw-sydney-140643067
https://www.realestate.com.au/property-apartment-nsw-sydney-140768179
https://www.realestate.com.au/property-apartment-nsw-haymarket-139406051
https://www.realestate.com.au/property-apartment-nsw-haymarket-139406047
https://www.realestate.com.au/property-apartment-nsw-sydney-139652067
https://www.realestate.com.au/property-apartment-nsw-sydney-140032667
https://www.realestate.com.au/property-apartment-nsw-sydney-127711002
https://www.realestate.com.au/property-apartment-nsw-sydney-140903924
https://www.realestate.com.au/property-apartment-nsw-walsh+bay-139130519
https://www.realestate.com.au/property-apartment-nsw-sydney-140285823
https://www.realestate.com.au/property-apartment-nsw-sydney-140761223
https://www.realestate.com.au/project/111-castlereagh-sydney-600031082
https://www.realestate.com.au/property-apartment-nsw-sydney-140633099
https://www.realestate.com.au/property-apartment-nsw-haymarket-141102892
https://www.realestate.com.au/property-apartment-nsw-sydney-139522379
https://www.realestate.com.au/property-apartment-nsw-sydney-139521259
https://www.realestate.com.au/property-apartment-nsw-sydney-139521219
https://www.realestate.com.au/property-apartment-nsw-haymarket-140007279
https://www.realestate.com.au/property-apartment-nsw-haymarket-139156515
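As a variation on the string replacement used in the script, the request body could also be built from a plain dict and encoded with json.dumps, which avoids hand-written quoting. The sketch below also checks for a GraphQL "errors" entry before indexing into the response, since indexing a failed response directly is what surfaces as a KeyError. The names PERSISTED_HASH, build_payload and extract_links are hypothetical helpers, not part of the site's API; the field names and response shape are taken from the scripts above, and no network request is made here.

```python
import json

# Copied from the request shown above; assumed to remain valid on the server.
PERSISTED_HASH = "ef58e42a4bd826a761f2092d573ee0fb1dac5a70cd0ce71abfffbf349b5b89c1"


def build_payload(page, location="sydney cbd, nsw", page_size=25):
    """Build the POST body for one results page (hypothetical helper)."""
    search = {
        "channel": "buy",
        "page": page,
        "pageSize": page_size,
        "filters": {
            "surroundingSuburbs": True,
            "excludeNoSalePrice": False,
            "ex-under-contract": False,
            "ex-deposit-taken": False,
            "excludeAuctions": False,
            "excludePrivateSales": False,
            "furnished": False,
            "petsAllowed": False,
            "hasScheduledAuction": False,
        },
        "localities": [{"searchLocation": location}],
    }
    return {
        "operationName": "searchByQuery",
        "variables": {
            # The endpoint expects the search as a JSON-encoded string.
            "query": json.dumps(search, separators=(",", ":")),
            "testListings": False,
            "nullifyOptionals": False,
        },
        "extensions": {
            "persistedQuery": {"version": 1, "sha256Hash": PERSISTED_HASH}
        },
    }


def extract_links(body):
    """Return listing URLs, or raise with the server's error messages."""
    if body.get("errors"):
        messages = "; ".join(str(e.get("message")) for e in body["errors"])
        raise RuntimeError(f"GraphQL error: {messages}")
    items = body["data"]["buySearch"]["results"]["exact"]["items"]
    return [item["listing"]["_links"]["canonical"]["href"] for item in items]


payload = build_payload(2)
print(json.loads(payload["variables"]["query"])["page"])  # 2
```

With a requests session, each page would then be fetched with s.post(endpoint, headers=headers, json=build_payload(page)) and the result passed to extract_links(r.json()).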