JSON file request from site returns error 403

Question:

I’m trying to collect some data from a game box score like this: https://fibalivestats.dcd.shared.geniussports.com/u/LEGBF/2213178/

The data is stored in a file (‘data.json’) which I managed to download from the Network tab in Chrome’s developer tools. I’ve been able to parse it and get the data I need.
Now I’m trying to pull the data directly from the URL (without downloading the file) to automate my data gathering from multiple pages of the same kind.
I’m no expert in requests to sites, especially ones that are not static and load their data dynamically, so forgive any bad phrasing of the concepts.

This is what I’ve tried so far:

import json
from urllib.request import urlopen

url = "https://fibalivestats.dcd.shared.geniussports.com/u/LEGBF/2213178/"

response = urlopen(url)
data = json.loads(response.read())

#json parsing and data gathering from data

which gives the error:

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
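(For context: that error means the response body does not start with a JSON value — here the page URL returns the HTML page itself, so parsing fails at the very first character. A minimal sketch reproducing it, with an illustrative HTML body standing in for the real response:)

```python
import json

# Illustrative stand-in: the page URL returns HTML, not JSON
body = "<!DOCTYPE html><html>...</html>"
try:
    data = json.loads(body)
except json.JSONDecodeError as exc:
    error_message = str(exc)
    print(error_message)  # Expecting value: line 1 column 1 (char 0)
```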

I then tried adding the ‘data.json’ at the end of the url:

import json
from urllib.request import urlopen

url = "https://fibalivestats.dcd.shared.geniussports.com/u/LEGBF/2213178/data.json"

response = urlopen(url)
data = json.loads(response.read())

#json parsing and data gathering from data

which produces:

urllib.error.HTTPError: HTTP Error 403: Forbidden

From what I understand, in the first case the request doesn’t return JSON, while in the second case it is not allowed to open the file.
I noticed that if I haven’t manually opened the page in Chrome, the https://…/data.json URL returns error 403, yet it loads data.json correctly after I reload the page with Ctrl+R on the Network tab.
What I understand is that I need to perform some other action beyond requests.get() (or the equivalent from urllib) in order to pull down the JSON file.
Could someone point me in the right direction?

Asked By: Michele Scattola


Answers:

Using the correct URL in your Python script loads the JSON correctly. The confusing part is that you get a 403 code rather than a 404.

The 403 code is due to the permissions on the S3 bucket, as described in this blog post and in more detail in the AWS docs:

If you don’t have the s3:ListBucket permission, Amazon S3 will return an HTTP status code 403 (“access denied”) error.

If you look at the headers for the failed request, it reports that it is served by S3.

If you look at the Chrome developer tools when loading the HTML page, the URL for the data is actually:
https://fibalivestats.dcd.shared.geniussports.com/data/2213178/data.json
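A sketch of fetching that URL directly (the match id 2213178 comes from the question; the browser-like User-Agent header is an assumption, added because some CDN configurations reject urllib’s default one):

```python
import json
from urllib.request import Request, urlopen

def data_url(match_id):
    # The JSON lives under /data/<match_id>/data.json, not /u/<league>/<match_id>/
    return f"https://fibalivestats.dcd.shared.geniussports.com/data/{match_id}/data.json"

# Assumed browser-like User-Agent header, in case the default urllib one is rejected
req = Request(data_url(2213178), headers={"User-Agent": "Mozilla/5.0"})
# data = json.loads(urlopen(req).read())  # uncomment to fetch over the network
print(req.full_url)
```

From here, looping data_url over a list of match ids automates gathering from multiple pages of the same kind.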

Answered By: Pete Kirkham

You can use Selenium. For example, I scraped the names of the players; you can extend the code to get whatever you want.

from selenium import webdriver
from selenium.webdriver.common.by import By

# Path to the local chromedriver binary
driver = webdriver.Chrome(r'C:\Users\Krieg\Downloads\chromedriver_win32\chromedriver.exe')
driver.get("https://fibalivestats.dcd.shared.geniussports.com/u/LEGBF/2213178/")

# Player names in the rendered box score
players = driver.find_elements(By.CSS_SELECTOR, 'td.player-name.team-0-summary-leaders')
for player in players:
    print(player.text)
Answered By: Elkhan