JSON file request from site returns error 403

Question:

I’m trying to collect some data from a game box score like this: https://fibalivestats.dcd.shared.geniussports.com/u/LEGBF/2213178/

The data is stored in a file (‘data.json’) which I managed to download from the Network tab in Chrome’s developer tools. I’ve been able to parse it and get the data I need.
Now I’m trying to pull the data directly from the URL (without downloading the file) to automate my data gathering from multiple pages of the same kind.
I’m no expert in requests to sites, especially ones that are not static and load their data dynamically, so forgive any bad phrasing of the concepts.

This is what I’ve tried so far:

import json
from urllib.request import urlopen

url = "https://fibalivestats.dcd.shared.geniussports.com/u/LEGBF/2213178/"

response = urlopen(url)
data = json.loads(response.read())

#json parsing and data gathering from data

which gives the error:

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
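(For context: that error means the response body does not start with a JSON value — here the page URL returns the HTML page itself, so parsing fails at the very first character. A minimal sketch reproducing it, with an illustrative HTML body standing in for the real response:)

```python
import json

# Illustrative stand-in: the page URL returns HTML, not JSON
body = "<!DOCTYPE html><html>...</html>"
try:
    data = json.loads(body)
except json.JSONDecodeError as exc:
    error_message = str(exc)
    print(error_message)  # Expecting value: line 1 column 1 (char 0)
```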

I then tried adding the ‘data.json’ at the end of the url:

import json
from urllib.request import urlopen

url = "https://fibalivestats.dcd.shared.geniussports.com/u/LEGBF/2213178/data.json"

response = urlopen(url)
data = json.loads(response.read())

#json parsing and data gathering from data

which produces:

urllib.error.HTTPError: HTTP Error 403: Forbidden

From what I understand, in the first case the request doesn’t return JSON, while in the second case it is not allowed to open the file.
I noticed that if I haven’t manually opened the page in Chrome, the https://…/data.json URL returns error 403, yet it loads data.json correctly after I reload the page with Ctrl+R on the Network tab.
What I understand is that I need to perform some other action beyond requests.get() (or the equivalent from urllib) in order to pull down the JSON file.
Could someone point me in the right direction?

Asked By: Michele Scattola


Answers:

Using the correct URL in your Python script loads the JSON correctly. The confusing part is that you get a 403 code rather than a 404.

The 403 code is due to the permissions on the S3 bucket, as described in this blog post and in more detail in the AWS docs:

If you don’t have the s3:ListBucket permission, Amazon S3 will return an HTTP status code 403 (“access denied”) error.

If you look at the headers for the failed request, it reports that it is served by S3.

If you look at the Chrome developer tools when loading the HTML page, the URL for the data is actually:
https://fibalivestats.dcd.shared.geniussports.com/data/2213178/data.json
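A sketch of fetching that URL directly (the match id 2213178 comes from the question; the browser-like User-Agent header is an assumption, added because some CDN configurations reject urllib’s default one):

```python
import json
from urllib.request import Request, urlopen

def data_url(match_id):
    # The JSON lives under /data/<match_id>/data.json, not /u/<league>/<match_id>/
    return f"https://fibalivestats.dcd.shared.geniussports.com/data/{match_id}/data.json"

# Assumed browser-like User-Agent header, in case the default urllib one is rejected
req = Request(data_url(2213178), headers={"User-Agent": "Mozilla/5.0"})
# data = json.loads(urlopen(req).read())  # uncomment to fetch over the network
print(req.full_url)
```

From here, looping data_url over a list of match ids automates gathering from multiple pages of the same kind.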

Answered By: Pete Kirkham

You can use Selenium. For example, I scraped the names of the players; you can extend the code to get whatever you want.

from selenium import webdriver
from selenium.webdriver.common.by import By

# Path to the local chromedriver binary
driver = webdriver.Chrome(r'C:\Users\Krieg\Downloads\chromedriver_win32\chromedriver.exe')
driver.get("https://fibalivestats.dcd.shared.geniussports.com/u/LEGBF/2213178/")

# Player names in the rendered box score
players = driver.find_elements(By.CSS_SELECTOR, 'td.player-name.team-0-summary-leaders')
for player in players:
    print(player.text)
Answered By: Elkhan