Get JSON data from multiple API pages into one main JSON output

Question:

I’m trying to get the JSON data from every page of an API and combine it into one big JSON output.

(Docs for the API I’m using: https://docs.scoresaber.com/#/Leaderboards/get_api_leaderboards)
When making the following API call:
https://scoresaber.com/api/leaderboards?qualified=true&withMetadata=true
I get a metadata object, which has total and itemsPerPage.
Example:

"metadata": {
    "total": 193,
    "page": 1,
    "itemsPerPage": 14
  }

So 193/14 (about 13.8, rounded up) means I get 14 pages.

This means I can iterate through all pages by making a request for each page with this API call: https://scoresaber.com/api/leaderboards?qualified=true&page=2
until I get to &page=14
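
The page count is a ceiling division. A minimal check, using only the total and itemsPerPage values from the metadata sample above (the variable names are just for illustration):

```python
import math

total = 193          # "total" from the metadata sample
items_per_page = 14  # "itemsPerPage" from the metadata sample

# Round up so the final, partially filled page is counted too.
pages = math.ceil(total / items_per_page)

# Integer-only equivalent that avoids floating-point division.
pages_int = -(-total // items_per_page)

print(pages, pages_int)
```

Both expressions give the same result here; the integer form only matters for totals large enough that float division could lose precision.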

Each page returns this JSON (trimmed example):

{
  "leaderboards": [
    {
      "id": 466447,
      "songHash": "E527C82AF2DEC46A23F12D742035D76CCA875904",
      "songName": "Parasite",
      "songSubName": "(feat. Hatsune Miku)",
      "songAuthorName": "DECO*27",
      "levelAuthorName": "Alice",
      "difficulty": {
        "leaderboardId": 466447,
        "difficulty": 1,
        "gameMode": "SoloStandard",
        "difficultyRaw": "_Easy_SoloStandard"
      },
      "maxScore": 0,
      "createdDate": "2022-06-01T17:16:52.000Z",
      "rankedDate": null,
      "qualifiedDate": "2022-06-14T05:53:21.000Z",
      "lovedDate": null,
      "ranked": false,
      "qualified": true,
      "loved": false,
      "maxPP": -1,
      "stars": 0,
      "plays": 70,
      "dailyPlays": 0,
      "positiveModifiers": false,
      "playerScore": null,
      "coverImage": "https://cdn.scoresaber.com/covers/E527C82AF2DEC46A23F12D742035D76CCA875904.png",
      "difficulties": null
    },
  ],
  "metadata": {
    "total": 193,
    "page": 2,
    "itemsPerPage": 14
  }
}

So what I want is to loop through all the pages and collect every item in leaderboards into one JSON document.

This is what I’ve tried:

import requests
import math
import json

response = requests.get("https://scoresaber.com/api/leaderboards?qualified=true&withMetadata=true")

api = json.loads(response.text)
pages = math.ceil(api['metadata']['total'] / api['metadata']['itemsPerPage'])
api = {}
for page in range(1, pages+1):
    api.update(json.loads(requests.get(f"https://scoresaber.com/api/leaderboards?qualified=true&page={page}").text))
api = json.dumps(api, indent=4)

But that seems to only keep the last page and overwrite the dictionary each time (I’m also not sure if I need to declare api as a dict).

So I’m just not sure what is going wrong: whether I’m declaring things wrongly, requesting the API wrongly, or putting things into the dict wrongly.
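
The overwrite can be reproduced without touching the network. The sketch below uses two made-up pages (the ids and the trimmed metadata are hypothetical, not real API output) that have the same top-level keys as the real responses:

```python
# Two hypothetical pages with the same top-level keys as the API response.
page1 = {"leaderboards": [{"id": 1}, {"id": 2}], "metadata": {"page": 1}}
page2 = {"leaderboards": [{"id": 3}], "metadata": {"page": 2}}

# dict.update() replaces the value of every key that already exists,
# so the "leaderboards" list from page 1 is discarded.
merged = {}
for page in (page1, page2):
    merged.update(page)
print(merged["leaderboards"])

# Extending a list keeps the items from every page.
all_items = []
for page in (page1, page2):
    all_items.extend(page["leaderboards"])
print(all_items)
```

The first print shows only the item from the second page, while the second print shows all three items.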

Asked By: miitchel


Answers:

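If I understand you correctly, you want to collect all the data into one big list. dict.update() replaces the value of any key that already exists, and every page has the same two top-level keys (leaderboards and metadata), so each page overwrote the previous one. Extending a list with each page's leaderboards keeps every item: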

import json
import math
import requests

url1 = (
    "https://scoresaber.com/api/leaderboards?qualified=true&withMetadata=true"
)
url2 = "https://scoresaber.com/api/leaderboards?qualified=true&page={}"

# The first request is only used to read the metadata and work out the page count.
api = requests.get(url1).json()
pages = math.ceil(api["metadata"]["total"] / api["metadata"]["itemsPerPage"])

# Collect the items of every page in one flat list.
all_data = []
for page in range(1, pages + 1):
    data = requests.get(url2.format(page)).json()
    all_data.extend(data["leaderboards"])

print(json.dumps(all_data, indent=4))

This will print all 193 items from all pages:

[
    {
        "id": 484864,
        "songHash": "80559A7A4AC0F62F27DAF1C59DF67F305250ADFF",
        "songName": "Phony",
        "songSubName": "feat. KAFU (Hoshimachi Suisei Cover)",
        "songAuthorName": "Tsumiki",
        "levelAuthorName": "Joshabi & Shad",


...
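
A slightly more defensive variant is sketched below, in case it is useful. It is an illustration, not part of the original answer: the helper names (fetch_page_http, collect_leaderboards, fake_fetch) are hypothetical, and it assumes the API accepts qualified, withMetadata and page together in one request. The merging logic takes the fetch function as an argument, so it can be tried offline with fake pages:

```python
import math


def fetch_page_http(page):
    """Fetch one page as a dict (hypothetical helper; needs network access)."""
    import requests  # imported lazily so the merge logic runs without it

    response = requests.get(
        "https://scoresaber.com/api/leaderboards",
        # Assumption: these three query parameters can be combined.
        params={"qualified": "true", "withMetadata": "true", "page": page},
        timeout=10,
    )
    response.raise_for_status()  # raise on HTTP error status codes
    return response.json()


def collect_leaderboards(fetch_page):
    """Merge the "leaderboards" lists of every page into one list."""
    first = fetch_page(1)
    meta = first["metadata"]
    pages = math.ceil(meta["total"] / meta["itemsPerPage"])

    # Reuse page 1 instead of requesting it a second time.
    all_items = list(first["leaderboards"])
    for page in range(2, pages + 1):
        all_items.extend(fetch_page(page)["leaderboards"])
    return all_items


def fake_fetch(page):
    """Offline stand-in for the API: 5 made-up items, 2 per page."""
    ids = list(range(1, 6))
    chunk = ids[(page - 1) * 2 : page * 2]
    return {
        "leaderboards": [{"id": i} for i in chunk],
        "metadata": {"total": 5, "page": page, "itemsPerPage": 2},
    }


merged = collect_leaderboards(fake_fetch)
print([item["id"] for item in merged])
```

Against the real API the call would be collect_leaderboards(fetch_page_http). Reusing the first response saves one request compared with fetching page 1 twice.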
Answered By: Andrej Kesely