Making API call with Requests in Python returns only one item instead of many

Question:

A problem with likely a very easy fix, yet I’m unfortunately new to this.
The problem is: My generated csv file includes data from only one URL, while I want them all.

I’ve made a list of many contract numbers and I’m trying to access them all and return their data into one csv file (a long list). The API’s URL consists of a baseURL and a contract number plus some parameters, so my URLs look like this (showing 2 of 150)
https://api.nfz.gov.pl/app-umw-api/agreements/edc47a7d-a3b8-d354-79d5-a0518f8ba6d4?format=json&api-version=1.2&limit=25&page={}
https://api.nfz.gov.pl/app-umw-api/agreements/9a6d9313-c9cc-c0db-9c86-b7b4be0e11c1?format=json&api-version=1.2&limit=25&page={}

The publisher has imposed a limit of 25 records per page, therefore I’ve got some pagination going on here.

It seems like the program is calling each URL in turn, given that it printed the number of pages from each call. But the csv only has 4 rows, instead of hundreds. I’m wondering where I’m going wrong. I tried removing the indent on the last 3 lines (no change) and some other trial and error.

Another small question: the 4 rows are actually 2 rows, each duplicated. I think my code duplicates the first page of results somewhere, but I can’t figure out where.

And another one – how can I make the first column of the csv file show the ‘contract’ (from my list ‘contracts’) that relates to the output? I need some way of identifying which rows in the csv came from which contract while the API keeps the info in a separate branch of the data ‘tree’ that I don’t really know how to return efficiently.

import requests
import pandas as pd
import math
from contracts_list1 import contracts

baseurl = 'https://api.nfz.gov.pl/app-umw-api/agreements/'
for contract in contracts:
    api_url = ''.join([baseurl, contract])

    def main_request(api_url):
        r = requests.get(api_url)
        return r.json()

    def get_pages(response):
        return math.ceil(response['meta']['count'] / 25)

    p_number = main_request(api_url)
    all_data = []
    for page in range(0, get_pages(p_number)+1): # <-- increase page numbers here
        data = requests.get(api_url.format(page)).json()

        for a in data["data"]["plans"]:
            all_data.append({**a["attributes"]})

    df = pd.DataFrame(all_data)

    df.to_csv('file1.csv', encoding='utf-8-sig', index=False)
    print(get_pages(p_number))
Asked By: Michael Wiz


Answers:

Your accumulator all_data is created inside the contracts loop, so it is reset to an empty list on every iteration, and file1.csv is rewritten on every iteration as well. That’s why you only see the results for the last contract instead of all of them.
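
To see the effect in isolation, here is a minimal sketch with made-up data and no network calls. The dictionary pages_by_contract and its values are purely illustrative stand-ins for what the API would return per contract.

```python
# Toy stand-in for the per-contract results (hypothetical data, no API calls).
pages_by_contract = {"A": [1, 2], "B": [3], "C": [4, 5]}

# Accumulator created inside the loop: it is reset on every iteration,
# so only the rows from the last contract survive.
for contract, rows in pages_by_contract.items():
    inside = []
    inside.extend(rows)

# Accumulator created once, before the loop: it keeps growing.
outside = []
for contract, rows in pages_by_contract.items():
    outside.extend(rows)

print(inside)   # [4, 5]
print(outside)  # [1, 2, 3, 4, 5]
```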

Try moving the accumulator all_data = [] above the outer for loop, and build the DataFrame and write the csv only once, after the loop has finished:

import requests
import pandas as pd
import math
from contracts_list1 import contracts

baseurl = 'https://api.nfz.gov.pl/app-umw-api/agreements/'

# helpers only need to be defined once, not on every iteration
def main_request(api_url):
    r = requests.get(api_url)
    return r.json()

def get_pages(response):
    return math.ceil(response['meta']['count'] / 25)

all_data = []  # created once, so rows from every contract accumulate
for contract in contracts:
    api_url = ''.join([baseurl, contract])

    p_number = main_request(api_url)
    print(get_pages(p_number))  # number of pages for this contract
    for page in range(0, get_pages(p_number)+1): # <-- increase page numbers here
        data = requests.get(api_url.format(page)).json()

        for a in data["data"]["plans"]:
            all_data.append({**a["attributes"]})

# build the DataFrame and write the csv once, after all contracts are done
df = pd.DataFrame(all_data)
df.to_csv('file1.csv', encoding='utf-8-sig', index=False)
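
On the two follow-up questions (the duplicated rows and the contract column), here is a rough sketch rather than a tested solution against the live API. It rests on two assumptions I can't verify from the post: that the API numbers pages from 1 (so that requesting page 0 as well would likely return the first page a second time), and that each entry in contracts looks like "<id>?format=json&...&page={}". The names collect_rows, fetch, BASE_URL and PAGE_SIZE are my own, not part of the API.

```python
import math

BASE_URL = "https://api.nfz.gov.pl/app-umw-api/agreements/"
PAGE_SIZE = 25  # matches limit=25 in the query string


def collect_rows(contracts, fetch):
    """Return one dict per plan, each tagged with the contract it came from.

    `fetch` is any callable that takes a URL and returns the decoded JSON.
    """
    rows = []
    for contract in contracts:
        # Assumption: each entry looks like "<id>?format=json&...&page={}".
        contract_id = contract.split("?")[0]
        url_template = BASE_URL + contract

        first = fetch(url_template.format(1))
        n_pages = math.ceil(first["meta"]["count"] / PAGE_SIZE)

        # Assumption: pages are numbered from 1, so start at 1 rather than 0.
        for page in range(1, n_pages + 1):
            # Reuse the first response instead of requesting page 1 twice.
            data = first if page == 1 else fetch(url_template.format(page))
            for plan in data["data"]["plans"]:
                # Putting "contract" first makes it the first csv column.
                rows.append({"contract": contract_id, **plan["attributes"]})
    return rows
```

With the real API, fetch could be something like lambda url: requests.get(url).json(), and the returned list can go straight into pd.DataFrame(...) followed by to_csv as before. Passing fetch in as a parameter also makes it easy to try the paging logic with fake responses first.
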
Answered By: Freda Xin