Handle pagination in Python when interacting with Azure Graph API

Question:

I am getting the tags of all the resource groups in my tenant using an Azure Resource Graph query, which works perfectly in the Azure Resource Graph Explorer in the portal.

Here is the query:

resourcecontainers
| where type == 'microsoft.resources/subscriptions/resourcegroups'
| extend dates=format_datetime(now(), "yyyy-MM-dd")
| join kind=leftouter (
    resourcecontainers
    | where type == 'microsoft.resources/subscriptions'
    | project SubscriptionName=name, subscriptionId)
    on subscriptionId
| project SubscriptionName, subscriptionId, resourceGroup, client_entity_name=tags.client_entity_name,
    owner_contact=tags.owner_contact, owner_group=tags.owner_group, financial_contact=tags.financial_contact, billing_code=tags.billing_code, security_contact=tags.security_contact,
    operational_contact=tags.operational_contact, profil=tags.profil, guid=tags.guid

I am getting all the results in the portal (more than 2000 resource groups).

When I tried to do the same using my Python script, I got a page limit of 530 resources.
Here is my script:

from azure.identity import DefaultAzureCredential
from azure.mgmt.resourcegraph import ResourceGraphClient
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.resourcegraph.models import *
import json

# Initialize Azure credentials
credentials = DefaultAzureCredential()

# Initialize Resource Graph client
resource_graph_client = ResourceGraphClient(credentials)
skip = 0
result = []


query_code = f"""
resourcecontainers
| where type == 'microsoft.resources/subscriptions/resourcegroups'
| extend dates=format_datetime(now(), "yyyy-MM-dd")
| join kind=leftouter (
    resourcecontainers
    | where type == 'microsoft.resources/subscriptions'
    | project SubscriptionName=name, subscriptionId)
    on subscriptionId
| project SubscriptionName, subscriptionId, resourceGroup, client_entity_name=tags.client_entity_name,
    owner_contact=tags.owner_contact, owner_group=tags.owner_group, financial_contact=tags.financial_contact, billing_code=tags.billing_code, security_contact=tags.security_contact,
    operational_contact=tags.operational_contact, profil=tags.profil, guid=tags.guid, cloudbundle_type=tags.cloudbundle_type, environment=tags.environment,
    classification=tags.classification, app_name=tags.app_name, app_family=tags.app_family, application_id=tags.application_id,
    managed_by=tags.managed_by, managed_by_capmsp=tags.managed_by_capmsp, capmsp_service_level=tags.capmsp_service_level, sla_class=tags.sla_class,
    version=tags.version, dates, type, location, id_prefix=id
"""


query = QueryRequest(
            query= query_code 
)
query_response = resource_graph_client.resources(query)
query_response_str = str(query_response)
json_data = json.dumps(query_response_str)

json_data = json.loads(json_data)



output_file = "resource_groups_tags.txt"
with open(output_file, "w") as f:
    json.dump(json_data, f, indent=4)

Here is the first part of the response:

{'additional_properties': {}, 'total_records': 530, 'count': 530, 'result_truncated': 'false', 'skip_token': None, 'data': [{'SubscriptionName': '

I can’t figure out how to handle pagination to get all the results, as there is no skip/offset in the query. The Microsoft documentation mentions the ‘skip_token’, but I did not find it very clear, and in the response it is set to None.

Can someone help with this?

I tried skip and limit, but skip did not work together with limit, so I don’t see how to handle it.

Asked By: Louey

||

Answers:

tl;dr: When a SkipToken arrives in your query results,
tack it on to your subsequent query to obtain the next page of results.

The vendor documentation
explains:

Because Azure Resource Graph returns a maximum of 1,000 entries in a single query response, you might need to paginate your queries ….

handle pagination by passing the skip token being returned from the previous query response to the next paginated query.
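
The token-passing loop described in that quote can be sketched without any Azure dependency. This is only an illustration under stated assumptions: `iter_rows`, `make_fake_fetcher` and the `(rows, skip_token)` page shape are hypothetical names invented here, and the fake fetcher stands in for the real service call. With the SDK, the two values would come from `response.data` and `response.skip_token`.

```python
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical page shape: (rows, skip_token).
Page = Tuple[list, Optional[str]]


def iter_rows(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield rows from every page, following skip tokens until none is returned."""
    token = None
    while True:
        rows, token = fetch_page(token)
        yield from rows
        if not token:
            break


def make_fake_fetcher(all_rows: list, page_size: int) -> Callable[[Optional[str]], Page]:
    """Stand-in for the service: the token is simply the next offset as a string."""
    def fetch_page(token: Optional[str]) -> Page:
        start = int(token) if token else 0
        end = start + page_size
        # A token is handed back only while rows remain beyond this page.
        next_token = str(end) if end < len(all_rows) else None
        return all_rows[start:end], next_token
    return fetch_page


# 2500 made-up rows served in pages of 1000, mirroring the 1,000-entry limit.
fake_rows = [{"resourceGroup": f"rg-{i}"} for i in range(2500)]
collected = list(iter_rows(make_fake_fetcher(fake_rows, page_size=1000)))
print(len(collected))
```

The key point is that the first request carries no token, and every later request reuses the token from the response just before it.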

You may also be interested in the --skip option
offered by the az CLI tool.

This portion of your example result
shows your two queries are not identical
and do not produce identical 2000-row result sets:
... , 'result_truncated': 'false', 'skip_token': None, ...
If they were identical, you would have seen 'true'
plus a valid token.
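
To check a response for these signs in code, the fields can simply be read back. The sketch below assumes the response is available as a plain dictionary with the same keys as the printed output in the question; `page_status` is a hypothetical helper, and the second sample response uses made-up values purely for illustration.

```python
def page_status(response: dict) -> dict:
    """Summarize whether a response indicates that more rows are available."""
    count = response.get("count", 0)
    total = response.get("total_records", 0)
    return {
        # A non-empty skip token means another page can be requested.
        "has_next_page": bool(response.get("skip_token")),
        # Rows matching the query that were not part of this response.
        "missing_records": max(total - count, 0),
    }


# Values taken from the response printed in the question.
first_response = {
    "total_records": 530,
    "count": 530,
    "result_truncated": "false",
    "skip_token": None,
}
# Made-up values showing what a paginated response could look like.
paged_response = {
    "total_records": 2100,
    "count": 1000,
    "skip_token": "example-token",
}

print(page_status(first_response))
print(page_status(paged_response))
```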

Answered By: J_H

I found the solution. I don’t know why the result limit was 530 at first; it has since changed to 1000, and I now get a skip_token value in the response.

Here is the code I use:

from azure.identity import DefaultAzureCredential
from azure.mgmt.resourcegraph import ResourceGraphClient
from azure.mgmt.resourcegraph.models import QueryRequest, QueryRequestOptions

def get_tags(tenant: str):
    # Initialize Azure credentials
    credentials = DefaultAzureCredential()

    # Initialize Resource Graph client
    resource_graph_client = ResourceGraphClient(credentials)
    results = []


    query_code = """
    resourcecontainers
    | where type == 'microsoft.resources/subscriptions/resourcegroups'
    | extend dates=format_datetime(now(), "yyyy-MM-dd")
    | join kind=leftouter (
        resourcecontainers
        | where type == 'microsoft.resources/subscriptions'
        | project SubscriptionName=name, subscriptionId)
        on subscriptionId
    | project SubscriptionName, subscriptionId, resourceGroup, client_entity_name=tags.client_entity_name,
        owner_contact=tags.owner_contact, owner_group=tags.owner_group, financial_contact=tags.financial_contact, billing_code=tags.billing_code, security_contact=tags.security_contact,
        operational_contact=tags.operational_contact, profil=tags.profil, guid=tags.guid, cloudbundle_type=tags.cloudbundle_type, environment=tags.environment,
        classification=tags.classification, app_name=tags.app_name, app_family=tags.app_family, application_id=tags.application_id,
        managed_by=tags.managed_by, managed_by_capmsp=tags.managed_by_capmsp, capmsp_service_level=tags.capmsp_service_level, sla_class=tags.sla_class,
        version=tags.version, dates, type, location, id_prefix=id
    """
    

    skip_token = None
    page_count = 0

    # Request pages until the service stops returning a skip token
    while True:
        query = QueryRequest(
            query=query_code,
            options=QueryRequestOptions(skip_token=skip_token),
        )
        query_response = resource_graph_client.resources(query)

        # Keep only the columns of interest from each returned row
        for row in query_response.data:
            results.append({
                'environment': row.get('environment'),
                'security_contact': row.get('security_contact'),
                'owner_contact': row.get('owner_contact'),
                'subscription': row.get('SubscriptionName'),
                'subscription_id': row.get('subscriptionId'),
                'resource_group': row.get('resourceGroup'),
                'tenant': tenant,
            })

        page_count += 1
        # Token for the next page; None once the last page has been read
        skip_token = query_response.skip_token

        if not skip_token:
            break
    print(page_count)

    return results
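
Since the function returns a list of plain dictionaries, the result can be written to a file with `json.dump` directly, without the `str()` round-trip used in the question. A minimal sketch, assuming the tag values are plain strings or None; `sample_results` is a hand-made stand-in for what `get_tags` would return, and the file is written to the temporary directory only to keep the example self-contained.

```python
import json
import os
import tempfile

# Hand-made stand-in for the list returned by get_tags().
sample_results = [
    {"resource_group": "rg-example", "environment": "dev", "tenant": "example-tenant"},
    {"resource_group": "rg-other", "environment": None, "tenant": "example-tenant"},
]

output_file = os.path.join(tempfile.gettempdir(), "resource_groups_tags.json")
with open(output_file, "w") as f:
    json.dump(sample_results, f, indent=4)

# Read the file back to confirm it contains valid JSON with the same rows.
with open(output_file) as f:
    reloaded = json.load(f)
print(reloaded == sample_results)
```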
Answered By: Louey