Python how to parse output of Amazon RDS Data Service

Question:

I’m struggling to parse output of ExecuteStatement action using Amazon RDS Data service https://docs.aws.amazon.com/rdsdataservice/latest/APIReference/API_ExecuteStatement.html

Documentation is very poor and I can’t find any meaningful examples.

I’m using Aurora MySql DB

Output is returned like this. This example is 2 rows of data returned by SELECT statement, I’ve replaced actual data with word “data”

 {'ResponseMetadata': {'RequestId': '955a6aee-5bad-4f87-a455-b83a10a8a31b', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '955a6aee-5bad-4f87-a455-b83a10a8a31b', 'content-type': 'application/json', 'content-length': '809', 'date': 'Tue, 02 Jun 2020 05:39:22 GMT'}, 'RetryAttempts': 0}, 'numberOfRecordsUpdated': 0, 'records': [[{'stringValue': 'data'}, {'longValue': data}, {'stringValue': 'data'}, {'stringValue': 'data'}, {'stringValue': ''}, {'stringValue': 'data'}, {'stringValue': 'data'}, {'stringValue': 'data'}, {'stringValue': 'data'}, {'stringValue': 'data'}, {'stringValue': 'data'}, {'stringValue': 'data'}, {'stringValue': 'data'}], [{'stringValue': 'data'}, {'longValue': data}, {'stringValue': 'data'}, {'stringValue': 'data'}, {'stringValue': ''}, {'stringValue': 'data'}, {'stringValue': 'data'}, {'stringValue': 'data'}, {'stringValue': 'data'}, {'stringValue': 'data'}, {'stringValue': 'data'}, {'stringValue': 'data'}, {'stringValue': 'data'}]]}

I need to extract actually only this data fields, so I can use them for INSERT statement in another table.

As per documentation original response is dictionary data type, so I get 3 keys here ‘ResponseMetadata’, ‘numberOfRecordsUpdated’ and ‘records’.
As per documentation ‘records’ is Type: Array of arrays of Field objects and here is where I stuck. How I can extract Field objects from array of arrays inside dictionary?

Asked By: Petr Kashlikov

||

Answers:

There are many ways, but the most basic nested for-loop in python could suffice:

data_values = []

for record in result['records']:
    print()
    row_data = []
    for data_dict in record:
        #print(data_dict)
        for data_type, data_value in data_dict.items():
            print(data_type, data_value)
            row_data.append(data_value)

    data_values.append(row_data)

print(data_values)  

The code can for a good basis for modifications to suit your needs. It produces the following:

stringValue data
longValue data
stringValue data
stringValue data
stringValue 
stringValue data
stringValue data
stringValue data
stringValue data
stringValue data
stringValue data
stringValue data
stringValue data

stringValue data
longValue data
stringValue data
stringValue data
stringValue 
stringValue data
stringValue data
stringValue data
stringValue data
stringValue data
stringValue data
stringValue data
stringValue data

And data_values:

[['data', 'data', 'data', 'data', '', 'data', 'data', 'data', 'data', 'data', 'data', 'data', 'data'], ['data', 'data', 'data', 'data', '', 'data', 'data', 'data', 'data', 'data', 'data', 'data', 'data']]
Answered By: Marcin

You are getting this

response = 
{'ResponseMetadata': {'HTTPHeaders': {'content-length': '809',
                                      'content-type': 'application/json',
                                      'date': 'Tue, 02 Jun 2020 05:39:22 GMT',
                                      'x-amzn-requestid': '955a6aee-5bad-4f87-a455-b83a10a8a31b'},
                      'HTTPStatusCode': 200,
                      'RequestId': '955a6aee-5bad-4f87-a455-b83a10a8a31b',
                      'RetryAttempts': 0},
 'numberOfRecordsUpdated': 0,
 'records': [[{'stringValue': 'data'},
              {'longValue': 'data'},
              {'stringValue': 'data'},
              {'stringValue': 'data'},
              {'stringValue': ''},
              {'stringValue': 'data'},
              {'stringValue': 'data'},
              {'stringValue': 'data'},
              {'stringValue': 'data'},
              {'stringValue': 'data'},
              {'stringValue': 'data'},
              {'stringValue': 'data'},
              {'stringValue': 'data'}],
             [{'stringValue': 'data'},
              {'longValue': 'data'},
              {'stringValue': 'data'},
              {'stringValue': 'data'},
              {'stringValue': ''},
              {'stringValue': 'data'},
              {'stringValue': 'data'},
              {'stringValue': 'data'},
              {'stringValue': 'data'},
              {'stringValue': 'data'},
              {'stringValue': 'data'},
              {'stringValue': 'data'},
              {'stringValue': 'data'}]]}

You can read yuor rows by iterating response['records']

data_fields = []
for row in response['records']:
    for field in row:
        if field['stringValue']:
           print(field)
           data_fields.append(field)
        elif field['longValue']:
           print(field)
           data_fields.append(field)

Consider to use dictionary cursor by directly accessing database by mysql connection configuration like so you will not see aws request metadata.

import mysql.connector

cnx = mysql.connector.connect(user='username', password='password',
                              host='aws rds mysql host',
                              database='database_name')
cnx.close()

You can see mysql connection details from AWS console.

https://dev.mysql.com/doc/connector-python/en/connector-python-example-connecting.html

Answered By: Alok

Alternatively if you want to handle your data in a ‘dict-like’ fashion, if you’re using pydantic model parsing for example, you can do something like this

NOTE: you need to add

includeResultMetadata=True

in the execute_statement call for columnMetadata to be present

columns_metadata = result.get('columnMetadata')

object_list = []

for record in result.get('records'):
    row_dict = {}
    for i in range(len(columns_metadata)):
        column = columns_metadata[i].get("name")
        data_dict = record[i]
        for data_type, data_value in data_dict.items():
            if data_type == "isNull" and data_value is True:
                row_dict[column] = None
            else:
                row_dict[column] = data_value

    object_list.append(row_dict)

This will give you a list of all records following this format:

object_list = [
    {
        "col1": "obj1val1",
        "col2": "obj1val2",
        "col3": "obj1val3",
    },
    {
        "col1": "obj2val1",
        "col2": "obj2val2",
        "col3": "obj2val3",
    }
]

NB: This could probably be optimized for performance (currently written for readability)

Answered By: Shraneid