How to convert a boto3 DynamoDB item to a regular dictionary in Python?

Question:

In Python, when an item is retrieved from DynamoDB using boto3, a structure like the following is obtained.

{
  "ACTIVE": {
    "BOOL": true
  },
  "CRC": {
    "N": "-1600155180"
  },
  "ID": {
    "S": "bewfv43843b"
  },
  "params": {
    "M": {
      "customer": {
        "S": "TEST"
      },
      "index": {
        "N": "1"
      }
    }
  },
  "THIS_STATUS": {
    "N": "10"
  },
  "TYPE": {
    "N": "22"
  }
}

Also when inserting or scanning, dictionaries have to be converted in this fashion. I haven’t been able to find a wrapper that takes care of such conversion. Since apparently boto3 does not support this, are there better alternatives than implementing code for it?

Asked By: manelmc

||

Answers:

To understand how to solve this, it's important to recognize that boto3 has two basic modes of operation: one that uses the low-level Client API, and one that uses higher-level abstractions like Table. The data structure shown in the question is an example of what is consumed and produced by the low-level API, which is also used by the AWS CLI and the DynamoDB web service.

To answer your question: if you can work exclusively with the high-level abstractions like Table when using boto3, things will be quite a bit easier for you, as the comments suggest. You can then sidestep the whole problem, because Python types are marshaled to and from the low-level data format for you.

However, there are some times when it’s not possible to use those high-level constructs exclusively. I specifically ran into this problem when dealing with DynamoDB streams attached to Lambdas. The inputs to the lambda are always in the low-level format, and that format is harder to work with IMO.
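To illustrate where the low-level format shows up in that situation, here is a minimal sketch that pulls the item images out of a stream event. The function name extract_new_images and the sample event are made up for illustration; the sketch assumes the stream is configured to include new images, and that records without one (such as removals) can simply be skipped.

```python
def extract_new_images(event):
    """Collect the low-level NewImage of every stream record that has one."""
    images = []
    for record in event.get("Records", []):
        # Each record carries its item data under the "dynamodb" key,
        # still in the typed {"S": ...} / {"N": ...} format.
        image = record.get("dynamodb", {}).get("NewImage")
        if image is not None:
            images.append(image)
    return images


# Hypothetical, heavily trimmed event with one insert and one removal
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"ID": {"S": "bewfv43843b"}},
                "NewImage": {"ID": {"S": "bewfv43843b"}, "TYPE": {"N": "22"}},
            },
        },
        {
            "eventName": "REMOVE",
            "dynamodb": {"Keys": {"ID": {"S": "old"}}},
        },
    ]
}

images = extract_new_images(sample_event)
```

Each element of images is still in the low-level format, so it needs the same conversion as the item in the question.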

After some digging I found that boto3 itself has some nifty features tucked away for doing conversions. These features are used implicitly in all of the internal conversions mentioned previously. To use them directly, import the TypeDeserializer/TypeSerializer classes and combine them with dict comprehensions like so:

import boto3

low_level_data = {
  "ACTIVE": {
    "BOOL": True
  },
  "CRC": {
    "N": "-1600155180"
  },
  "ID": {
    "S": "bewfv43843b"
  },
  "params": {
    "M": {
      "customer": {
        "S": "TEST"
      },
      "index": {
        "N": "1"
      }
    }
  },
  "THIS_STATUS": {
    "N": "10"
  },
  "TYPE": {
    "N": "22"
  }
}

# Import the converters directly; this avoids creating a resource (which may
# need a configured region) just to make boto3.dynamodb.types loadable
from boto3.dynamodb.types import TypeDeserializer, TypeSerializer

# To go from low-level format to python (numbers come back as decimal.Decimal)
deserializer = TypeDeserializer()
python_data = {k: deserializer.deserialize(v) for k, v in low_level_data.items()}

# To go from python to low-level format
serializer = TypeSerializer()
low_level_copy = {k: serializer.serialize(v) for k, v in python_data.items()}

assert low_level_data == low_level_copy
Answered By: killthrush

There is a Python package called "dynamodb-json" that can help you achieve this. The dynamodb-json util works the same as the json module's loads and dumps functions. I prefer using it because it takes care of converting Decimal objects automatically.

You can find examples and installation instructions at https://pypi.org/project/dynamodb-json/
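If you would rather stay with boto3's own TypeDeserializer, which returns numbers as decimal.Decimal, a small helper can do a similar cleanup afterwards. The following is only a minimal sketch and is not part of dynamodb-json or boto3; the name replace_decimals is made up for illustration, and it assumes that whole numbers should become int and everything else float.

```python
import json
from decimal import Decimal


def replace_decimals(value):
    """Recursively turn Decimal values into int or float."""
    if isinstance(value, Decimal):
        # Whole numbers become int, anything with a fractional part becomes float
        return int(value) if value == value.to_integral_value() else float(value)
    if isinstance(value, dict):
        return {k: replace_decimals(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [replace_decimals(v) for v in value]
    if isinstance(value, set):
        return {replace_decimals(v) for v in value}
    return value


# Hypothetical data shaped like the output of a deserialized item
deserialized = {
    "ACTIVE": True,
    "CRC": Decimal("-1600155180"),
    "params": {"customer": "TEST", "index": Decimal("1"), "ratio": Decimal("0.5")},
}

plain = replace_decimals(deserialized)
as_json = json.dumps(plain)  # works now that no Decimal values remain
```

Converting to float can lose precision for values with many digits, so this is only appropriate when exact decimal arithmetic is not needed.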

Answered By: aamir23

You can use the TypeDeserializer class:

from boto3.dynamodb.types import TypeDeserializer
deserializer = TypeDeserializer()

document = { "ACTIVE": { "BOOL": True }, "CRC": { "N": "-1600155180" }, "ID": { "S": "bewfv43843b" }, "params": { "M": { "customer": { "S": "TEST" }, "index": { "N": "1" } } }, "THIS_STATUS": { "N": "10" }, "TYPE": { "N": "22" } }
deserialized_document = {k: deserializer.deserialize(v) for k, v in document.items()}
print(deserialized_document)
Answered By: Fellipe

I ended up writing a custom solution.

It doesn't cover all types, but it covers the ones I use. It is a good starting point for anyone who wants to develop it further:

from re import compile as re_compile
from typing import Any


class Serializer:
    re_number = re_compile(r"^-?\d+?\.?\d*$")

    def serialize(self, data: Any) -> dict:
        if isinstance(data, bool):  # booleans are a subtype of integers so place above int
            return {'BOOL': data}
        if isinstance(data, (int, float)):
            return {'N': str(data)}
        if isinstance(data, type(None)) or not data:  # place below int (0) and bool (False)
            # returns NULL for empty list, tuple, dict, set or string
            return {'NULL': True}
        if isinstance(data, (list, tuple)):
            return {'L': [self.serialize(v) for v in data]}
        if isinstance(data, set):
            if all(isinstance(v, str) for v in data):
                return {'SS': list(data)}  # the client API expects a list, not a set
            if all(self.re_number.match(str(v)) for v in data):
                return {'NS': [str(v) for v in data]}
        if isinstance(data, dict):
            return {'M': {k: self.serialize(v) for k, v in data.items()}}
        return {'S': str(data)}  # safety net to catch all others

    def deserialize(self, data: dict) -> dict:
        _out = {}
        if not data:
            return _out
        for k, v in data.items():
            if k in ('S', 'SS', 'BOOL'):
                return v
            if k == 'N':
                return float(v) if '.' in v else int(v)
            if k == 'NS':
                return [float(_v) if '.' in _v else int(_v) for _v in v]
            if k == 'M':
                return {_k: self.deserialize(_v) for _k, _v in v.items()}
            if k == 'L':
                return [self.deserialize(_v) for _v in v]
            if k == 'NULL':
                return None
            _out[k] = self.deserialize(v)
        return _out

Usage

serialized = Serializer().serialize(input_dict)
print(serialized)

deserialized = Serializer().deserialize(serialized)
print(deserialized)

DynamoDB (python)

import boto3

# table_name, id and data are placeholders supplied by the caller
dynamodb = boto3.client('dynamodb')

dynamodb.put_item(
    TableName=table_name,
    Item={
        'id': {'S': id},
        'data': Serializer().serialize(data)
    }
)

response = dynamodb.get_item(
    TableName=table_name,
    Key={
        'id': {'S': id}
    }
)
data = Serializer().deserialize(response['Item'])
Answered By: Christian