How do I type annotate JSON data in Python?

Question:

I am adding type annotations to a lot of code to make it clear to other devs what my functions and methods do.
How would I type annotate a function that takes JSON data in as an argument, and returns JSON data?

(very simplified version)

def func(json_data):
    return json_data

What I want to do, but with JSON instead of int:

def add_nums(a: int, b: int) -> int:
    return a+b
Asked By: Milind Sharma

Answers:

Strictly speaking, you cannot do that: there are no "JSON objects" in Python. JSON is a text format, so JSON data is represented as a string. The most literal answer here would be:

def func(json_data: str) -> str:
    return json_data

In my opinion (I think it is also best practice, but I am not sure about that), you should only convert your data to JSON when you really need it in that format. Until then, you should be working with dictionaries and lists.
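A minimal sketch of that advice, assuming the JSON arrives as text and its top-level value is an object. The `process` helper and the `processed` key are hypothetical, made up for illustration:

```python
import json
from typing import Any


def process(data: dict[str, Any]) -> dict[str, Any]:
    # Work on plain Python containers; no JSON text is involved here.
    # The "processed" key is a made-up example field.
    return {**data, "processed": True}


def func(json_text: str) -> str:
    # Decode at the boundary, operate on a dict, encode again on the way out.
    data = json.loads(json_text)
    if not isinstance(data, dict):
        raise TypeError("expected a JSON object at the top level")
    return json.dumps(process(data))
```

Only `func` touches JSON text; everything else is annotated with ordinary container types.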

Answered By: Kerrim

JSON objects are usually like a bag of items: they can hold numbers (integers and floats), strings, lists, and nested JSON objects. If you want to describe the actual structure of the JSON body, the first two options below can help; otherwise you can fall back on the third.

First Option:
Please check the Python 3 typing docs on type aliases.

If you can clearly define the shape of your JSON body, you can use type aliases, as in the following example.

from collections.abc import Sequence

ConnectionOptions = dict[str, str]
Address = tuple[str, int]
Server = tuple[Address, ConnectionOptions]

def broadcast_message(message: str, servers: Sequence[Server]) -> None:
    ...

# The static type checker will treat the previous type signature as
# being exactly equivalent to this one.
def broadcast_message(
        message: str,
        servers: Sequence[tuple[tuple[str, int], dict[str, str]]]) -> None:
    ...
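Applied to JSON, the same idea might look like the sketch below. It assumes a payload shaped like `[{"name": "...", "tags": ["..."]}]`; the alias and function names are hypothetical:

```python
import json
from typing import Union

# Hypothetical aliases describing the assumed payload shape.
Tags = list[str]
User = dict[str, Union[str, Tags]]
Users = list[User]


def load_users(json_text: str) -> Users:
    # json.loads returns Any, so the annotation only documents the expected
    # shape; nothing is validated at runtime.
    return json.loads(json_text)


def names(users: Users) -> list[str]:
    # str() narrows the Union[str, Tags] value for the type checker.
    return [str(user["name"]) for user in users]
```

The aliases make signatures readable, but they are only as accurate as your knowledge of the data.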

Second Option: You can also define your own custom types with NewType, instead of creating lots of module-level aliases as above.

https://docs.python.org/3/library/typing.html#newtype


from typing import NewType

UserId = NewType('UserId', int)
some_id = UserId(524313)

def get_user_name(user_id: UserId) -> str:
    ...

Third Option: As the answer above suggests, using str is the simple approach. Treat your JSON body as a string and use the json module to convert it to and from Python objects.
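One way to combine this with NewType is to give JSON text its own type, so a checker can tell it apart from ordinary strings. A minimal sketch; `JSONStr`, `encode`, and `decode` are hypothetical names:

```python
import json
from typing import Any, NewType

# Hypothetical name: a distinct type for strings known to hold JSON text.
JSONStr = NewType("JSONStr", str)


def encode(value: Any) -> JSONStr:
    # json.dumps raises TypeError for values it cannot serialize.
    return JSONStr(json.dumps(value))


def decode(text: JSONStr) -> Any:
    return json.loads(text)


def func(json_data: JSONStr) -> JSONStr:
    # Round-trip through Python objects; a real function would transform them.
    return encode(decode(json_data))
```

At runtime a `JSONStr` is still a plain `str`; the distinction exists only for the type checker.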

Fourth Option: Using a Library to define your classes – https://marshmallow.readthedocs.io/en/stable/

If you are working with Apache Spark, then https://github.com/ketgo/marshmallow-pyspark is worth knowing about.

Answered By: sam

Here’s a kind of brute-force solution: manually define some JSON types that cover generic JSON objects (as they would be represented in Python). Type checkers did not support recursive type aliases when this was written (newer versions of mypy and pyright do), but a few levels of nesting are usually enough for most use cases. Beyond that, we call it best effort and allow Any.

You can create a module and define these types:

from collections.abc import (
    Mapping,
    Sequence,
)
from typing import (
    Any,
    Union,
)

PrimitiveJSON = Union[str, int, float, bool, None]

# Not every instance of Mapping or Sequence can be fed to json.dump() but those
# two generic types are the most specific *immutable* super-types of `list`,
# `tuple` and `dict`:

AnyJSON4 = Union[Mapping[str, Any], Sequence[Any], PrimitiveJSON]
AnyJSON3 = Union[Mapping[str, AnyJSON4], Sequence[AnyJSON4], PrimitiveJSON]
AnyJSON2 = Union[Mapping[str, AnyJSON3], Sequence[AnyJSON3], PrimitiveJSON]
AnyJSON1 = Union[Mapping[str, AnyJSON2], Sequence[AnyJSON2], PrimitiveJSON]
AnyJSON = Union[Mapping[str, AnyJSON1], Sequence[AnyJSON1], PrimitiveJSON]
JSON = Mapping[str, AnyJSON]
JSONs = Sequence[JSON]
CompositeJSON = Union[JSON, Sequence[AnyJSON]]

# For mutable JSON we can be more specific and use dict and list:

AnyMutableJSON4 = Union[dict[str, Any], list[Any], PrimitiveJSON]
AnyMutableJSON3 = Union[dict[str, AnyMutableJSON4], list[AnyMutableJSON4], PrimitiveJSON]
AnyMutableJSON2 = Union[dict[str, AnyMutableJSON3], list[AnyMutableJSON3], PrimitiveJSON]
AnyMutableJSON1 = Union[dict[str, AnyMutableJSON2], list[AnyMutableJSON2], PrimitiveJSON]
AnyMutableJSON = Union[dict[str, AnyMutableJSON1], list[AnyMutableJSON1], PrimitiveJSON]
MutableJSON = dict[str, AnyMutableJSON]
MutableJSONs = list[MutableJSON]
MutableCompositeJSON = Union[MutableJSON, list[AnyMutableJSON]]

Then you can just import the types you need from your module. Mostly you’ll just use JSON, JSONs, MutableJSON, and MutableJSONs.
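If your type checker supports recursive type aliases (recent versions of mypy and pyright do, as far as I know), the nesting levels can be collapsed into a single definition. A minimal sketch, assuming Python 3.9+; `RecursiveJSON` and `depth` are hypothetical names and are not part of the module above:

```python
from typing import Union

# The string forward references let the alias refer to itself.
RecursiveJSON = Union[
    dict[str, "RecursiveJSON"],
    list["RecursiveJSON"],
    str, int, float, bool, None,
]


def depth(value: RecursiveJSON) -> int:
    # Nesting depth: 0 for primitives, 1 + deepest child for containers.
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    return 0
```

On checkers without recursive alias support, the fixed-depth definitions above remain the safer choice.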

Note: this was taken from https://github.com/DataBiosphere/azul/blob/9fa0f78800dbbc7bf4822063ff31811b3bb3f55b/src/azul/types.py which uses the Apache 2.0 license.

It was likely inspired by this thread https://github.com/python/typing/issues/182.

This Stack Overflow answer contains some other useful suggestions depending on what you know about your data.

Answered By: leafmeal