Make the Python json encoder support Python's new dataclasses
Question:
Starting with Python 3.7, there is something called a dataclass:
from dataclasses import dataclass
@dataclass
class Foo:
    x: str
However, the following fails:
>>> import json
>>> foo = Foo(x="bar")
>>> json.dumps(foo)
TypeError: Object of type Foo is not JSON serializable
How can I make json.dumps() encode instances of Foo into JSON objects?
Answers:
Much like you can add support to the JSON encoder for datetime objects or Decimals, you can also provide a custom encoder subclass to serialize dataclasses:
import dataclasses, json
class EnhancedJSONEncoder(json.JSONEncoder):
    def default(self, o):
        if dataclasses.is_dataclass(o):
            return dataclasses.asdict(o)
        return super().default(o)
json.dumps(foo, cls=EnhancedJSONEncoder)
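As a rough sketch of how the same encoder idea extends to nested dataclasses and datetime fields: the Address and Person classes below are hypothetical, and this variant returns a shallow field dict instead of calling asdict(), relying on the encoder to call default() again for any nested value it cannot serialize.

```python
import dataclasses
import json
from datetime import datetime


@dataclasses.dataclass
class Address:  # hypothetical example class
    city: str


@dataclasses.dataclass
class Person:  # hypothetical example class
    name: str
    address: Address
    joined: datetime


class EnhancedJSONEncoder(json.JSONEncoder):
    def default(self, o):
        # is_dataclass() is also true for dataclass *classes*, so exclude types.
        if dataclasses.is_dataclass(o) and not isinstance(o, type):
            # Shallow dict of fields; nested values come back through default().
            return {f.name: getattr(o, f.name) for f in dataclasses.fields(o)}
        if isinstance(o, datetime):
            return o.isoformat()
        return super().default(o)


person = Person("Ann", Address("Oslo"), datetime(2020, 1, 2, 3, 4, 5))
encoded = json.dumps(person, cls=EnhancedJSONEncoder)
print(encoded)
# {"name": "Ann", "address": {"city": "Oslo"}, "joined": "2020-01-02T03:04:05"}
```

Anything the encoder still does not recognize falls through to super().default(o), which raises the usual TypeError.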
Can't you just use the dataclasses.asdict() function to convert the dataclass to a dict? Something like:
>>> import dataclasses, json
>>> from dataclasses import dataclass
>>> @dataclass
... class Foo:
...     a: int
...     b: int
...
>>> x = Foo(1, 2)
>>> json.dumps(dataclasses.asdict(x))
'{"a": 1, "b": 2}'
If you are ok with using a library for that, you can use dataclasses-json. Here is an example:
from dataclasses import dataclass
from dataclasses_json import dataclass_json
@dataclass_json
@dataclass
class Foo:
    x: str
foo = Foo(x="some-string")
foo_json = foo.to_json()
It also supports nested dataclasses (a field typed as another dataclass), provided that all the dataclasses involved have the @dataclass_json decorator.
Ways of getting JSONified dataclass instance
There are a couple of options to accomplish that goal; which one to pick depends on which approach best suits your needs:
Standard library: dataclasses.asdict
import dataclasses
import json
@dataclasses.dataclass
class Foo:
    x: str
foo = Foo(x='1')
json_foo = json.dumps(dataclasses.asdict(foo)) # '{"x": "1"}'
Converting it back to a dataclass instance isn't trivial, so you may want to see this answer: https://stackoverflow.com/a/53498623/2067976
Marshmallow Dataclass
from dataclasses import field
from marshmallow_dataclass import dataclass
@dataclass
class Foo:
    x: int = field(metadata={"required": True})
foo = Foo(x='1') # Foo(x='1')
json_foo = foo.Schema().dumps(foo) # '{"x": "1"}'
# Back to class instance.
Foo.Schema().loads(json_foo) # Foo(x=1)
As a bonus, marshmallow_dataclass lets you declare validation on the field itself; that validation is applied when someone deserializes the object from JSON using that schema.
Dataclasses Json
from dataclasses import dataclass
from dataclasses_json import dataclass_json
@dataclass_json
@dataclass
class Foo:
    x: int
foo = Foo(x='1')
json_foo = foo.to_json() # '{"x": "1"}'
# Back to class instance
Foo.from_json(json_foo) # Foo(x='1')
Also note that marshmallow-dataclass did the type conversion for you, whereas dataclasses-json (version 0.5.1) does not.
Write Custom Encoder
Follow the accepted answer by miracle2k and reuse its custom JSON encoder.
A much simpler answer can be found on Reddit using dictionary unpacking
>>> from dataclasses import dataclass
>>> @dataclass
... class MyData:
...     prop1: int
...     prop2: str
...     prop3: int
...
>>> d = {'prop1': 5, 'prop2': 'hi', 'prop3': 100}
>>> my_data = MyData(**d)
>>> my_data
MyData(prop1=5, prop2='hi', prop3=100)
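The unpacking above goes from a dict to a dataclass. Combined with asdict() it gives a full round trip through JSON, sketched below. This is a minimal sketch that assumes a flat dataclass whose fields are all JSON-native types; a field typed as another dataclass would come back as a plain dict.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class MyData:
    prop1: int
    prop2: str
    prop3: int


original = MyData(prop1=5, prop2="hi", prop3=100)
text = json.dumps(asdict(original))     # dataclass -> dict -> JSON text
restored = MyData(**json.loads(text))   # JSON text -> dict -> dataclass
print(text)                   # {"prop1": 5, "prop2": "hi", "prop3": 100}
print(restored == original)   # True
```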
I'd suggest creating a parent class for your dataclasses with a to_json() method:
import json
from dataclasses import dataclass, asdict
@dataclass
class Dataclass:
    def to_json(self) -> str:
        return json.dumps(asdict(self))
@dataclass
class YourDataclass(Dataclass):
    a: int
    b: int
x = YourDataclass(a=1, b=2)
x.to_json() # '{"a": 1, "b": 2}'
This is especially useful if you have other functionality to add to all your dataclasses.
Here is what I did when I was in a similar situation.
- Create a custom dictionary factory. This one drops fields whose value is None; asdict() already converts nested dataclasses into dictionaries and applies the factory to each of them:
def myfactory(data):
    return dict(x for x in data if x[1] is not None)
- If foo is your @dataclass instance, pass the factory to asdict() via dict_factory:
fooDict = asdict(foo, dict_factory=myfactory)
- Convert fooDict to JSON:
fooJson = json.dumps(fooDict)
This should work!
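Put together, the three steps might look like the sketch below. The Inner and Outer classes are hypothetical and only serve to show that the factory is also applied to nested dataclasses.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Inner:  # hypothetical nested dataclass
    a: int
    b: Optional[int] = None


@dataclass
class Outer:  # hypothetical top-level dataclass
    name: str
    inner: Inner
    note: Optional[str] = None


def myfactory(data):
    # data is a list of (field_name, value) pairs; skip the None values.
    return dict(x for x in data if x[1] is not None)


foo = Outer(name="n", inner=Inner(a=1))
foo_json = json.dumps(asdict(foo, dict_factory=myfactory))
print(foo_json)
# {"name": "n", "inner": {"a": 1}}
```

Note that dropping None fields changes the shape of the output, so a consumer has to treat a missing key the same as null.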
dataclass-wizard is a modern option that can work for you. It supports complex types such as date and time, most generics from the typing module, and also a nested dataclass structure.
The "new style" annotations introduced in PEPs 585 and 604 can be ported back to Python 3.7 via a __future__ import as shown below.
from __future__ import annotations # This can be removed in Python 3.10
from dataclasses import dataclass, field
from dataclass_wizard import JSONWizard
@dataclass
class MyClass(JSONWizard):
    my_str: str | None
    is_active_tuple: tuple[bool, ...]
    list_of_int: list[int] = field(default_factory=list)
string = """
{
    "my_str": 20,
    "ListOfInt": ["1", "2", 3],
    "isActiveTuple": ["true", false, 1]
}
"""
instance = MyClass.from_json(string)
print(repr(instance))
# MyClass(my_str='20', is_active_tuple=(True, False, True), list_of_int=[1, 2, 3])
print(instance.to_json())
# '{"myStr": "20", "isActiveTuple": [true, false, true], "listOfInt": [1, 2, 3]}'
# True
assert instance == MyClass.from_json(instance.to_json())
You can install the Dataclass Wizard with pip:
$ pip install dataclass-wizard
A bit of background info:
For serialization, it uses a slightly modified (a bit more efficient) implementation of dataclasses.asdict. When de-serializing JSON to a dataclass instance, the first time it iterates over the dataclass fields and generates a parser for each annotated type, which makes it more efficient when the de-serialization process is run multiple times.
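To illustrate the general idea of building per-field parsers once and reusing them, here is a standard-library sketch. It is not the library's actual implementation: build_parsers and from_json are hypothetical helpers, and the sketch assumes every annotation is a plain callable type such as int or str.

```python
import json
from dataclasses import dataclass, fields
from functools import lru_cache


@lru_cache(maxsize=None)
def build_parsers(cls):
    """Build one converter per field; cached so it runs once per class."""
    # Assumption: each annotation is itself a callable such as int or str.
    return {f.name: f.type for f in fields(cls)}


def from_json(cls, text):
    parsers = build_parsers(cls)  # a cache hit after the first call
    raw = json.loads(text)
    return cls(**{name: parse(raw[name]) for name, parse in parsers.items()})


@dataclass
class Point:  # hypothetical example class
    x: int
    label: str


p1 = from_json(Point, '{"x": "3", "label": 7}')
p2 = from_json(Point, '{"x": 4, "label": "b"}')
print(p1)                               # Point(x=3, label='7')
print(build_parsers.cache_info().hits)  # 1
```

The first call pays for inspecting the fields; later calls for the same class only do the lookups and conversions.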
Disclaimer: I am the creator (and maintainer) of this library.
The simplest way to encode dataclass and SimpleNamespace objects is to provide the default function to json.dumps() that gets called for objects that can't be otherwise serialized, and return the object __dict__:
json.dumps(foo, default=lambda o: o.__dict__)
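Because json.dumps() calls the default function again for every nested value it cannot serialize, this one-liner also covers nested objects, as the sketch below shows with hypothetical Child and Parent classes. It does assume each such object has a __dict__; types without one (datetime, for instance) raise AttributeError.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from types import SimpleNamespace


@dataclass
class Child:  # hypothetical example class
    n: int


@dataclass
class Parent:  # hypothetical example class
    child: Child
    extra: SimpleNamespace


parent = Parent(child=Child(n=1), extra=SimpleNamespace(flag=True))
encoded = json.dumps(parent, default=lambda o: o.__dict__)
print(encoded)
# {"child": {"n": 1}, "extra": {"flag": true}}

failed = False
try:
    # datetime instances have no __dict__, so the lambda itself fails.
    json.dumps({"when": datetime(2020, 1, 1)}, default=lambda o: o.__dict__)
except AttributeError:
    failed = True
print(failed)  # True
```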
You can also wrap the asdict and json.dumps calls in methods of the class itself. In this case it wouldn't be necessary to import json.dumps into other parts of your project:
from typing import List
from dataclasses import dataclass, asdict, field
from json import dumps
@dataclass
class TestDataClass:
    """
    Data Class for TestDataClass
    """
    id: int
    name: str
    tested: bool = False
    test_list: List[str] = field(default_factory=list)
    @property
    def __dict__(self):
        """
        get a python dictionary
        """
        return asdict(self)
    @property
    def json(self):
        """
        get the json formatted string
        """
        return dumps(self.__dict__)
test_object_1 = TestDataClass(id=1, name="Hi")
print(test_object_1.__dict__)
print(test_object_1.json)
Output:
{'id': 1, 'name': 'Hi', 'tested': False, 'test_list': []}
{"id": 1, "name": "Hi", "tested": false, "test_list": []}
You can also create a parent class to inherit the methods:
from typing import List
from dataclasses import dataclass, asdict, field
from json import dumps
@dataclass
class SuperTestDataClass:
    @property
    def __dict__(self):
        """
        get a python dictionary
        """
        return asdict(self)
    @property
    def json(self):
        """
        get the json formatted string
        """
        return dumps(self.__dict__)
@dataclass
class TestDataClass(SuperTestDataClass):
    """
    Data Class for TestDataClass
    """
    id: int
    name: str
    tested: bool = False
    test_list: List[str] = field(default_factory=list)
test_object_1 = TestDataClass(id=1, name="Hi")
print(test_object_1.__dict__)
print(test_object_1.json)
A dataclass providing a JSON formatting method:
import json
from dataclasses import dataclass
@dataclass
class Foo:
    x: str
    def to_json(self):
        return json.dumps(self.__dict__)
Foo("bar").to_json() # '{"x": "bar"}'
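One caveat worth sketching: self.__dict__ is shallow, so a field holding another dataclass instance is not converted and json.dumps() fails on it. The variant below, with a hypothetical Inner class, uses asdict() inside to_json() instead, which recurses into nested dataclasses.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Inner:  # hypothetical nested dataclass
    y: int


@dataclass
class Foo:
    x: str
    inner: Inner

    def to_json(self):
        # asdict() recurses into nested dataclasses; self.__dict__ would not.
        return json.dumps(asdict(self))


foo = Foo("bar", Inner(1))
try:
    json.dumps(foo.__dict__)  # the Inner instance is not serializable as-is
    shallow_ok = True
except TypeError:
    shallow_ok = False
print(shallow_ok)     # False
print(foo.to_json())  # {"x": "bar", "inner": {"y": 1}}
```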
pydantic
With pydantic models you get a dataclasses-like experience and full support for dict and JSON conversions (and much more).
Python 3.9 and above (the examples below use the pydantic v1 API):
from typing import Optional
from pydantic import BaseModel, parse_obj_as, parse_raw_as
class Foo(BaseModel):
    count: int
    size: Optional[float] = None
f1 = Foo(count=10)
print(f1.dict()) # Convert to dict
# > {'count': 10, 'size': None}
f2 = Foo.parse_obj({"count": 20}) # Load from dict
print(f2.json()) # Convert to JSON
# > {"count": 20, "size": null}
More options:
f3 = Foo.parse_raw('{"count": 30}') # Load from json string
f4 = Foo.parse_file("path/to/data.json") # Load from json file
f_list1 = parse_obj_as(list[Foo], [{"count": 110}, {"count": 120}]) # Load from list of dicts
print(f_list1)
# > [Foo(count=110, size=None), Foo(count=120, size=None)]
f_list2 = parse_raw_as(list[Foo], '[{"count": 130}, {"count": 140}]') # Load from list in json string
print(f_list2)
# > [Foo(count=130, size=None), Foo(count=140, size=None)]
Complex hierarchical data structures
class Bar(BaseModel):
    apple = "x"
    banana = "y"
class Spam(BaseModel):
    foo: Foo
    bars: list[Bar]
m = Spam(foo={"count": 4}, bars=[{"apple": "x1"}, {"apple": "x2"}])
print(m)
# > foo=Foo(count=4, size=None) bars=[Bar(apple='x1', banana='y'), Bar(apple='x2', banana='y')]
print(m.dict())
"""
{
    'foo': {'count': 4, 'size': None},
    'bars': [
        {'apple': 'x1', 'banana': 'y'},
        {'apple': 'x2', 'banana': 'y'},
    ],
}
"""
Pydantic supports many standard types (like datetime) and special commonly used types (like EmailStr and HttpUrl):
from datetime import datetime
from pydantic import HttpUrl
class User(BaseModel):
    name = "John Doe"
    signup_ts: datetime = None
    url: HttpUrl = None
u1 = User(signup_ts="2017-07-14 00:00:00")
print(u1)
# > signup_ts=datetime.datetime(2017, 7, 14, 0, 0) url=None name='John Doe'
u2 = User(url="http://example.com")
print(u2)
# > signup_ts=None url=HttpUrl('http://example.com', ) name='John Doe'
u3 = User(url="ht://example.com")
"""
ValidationError: 1 validation error for User
url
  URL scheme not permitted (type=value_error.url.scheme; allowed_schemes={'http', 'https'})
"""
If you really need to use json.dumps, write a Custom Encoder:
import json
class EnhancedJSONEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, BaseModel):
            return o.dict()
        return super().default(o)
json.dumps([{"foo": f2}], cls=EnhancedJSONEncoder)
# > '[{"foo": {"count": 20, "size": null}}]'