How do I convert a json file to a python class?
Question:
Consider this JSON file named h.json, which I want to convert into a Python dataclass:
{
    "acc1": {
        "email": "[email protected]",
        "password": "acc1",
        "name": "ACC1",
        "salary": 1
    },
    "acc2": {
        "email": "[email protected]",
        "password": "acc2",
        "name": "ACC2",
        "salary": 2
    }
}
I could use an alternative constructor for getting each account, for example:
import json
from dataclasses import dataclass

@dataclass
class Account(object):
    email: str
    password: str
    name: str
    salary: int

    @classmethod
    def from_json(cls, json_key):
        file = json.load(open("h.json"))
        return cls(**file[json_key])
but this is limited to the arguments (email, name, etc.) that were defined in the dataclass.
What if I were to modify the JSON to include another key, say age?
The script would end up raising a TypeError, specifically TypeError: __init__() got an unexpected keyword argument 'age'.
Is there a way to dynamically adjust the class attributes based on the keys of the dict (json object), so that I don’t have to add attributes each time I add a new key to the json?
Answers:
For a flat (not nested) dataclass, the code below does the job.
If you need to handle nested dataclasses, you should use a framework like dacite.
Note 1: loading the data from the JSON file should not be part of your class logic.
Note 2: if your JSON can contain anything, you cannot map it to a dataclass and will have to work with a dict.
from dataclasses import dataclass
from typing import List

data = {
    "acc1": {
        "email": "[email protected]",
        "password": "acc1",
        "name": "ACC1",
        "salary": 1
    },
    "acc2": {
        "email": "[email protected]",
        "password": "acc2",
        "name": "ACC2",
        "salary": 2
    }
}

@dataclass
class Account:
    email: str
    password: str
    name: str
    salary: int

accounts: List[Account] = [Account(**x) for x in data.values()]
print(accounts)
Output:
[Account(email='[email protected]', password='acc1', name='ACC1', salary=1), Account(email='[email protected]', password='acc2', name='ACC2', salary=2)]
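As a rough sketch of Note 1, the parsing can live in a plain function outside the class. The function name parse_accounts, the example address and the extra "age" key are assumptions made for illustration. Here, keys that the dataclass does not declare are simply dropped, so an added key no longer raises a TypeError:

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class Account:
    email: str
    password: str
    name: str
    salary: int


def parse_accounts(raw: Dict[str, Dict[str, Any]]) -> Dict[str, Account]:
    """Build Account objects, dropping keys the dataclass does not declare."""
    known = {f.name for f in fields(Account)}
    return {
        key: Account(**{k: v for k, v in entry.items() if k in known})
        for key, entry in raw.items()
    }


# Hypothetical input; in practice the dict would come from reading h.json
# outside the class, e.g. `with open("h.json") as fh: raw = json.load(fh)`.
raw = json.loads(
    '{"acc1": {"email": "a@example.com", "password": "acc1",'
    ' "name": "ACC1", "salary": 1, "age": 30}}'
)
accounts = parse_accounts(raw)
print(accounts["acc1"])
```

The trade-off is that the undeclared values are lost, which may or may not be acceptable for your data.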
There must be many ways to do this. Setting the extra keys as attributes after construction is one of them, but this way you lose some dataclass features for those attributes:
- such as determining whether a field is optional or not
- such as auto-completion
However, you know your project best, so decide accordingly. Here is that approach:
import json
from dataclasses import dataclass, fields

@dataclass
class Account(object):
    email: str
    password: str
    name: str
    salary: int

    @classmethod
    def from_json(cls, json_key):
        with open("1.txt") as fp:
            file = json.load(fp)
        keys = [f.name for f in fields(cls)]
        # or: keys = cls.__dataclass_fields__.keys()
        json_data = file[json_key]
        # Keys matching declared fields go to the constructor
        normal_json_data = {key: json_data[key] for key in json_data if key in keys}
        # Everything else is attached afterwards with setattr
        anormal_json_data = {key: json_data[key] for key in json_data if key not in keys}
        tmp = cls(**normal_json_data)
        for anormal_key in anormal_json_data:
            setattr(tmp, anormal_key, anormal_json_data[anormal_key])
        return tmp

test = Account.from_json("acc1")
print(test.age)  # works only if "acc1" has an "age" key in the file
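A possible variation on the setattr approach above, sketched under the assumption that you are free to add one field to the model: collect the undeclared keys in a dict field instead of attaching them as attributes. The field name extras, the method name from_dict and the sample data are hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict


@dataclass
class Account:
    email: str
    password: str
    name: str
    salary: int
    # Hypothetical catch-all for keys the model does not declare
    extras: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Account":
        # Every declared field except the catch-all itself
        declared = {f.name for f in fields(cls)} - {"extras"}
        known = {k: v for k, v in data.items() if k in declared}
        unknown = {k: v for k, v in data.items() if k not in declared}
        return cls(**known, extras=unknown)


acct = Account.from_dict(
    {"email": "a@example.com", "password": "acc1", "name": "ACC1", "salary": 1, "age": 30}
)
print(acct.salary)
print(acct.extras["age"])
```

The declared fields keep their type hints and auto-completion, while anything new in the JSON stays reachable through acct.extras without touching the class.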
Since it sounds like your data might be expected to be dynamic and you want the freedom to add more fields to the JSON object without reflecting the same changes in the model, I'd also suggest checking out typing.TypedDict instead of a dataclass.
Here's an example with TypedDict, which should work in Python 3.7+. Since TypedDict was introduced in 3.8, I've imported it from typing_extensions instead, so it's compatible with 3.7 code.
from __future__ import annotations

import json
from io import StringIO

from typing_extensions import TypedDict

class Account(TypedDict):
    email: str
    password: str
    name: str
    salary: int

json_data = StringIO("""{
    "acc1": {
        "email": "[email protected]",
        "password": "acc1",
        "name": "ACC1",
        "salary": 1
    },
    "acc2": {
        "email": "[email protected]",
        "password": "acc2",
        "name": "ACC2",
        "salary": 2,
        "someRandomKey": "string"
    }
}
""")

data = json.load(json_data)
name_to_account: dict[str, Account] = data
acct = name_to_account['acc2']

# Your IDE should be able to offer auto-complete suggestions within the
# brackets, when you start typing or press 'Ctrl + Space' for example.
print(acct['someRandomKey'])
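Keep in mind that a TypedDict is an ordinary dict at runtime, so nothing checks that the declared keys are actually present. If you want such a check, a minimal sketch could look like the following. It uses typing.TypedDict from the standard library (Python 3.8+); the helper name missing_keys and the sample entries are assumptions for illustration.

```python
from typing import Any, Dict, List, TypedDict  # typing.TypedDict needs Python 3.8+


class Account(TypedDict):
    email: str
    password: str
    name: str
    salary: int


def missing_keys(entry: Dict[str, Any]) -> List[str]:
    """Return the declared Account keys that are absent from entry."""
    return sorted(k for k in Account.__annotations__ if k not in entry)


# Hypothetical entries: one with an extra key, one with keys missing
complete = {"email": "a@example.com", "password": "p", "name": "A", "salary": 1, "age": 30}
partial = {"email": "b@example.com", "name": "B"}

print(missing_keys(complete))
print(missing_keys(partial))
```

Extra keys such as "age" pass through untouched; only the absence of declared keys is reported.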
If you are set on using dataclasses to model your data, I’d suggest checking out a JSON serialization library like the dataclass-wizard (disclaimer: I am the creator) which should handle extraneous fields in the JSON data as mentioned, as well as a nested dataclass model if you find your data becoming more complex.
It also has a handy tool that you can use to generate a dataclass schema from JSON data, which can be useful for instance if you want to update your model class whenever you add new fields in the JSON file as mentioned.