Python dataclass, what's a pythonic way to validate initialization arguments?
Question:
What's a pythonic way to validate the init arguments before instantiation without overriding the dataclass's generated __init__?
I thought perhaps leveraging the __new__ dunder method would be appropriate?
from dataclasses import dataclass

@dataclass
class MyClass:
    is_good: bool = False
    is_bad: bool = False

    def __new__(cls, *args, **kwargs):
        instance: cls = super(MyClass, cls).__new__(cls, *args, **kwargs)
        if instance.is_good:
            assert not instance.is_bad
        return instance
Answers:
Define a __post_init__ method on the class; the generated __init__ will call it if defined:
from dataclasses import dataclass

@dataclass
class MyClass:
    is_good: bool = False
    is_bad: bool = False

    def __post_init__(self):
        if self.is_good:
            assert not self.is_bad
This will even work when the dataclasses.replace function is used to make a new instance, because replace builds the copy by calling __init__ again.
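One caveat with the snippet above is that assert statements are skipped when Python runs with -O, so the check can silently disappear. A minimal sketch of the same idea that raises ValueError instead (the error message is my own wording, not something from the question), and that also shows replace triggering the check:

```python
from dataclasses import dataclass, replace


@dataclass
class MyClass:
    is_good: bool = False
    is_bad: bool = False

    def __post_init__(self):
        # Raise explicitly: unlike assert, this check survives `python -O`.
        if self.is_good and self.is_bad:
            raise ValueError("is_good and is_bad cannot both be True")


ok = MyClass(is_good=True)

# replace() creates the copy through __init__, so __post_init__ runs again.
try:
    replace(ok, is_bad=True)
except ValueError as exc:
    error = str(exc)
else:
    error = None
```

Strictly speaking the check runs after the fields have been assigned rather than before instantiation, but the exception propagates out of __init__, so the caller never receives the invalid object.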
The author of the dataclasses module made a conscious decision not to implement the validators that are present in similar third-party projects like attrs, pydantic, or marshmallow. If your actual problem is within the scope of the one you posted, then doing the validation in __post_init__ is completely fine.
But if you have more complex validation procedures, or you work with things like inheritance, you might want to use one of the more powerful libraries I mentioned instead of dataclass. Just to have something to look at, this is what your example could look like using pydantic (this uses the pydantic v1 validator decorator; pydantic v2 replaces it with field_validator):
>>> from pydantic import BaseModel, validator
>>> class MyClass(BaseModel):
...     is_good: bool = False
...     is_bad: bool = False
...
...     @validator('is_bad')
...     def check_something(cls, v, values):
...         if values['is_good'] and v:
...             raise ValueError("Can not be both good and bad now, can it?")
...         return v
...
>>> MyClass(is_good=True, is_bad=False)  # this would be a valid instance
MyClass(is_good=True, is_bad=False)
>>> MyClass(is_good=True, is_bad=True)  # this wouldn't
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "pydantic/main.py", line 283, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for MyClass
is_bad
  Can not be both good and bad now, can it? (type=value_error)
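To make the inheritance caveat concrete for plain dataclasses: a subclass that defines its own __post_init__ overrides the parent's, so the parent's check only runs if the subclass calls it explicitly. A rough sketch, where the Rated subclass, its score field, and the error messages are hypothetical names made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class MyClass:
    is_good: bool = False
    is_bad: bool = False

    def __post_init__(self):
        if self.is_good and self.is_bad:
            raise ValueError("is_good and is_bad cannot both be True")


@dataclass
class Rated(MyClass):  # hypothetical subclass, not from the question
    score: int = 0

    def __post_init__(self):
        # The generated __init__ calls only this method, so chain up by hand
        # to keep the parent's validation.
        super().__post_init__()
        if self.score < 0:
            raise ValueError("score must be non-negative")
```

Forgetting the super().__post_init__() call is easy to do and fails silently, which is one reason a validation library can be the more comfortable choice once class hierarchies grow.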
You can try ValidatedDC. As the example shows, it checks each value against its type annotation:
from dataclasses import dataclass
from validated_dc import ValidatedDC

@dataclass
class MyClass(ValidatedDC):
    is_good: bool = False
    is_bad: bool = False

instance = MyClass()
assert instance.get_errors() is None
assert instance == MyClass(is_good=False, is_bad=False)

data = {'is_good': True, 'is_bad': True}
instance = MyClass(**data)
assert instance.get_errors() is None

data = {'is_good': 'bad_value', 'is_bad': True}
instance = MyClass(**data)
assert instance.get_errors()
print(instance.get_errors())
# {'is_good': [BasicValidationError(value_repr='bad_value', value_type=<class 'str'>, annotation=<class 'bool'>, exception=None)]}

# fix
instance.is_good = True
assert instance.is_valid()
assert instance.get_errors() is None
ValidatedDC: https://github.com/EvgeniyBurdin/validated_dc