Why does the pydantic dataclass cast a list to a dict? How to prevent this behavior?
Question:
I am a bit confused by the behavior of the pydantic dataclass.
Why does the dict type accept a list of a dict as a valid dict, and why is it converted to a dict of the keys?
Am I doing something wrong? Is this some kind of intended behavior, and if so, is there a way to prevent it?
Code Example:
from pydantic.dataclasses import dataclass

@dataclass
class X:
    y: dict

print(X([{'a':'b', 'c':'d'}]))
Output:
X(y={'a': 'c'})
Answers:
Hah, this is kind of amusing actually. Bear with me…
Why does the dict type accept a list of a dict as valid dict and why is it converted it to a dict of the keys?
This can be explained when we take a closer look at the default dict_validator used by Pydantic models. The very first thing it does (to any non-dict) is attempt to coerce the value to a dict.
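As a rough illustration of that coercion step, here is a standalone sketch. It is not Pydantic's actual source: coerce_to_dict is a hypothetical stand-in that only mirrors the behavior described above (real dicts pass through, anything else is handed to the dict constructor).

```python
from typing import Any


def coerce_to_dict(value: Any) -> dict:
    """Hypothetical stand-in for a lenient dict validator (not Pydantic's code)."""
    if isinstance(value, dict):
        # Real dictionaries pass through untouched.
        return value
    try:
        # Anything else is handed to the built-in dict constructor.
        return dict(value)
    except (TypeError, ValueError) as exc:
        raise ValueError("value is not a valid dict") from exc


print(coerce_to_dict([{'a': 'b', 'c': 'd'}]))  # {'a': 'c'}
```

The point of the sketch is that nothing in it checks what kind of non-dict value came in, so a list is treated like any other iterable.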
Try this with your specific example:
y = [{'a': 'b', 'c': 'd'}]
assert dict(y) == {'a': 'c'}
Why is that?
Well, to initialize a dict, you can pass different kinds of arguments. One option is to pass some Iterable, and, as the Python documentation puts it:
"Each item in the iterable must itself be an iterable with exactly two objects. The first object of each item becomes a key in the new dictionary, and the second object the corresponding value."
In your example, you just so happened to have an iterable (specifically a list) and the only item in that iterable is itself an iterable (specifically a dict). And how are dictionaries iterated over by default? Via their keys! Since that dictionary {'a': 'b', 'c': 'd'} has exactly two key-value pairs, that means when iterated over it produces those two keys, i.e. "a" and "c":
d = {'a': 'b', 'c': 'd'}
assert tuple(iter(d)) == ('a', 'c')
It is this mechanism that allows a dict to be constructed, for example, from a list of 2-tuples like so:
data = [('a', 1), ('b', 2)]
assert dict(data) == {'a': 1, 'b': 2}
In your case this leads to the result you showed, which at first glance seems strange and unexpected, but actually makes sense when you think about the logic of dictionary initialization.
What is funny is that this only works when the dict in the list has exactly two key-value pairs! Anything more or less will lead to an error. (Try it yourself.)
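For anyone who would rather not try it by hand, the following sketch shows the failure for one and three pairs using only the built-in dict. The helper name try_dict is made up for this illustration; it simply returns the error message instead of raising.

```python
def try_dict(items):
    """Hypothetical helper: return dict(items), or the error text if it fails."""
    try:
        return dict(items)
    except (TypeError, ValueError) as exc:
        return f"{type(exc).__name__}: {exc}"


one_pair = [{'a': 'b'}]
two_pairs = [{'a': 'b', 'c': 'd'}]
three_pairs = [{'a': 'b', 'c': 'd', 'e': 'f'}]

print(try_dict(one_pair))     # a ValueError: the element has length 1
print(try_dict(two_pairs))    # {'a': 'c'}
print(try_dict(three_pairs))  # a ValueError: the element has length 3
```

Inside Pydantic the same failure would presumably surface as a validation error for the field rather than as the raw ValueError.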
So in short: This behavior is neither special to Pydantic nor dataclasses, but is the result of regular dict initialization.
Am I doing something wrong?
I would say, yes. The value you are trying to assign to X.y is a list, but you declared it to be a dict. So that is obviously wrong. I get that sometimes data comes from external sources, so it may not be up to you.
Is this some kind of intended behavior […]?
That is a good question in the sense that I am curious to know if the Pydantic team is aware of this edge case and the strange result it causes. I would say it is at least understandable that the dictionary validator was implemented the way it was.
is there a way to prevent that behavior?
Yes, aside from the obvious solution of just not passing a list there.
You could add your own custom validator, configure it with pre=True, and have it, for example, only allow actual straight-up dict instances to proceed to further validation. Then you could catch this error immediately.
Hope this helps.
Thanks for shining a light on this because this would have thrown me off at first, too. I think I’ll start digging through the Pydantic issue tracker and PRs and see if this may/should/will be addressed somehow.
PS
Here is a very simple implementation of the aforementioned "strict" validator that prevents dict-coercion and instead raises an error for any non-dict immediately:
from typing import Any

from pydantic.class_validators import validator
from pydantic.dataclasses import dataclass
from pydantic.fields import ModelField, SHAPE_DICT, SHAPE_SINGLETON


@dataclass
class X:
    y: dict

    @validator("*", pre=True)
    def strict_dict(cls, v: Any, field: ModelField) -> Any:
        declared_dict_type = (
            field.type_ is dict and field.shape is SHAPE_SINGLETON
            or field.shape is SHAPE_DICT
        )
        if declared_dict_type and not isinstance(v, dict):
            raise TypeError(f"value must be a `dict`, got {type(v)}")
        return v


if __name__ == '__main__':
    print(X([{'a': 'b', 'c': 'd'}]))
Output:
Traceback (most recent call last):
  File "....py", line 24, in <module>
    print(X([{'a': 'b', 'c': 'd'}]))
  File "pydantic/dataclasses.py", line 313, in pydantic.dataclasses._add_pydantic_validation_attributes.new_init
  File "pydantic/dataclasses.py", line 416, in pydantic.dataclasses._dataclass_validate_values
    # worries about external callers.
pydantic.error_wrappers.ValidationError: 1 validation error for X
y
  value must be a `dict`, got <class 'list'> (type=type_error)