Why does the pydantic dataclass cast a list to a dict? How to prevent this behavior?

Question:

I am a bit confused by the behavior of the pydantic dataclass.
Why does the dict type accept a list of a dict as a valid dict, and why is it converted to a dict of the keys?
Am I doing something wrong? Is this some kind of intended behavior, and if so, is there a way to prevent that behavior?

Code Example:

from pydantic.dataclasses import dataclass

@dataclass
class X:
    y: dict
print(X([{'a':'b', 'c':'d'}]))

Output:

X(y={'a': 'c'})
Asked By: PirateWorm

||

Answers:

Hah, this is kind of amusing actually. Bear with me…


Why does the dict type accept a list of a dict as a valid dict, and why is it converted to a dict of the keys?

This can be explained when we take a closer look at the default dict_validator used by Pydantic models. The very first thing it does with any value that is not already a dict is attempt to coerce it to one by calling dict() on it.

Try this with your specific example:

y = [{'a': 'b', 'c': 'd'}]
assert dict(y) == {'a': 'c'}

Why is that?

Well, to initialize a dict, you can pass different kinds of arguments. One option is to pass an iterable, in which case, to quote the Python documentation:

Each item in the iterable must itself be an iterable with exactly two objects. The first object of each item becomes a key in the new dictionary, and the second object the corresponding value.

In your example, you just so happened to have an iterable (specifically a list), and the only item in that iterable is itself an iterable (specifically a dict). And how are dictionaries iterated over by default? Via their keys! Since that dictionary {'a': 'b', 'c': 'd'} has exactly two key-value pairs, iterating over it produces exactly two objects, namely the keys "a" and "c":

d = {'a': 'b', 'c': 'd'}
assert tuple(iter(d)) == ('a', 'c')

It is this mechanism that allows a dict to be constructed for example from a list of 2-tuples like so:

data = [('a', 1), ('b', 2)]
assert dict(data) == {'a': 1, 'b': 2}

In your case this leads to the result you showed, which at first glance seems strange and unexpected, but actually makes sense when you think about the logic of dictionary initialization.
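To make that logic concrete, here is a rough hand-written equivalent of what dict() does with your list. It is only a sketch of the behavior described above, not how CPython implements it:

```python
y = [{'a': 'b', 'c': 'd'}]

result = {}
for item in y:
    # Unpacking iterates over `item`. For a dict, that yields its keys,
    # so this binds key='a' and value='c'.
    key, value = item
    result[key] = value

assert result == {'a': 'c'}
assert result == dict(y)
```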

What is funny is that this only works when the dict in the list has exactly two key-value pairs! Any other number of keys will lead to an error, because each item must unpack into exactly one key and one value. (Try it yourself.)
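If you want to try it quickly, something like the following should do. The helper try_dict is just a name made up for this illustration:

```python
def try_dict(value):
    """Return dict(value), or a short error description if that fails."""
    try:
        return dict(value)
    except (TypeError, ValueError) as exc:
        return f"{type(exc).__name__}: {exc}"


print(try_dict([{'a': 'b'}]))                      # one key: error
print(try_dict([{'a': 'b', 'c': 'd'}]))            # two keys: coerced
print(try_dict([{'a': 'b', 'c': 'd', 'e': 'f'}]))  # three keys: error
```

Only the middle call produces a dict; the other two report a ValueError about the length of the sequence element.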

So in short: This behavior is neither special to Pydantic nor dataclasses, but is the result of regular dict initialization.
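Put together, the coercion step described above behaves roughly like the sketch below. This is my paraphrase in plain Python, not Pydantic's source code; the function name coerce_to_dict and the error message are assumptions made for the example:

```python
from typing import Any


def coerce_to_dict(v: Any) -> dict:
    # Hypothetical stand-in for the validator's coercion step.
    if isinstance(v, dict):
        return v  # real dicts pass through untouched
    try:
        return dict(v)  # everything else goes through regular dict()
    except (TypeError, ValueError):
        raise TypeError("value is not a valid dict") from None


assert coerce_to_dict({'a': 'b'}) == {'a': 'b'}
assert coerce_to_dict([{'a': 'b', 'c': 'd'}]) == {'a': 'c'}
```

The second assertion is exactly your case: the list is not a dict, so it is handed to dict(), which happily builds {'a': 'c'} from it.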


Am I doing something wrong?

I would say, yes. The value you are trying to assign to X.y is a list, but you declared it to be a dict. So that is obviously wrong. I get that sometimes data comes from external sources, so it may not be up to you.


Is this some kind of intended behavior […]?

That is a good question in the sense that I am curious to know if the Pydantic team is aware of this edge case and the strange result it causes. I would say it is at least understandable that the dictionary validator was implemented the way it was.


is there a way to prevent that behavior?

Yes. Aside from the obvious solution of just not passing a list there.

You could add your own custom validator, configure it with pre=True and have it for example only allow actual straight up dict instances to proceed to further validation. Then you could catch this error immediately.


Hope this helps.

Thanks for shining a light on this because this would have thrown me off at first, too. I think I’ll start digging through the Pydantic issue tracker and PRs and see if this may/should/will be addressed somehow.

PS

Here is a very simple implementation of the aforementioned "strict" validator that prevents dict-coercion and instead raises an error immediately for any non-dict value:

from typing import Any

from pydantic.class_validators import validator
from pydantic.dataclasses import dataclass
from pydantic.fields import ModelField, SHAPE_DICT, SHAPE_SINGLETON


@dataclass
class X:
    y: dict

    @validator("*", pre=True)
    def strict_dict(cls, v: Any, field: ModelField) -> Any:
        declared_dict_type = (
            field.type_ is dict and field.shape is SHAPE_SINGLETON
            or field.shape is SHAPE_DICT
        )
        if declared_dict_type and not isinstance(v, dict):
            raise TypeError(f"value must be a `dict`, got {type(v)}")
        return v


if __name__ == '__main__':
    print(X([{'a': 'b', 'c': 'd'}]))

Output:

Traceback (most recent call last):
  File "....py", line 24, in <module>
    print(X([{'a': 'b', 'c': 'd'}]))
  File "pydantic/dataclasses.py", line 313, in pydantic.dataclasses._add_pydantic_validation_attributes.new_init
    
  File "pydantic/dataclasses.py", line 416, in pydantic.dataclasses._dataclass_validate_values
    # worries about external callers.
pydantic.error_wrappers.ValidationError: 1 validation error for X
y
  value must be a `dict`, got <class 'list'> (type=type_error)
Answered By: Daniil Fajnberg