Force type conversion in Python dataclass __init__ method

Question:

I have the following very simple dataclass:

import dataclasses

@dataclasses.dataclass
class Test:
    value: int

I create an instance of the class but instead of an integer I use a string:

>>> test = Test('1')
>>> type(test.value)
<class 'str'>

What I actually want is a forced conversion to the datatype I defined in the class definition:

>>> test = Test('1')
>>> type(test.value)
<class 'int'>

Do I have to write the __init__ method manually or is there a simple way to achieve this?

Asked By: johnson

Answers:

The type hints on dataclass attributes are never enforced or checked at runtime. That job is mostly left to static type checkers like mypy; Python itself does not do it at runtime, for dataclasses or anywhere else.

If you want to add manual type checking code, do so in the __post_init__ method:

@dataclasses.dataclass
class Test:
    value: int

    def __post_init__(self):
        if not isinstance(self.value, int):
            raise ValueError('value not an int')
            # or self.value = int(self.value)

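You could use dataclasses.fields(self) to get a tuple of Field objects, each of which carries the field's name and type, and loop over that to do this for every field automatically instead of writing it out for each one individually. This only works when field.type is a plain class; with string annotations (for example under from __future__ import annotations) or generic types, isinstance() and field.type(value) will fail.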

def __post_init__(self):
    for field in dataclasses.fields(self):
        value = getattr(self, field.name)
        if not isinstance(value, field.type):
            raise ValueError(f'Expected {field.name} to be {field.type}, '
                             f'got {repr(value)}')
            # or setattr(self, field.name, field.type(value))
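
A minimal runnable sketch of that loop, using the conversion variant from the comment and turning a failed conversion into an error that names the field. It assumes every annotation is a plain class such as int or float; the Point class and its fields are made up for illustration.

```python
import dataclasses


@dataclasses.dataclass
class Point:  # hypothetical example class
    x: int
    y: float

    def __post_init__(self):
        for field in dataclasses.fields(self):
            value = getattr(self, field.name)
            if isinstance(value, field.type):
                continue  # already the annotated type, nothing to do
            try:
                # Assumes field.type is a callable class like int or float.
                setattr(self, field.name, field.type(value))
            except (TypeError, ValueError) as exc:
                raise ValueError(
                    f'Expected {field.name} to be {field.type.__name__}, '
                    f'got {value!r}'
                ) from exc
```

With this, Point('1', '2.5') ends up with an int and a float, while Point('a', 1.0) raises a ValueError mentioning x.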
Answered By: deceze

You could achieve this using the __post_init__ method:

import dataclasses

@dataclasses.dataclass
class Test:
    value : int

    def __post_init__(self):
        self.value = int(self.value)

This method is called automatically at the end of the generated __init__ method:

https://docs.python.org/3/library/dataclasses.html#post-init-processing
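
One thing worth keeping in mind with this approach: __post_init__ only runs as part of construction, so a later plain assignment is not converted. A small sketch of that behaviour (the variable names are just for illustration):

```python
import dataclasses


@dataclasses.dataclass
class Test:
    value: int

    def __post_init__(self):
        self.value = int(self.value)


test = Test('1')
type_after_init = type(test.value)        # converted by __post_init__

test.value = '5'                          # no hook runs on assignment
type_after_assignment = type(test.value)  # still a str
```

If assignments after construction also need converting, a descriptor or a custom __setattr__ is probably a better fit.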

Answered By: Merig

Yeah, the easy answer is to just do the conversion yourself in your own __init__(). I do this because I want my objects frozen=True.
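
For frozen dataclasses there also seems to be a way to keep the generated __init__: do the conversion in __post_init__ and assign through object.__setattr__, which is the same mechanism the generated __init__ uses for frozen classes. A minimal sketch, not the approach described above:

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class Test:
    value: int

    def __post_init__(self):
        # Normal assignment raises FrozenInstanceError on a frozen
        # dataclass, so bypass the generated __setattr__ here.
        object.__setattr__(self, 'value', int(self.value))
```

After construction the instance is still immutable: Test('1').value is 1, and assigning to it raises dataclasses.FrozenInstanceError.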

For type validation, Pydantic claims to do it, but I haven’t tried it yet: https://pydantic-docs.helpmanual.io/

Answered By: Lars P

It’s easy to achieve with pydantic.validate_arguments (a Pydantic v1 API, deprecated in Pydantic v2).

Just apply the validate_arguments decorator to your dataclass:

from dataclasses import dataclass
from pydantic import validate_arguments


@validate_arguments
@dataclass
class Test:
    value: int

Then try your demo; the string '1' is converted from str to int:

>>> test = Test('1')
>>> type(test.value)
<class 'int'>

If you pass a value that really cannot be converted, it raises an exception:

>>> test = Test('apple')
Traceback (most recent call last):
...
pydantic.error_wrappers.ValidationError: 1 validation error for Test
value
  value is not a valid integer (type=type_error.integer)
Answered By: yanlunlu

With Python dataclasses, the alternative is to use the __post_init__ method, as pointed out in other answers:

import dataclasses

@dataclasses.dataclass
class Test:
    value: int

    def __post_init__(self):
        self.value = int(self.value)

>>> test = Test("42")
>>> type(test.value)
<class 'int'>

Or you can use the attrs package, which allows you to easily set converters:

import attr

@attr.define
class Test:
    value: int = attr.field(converter=int)

>>> test = Test("42")
>>> type(test.value)
<class 'int'>

If your data comes from a mapping instead, you can use the cattrs package, which does conversion based on the type annotations in attrs classes and dataclasses:

import dataclasses

import cattrs

@dataclasses.dataclass
class Test:
    value: int

>>> test = cattrs.structure({"value": "42"}, Test)
>>> type(test.value)
<class 'int'>

Pydantic will automatically do conversion based on the types of the fields in the model:

import pydantic

class Test(pydantic.BaseModel):
    value: int

>>> test = Test(value="42")
>>> type(test.value)
<class 'int'>
Answered By: ericbn

You could use a descriptor-typed field:

from dataclasses import dataclass


class IntConversionDescriptor:

    def __set_name__(self, owner, name):
        self._name = "_" + name

    def __get__(self, instance, owner):
        return getattr(instance, self._name)

    def __set__(self, instance, value):
        setattr(instance, self._name, int(value))


@dataclass
class Test:
    value: IntConversionDescriptor = IntConversionDescriptor()

>>> test = Test(value=1)
>>> type(test.value)
<class 'int'>

>>> test = Test(value="12")
>>> type(test.value)
<class 'int'>

>>> test.value = "145"
>>> type(test.value)
<class 'int'>

>>> test.value = 45.12
>>> type(test.value)
<class 'int'>
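
As written, the descriptor makes value a required argument. If a default is wanted, one option, adapted from the "Descriptor-typed fields" section of the dataclasses documentation, is to return it from __get__ when the descriptor is accessed on the class. The default parameter and the value 100 here are illustrative choices, not part of the code above.

```python
from dataclasses import dataclass


class IntConversionDescriptor:

    def __init__(self, *, default):
        self._default = default

    def __set_name__(self, owner, name):
        self._name = "_" + name

    def __get__(self, instance, owner):
        if instance is None:
            # Class-level access: dataclass picks this up as the field default.
            return self._default
        return getattr(instance, self._name, self._default)

    def __set__(self, instance, value):
        setattr(instance, self._name, int(value))


@dataclass
class Test:
    value: IntConversionDescriptor = IntConversionDescriptor(default=100)
```

Test() then gets value 100, and Test("12") or a later test.value = 45.12 still goes through int() as before.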
Answered By: Marcin Kądziela

You could use a generic type-conversion descriptor, declared in descriptors.py:

import sys


class TypeConv:

    __slots__ = (
        '_name',
        '_default_factory',
    )

    def __init__(self, default_factory=None):
        self._default_factory = default_factory

    def __set_name__(self, owner, name):
        self._name = "_" + name
        if self._default_factory is None:
            # determine default factory from the type annotation
            tp = owner.__annotations__[name]
            if isinstance(tp, str):
                # evaluate the forward reference
                base_globals = getattr(sys.modules.get(owner.__module__, None), '__dict__', {})
                idx_pipe = tp.find('|')
                if idx_pipe != -1:
                    tp = tp[:idx_pipe].rstrip()
                tp = eval(tp, base_globals)
            # use `__args__` to handle `Union` types
            self._default_factory = getattr(tp, '__args__', [tp])[0]

    def __get__(self, instance, owner):
        return getattr(instance, self._name)

    def __set__(self, instance, value):
        setattr(instance, self._name, self._default_factory(value))

Usage in main.py would be like:

from __future__ import annotations
from dataclasses import dataclass
from descriptors import TypeConv


@dataclass
class Test:
    value: int | str = TypeConv()


test = Test(value=1)
print(test)

test = Test(value='12')
print(test)

# watch out: the following assignment raises a `ValueError`
try:
    test.value = '3.21'
except ValueError as e:
    print(e)

Output:

Test(value=1)
Test(value=12)
invalid literal for int() with base 10: '3.21'

Note that while this does work for other simple types, it does not handle conversions for certain types – such as bool or datetime – as normally expected.
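
To illustrate why bool and datetime are awkward: bool('no') is True because any non-empty string is truthy, and datetime cannot be called with a single string. A rough sketch of explicit converters for those two cases follows; parse_bool, the accepted spellings, and the Event class are assumptions made up for this example, not part of the descriptor above.

```python
from dataclasses import dataclass
from datetime import datetime


def parse_bool(value):
    """Hypothetical helper: interpret common string spellings of a boolean."""
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in {'true', 'yes', '1'}:
            return True
        if lowered in {'false', 'no', '0'}:
            return False
        raise ValueError(f'cannot interpret {value!r} as a bool')
    return bool(value)


@dataclass
class Event:  # hypothetical example class
    is_active: bool
    started: datetime

    def __post_init__(self):
        self.is_active = parse_bool(self.is_active)
        if isinstance(self.started, str):
            # Assumes ISO 8601 input such as '2024-01-02T03:04:05'.
            self.started = datetime.fromisoformat(self.started)


naive_result = bool('no')  # truthy, which is rarely what is wanted
event = Event('no', '2024-01-02T03:04:05')
```

A function like parse_bool could presumably also be passed as default_factory to TypeConv, since the descriptor just calls whatever factory it is given.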

If you are OK with using third-party libraries for this, I have come up with a (de)serialization library called dataclass-wizard that can perform type conversion as needed, but only when from_dict() is called:

from __future__ import annotations
from dataclasses import dataclass

from dataclass_wizard import JSONWizard


@dataclass
class Test(JSONWizard):
    value: int
    is_active: bool


test = Test.from_dict({'value': '123', 'is_active': 'no'})
print(repr(test))

assert test.value == 123
assert not test.is_active

test = Test.from_dict({'is_active': 'tRuE', 'value': '3.21'})
print(repr(test))

assert test.value == 3
assert test.is_active
Answered By: rv.kvetch

Why not use setattr?

from dataclasses import dataclass, fields

@dataclass()
class Test:
    value: int

    def __post_init__(self):
        for field in fields(self):
            setattr(self, field.name, field.type(getattr(self, field.name)))

Which yields the required result:

>>> test = Test('1')
>>> type(test.value)
<class 'int'>
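
One caveat: field.type is only callable when the annotation is a plain class. With string annotations (which is what from __future__ import annotations produces) or generics such as typing.List[str], the call fails. A possible variation, sketched under the assumption that typing.get_type_hints can resolve all annotations, converts only fields whose resolved type is a plain class and leaves the rest alone:

```python
import dataclasses
import typing


@dataclasses.dataclass
class Test:
    value: "int"   # string annotation, so field.type is the str 'int'
    ratio: float
    tags: typing.List[str] = dataclasses.field(default_factory=list)

    def __post_init__(self):
        # Resolve string annotations into real types.
        hints = typing.get_type_hints(type(self))
        for field in dataclasses.fields(self):
            target = hints[field.name]
            current = getattr(self, field.name)
            # Only plain classes are usable as converters here;
            # generics such as typing.List[str] are skipped.
            if isinstance(target, type) and not isinstance(current, target):
                setattr(self, field.name, target(current))
```

Here Test('1', '2.5') gets an int and a float, while tags is left untouched.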
Answered By: martihj