Python: how to type hint a dataclass?

Question:

The code below works, but PyCharm gives me the following warning:

Cannot find reference __annotations__ in ‘(…) -> Any’.

I guess it’s because I’m using Callable. I couldn’t find anything like a Dataclass type. Which type should I use instead?

from __future__ import annotations
from dataclasses import dataclass
from typing import Callable

@dataclass
class Fruit:
  color: str
  taste: str

def get_cls() -> Callable:
  return Fruit

attrs = get_cls().__annotations__  # <- IDE warning
print(attrs)
Asked By: Mr. B.

||

Answers:

In this particular example you can just hint it directly:

from dataclasses import dataclass

@dataclass
class Fruit:
    x: str

def get_cls() -> type[Fruit]:
    return Fruit

attrs = get_cls().__annotations__
print(attrs)
$ python d.py
{'x': <class 'str'>}
$ mypy d.py
Success: no issues found in 1 source file

However, I don’t know if this is what you’re asking. Are you after a generic type for any dataclass? (I would be tempted to just hint the union of all possible return types of get_cls(): the whole point of using a dataclass rather than, e.g., a dict is surely to distinguish between types of data, and you do want your type checker to warn you if you try to access attributes not defined on one of your dataclasses.)
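A minimal sketch of the "union of all possible return types" idea, assuming a second dataclass (Vegetable) and a name argument on get_cls(), neither of which is in the question:

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class Fruit:
    color: str
    taste: str


@dataclass
class Vegetable:  # hypothetical second dataclass, for illustration only
    color: str
    crunchy: bool


def get_cls(name: str) -> type[Fruit] | type[Vegetable]:
    # The `name` parameter is an assumption; the original get_cls() takes none.
    if name == "fruit":
        return Fruit
    return Vegetable


cls = get_cls("fruit")
# dataclasses.fields() accepts a dataclass class as well as an instance.
field_names = [f.name for f in fields(cls)]
print(field_names)
```

With this hint, a checker knows the result is one of exactly these two classes, so it can still complain about attributes that neither of them defines.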

References

See the docs on typing.Type, which since Python 3.9 is also available as the built-in type (just as we can now use list and dict rather than typing.List and typing.Dict).

Answered By: 2e0byo

The simplest option is to remove the return type annotation in its entirety.

Note: PyCharm is usually smart enough to infer the return type automatically.

from __future__ import annotations
from dataclasses import dataclass
# remove this line
# from typing import Callable

@dataclass
class Fruit:
  color: str
  taste: str

# def get_cls() -> Callable:  <== No, the return annotation is wrong (Fruit is *more* than a callable)
def get_cls():
  return Fruit


attrs = get_cls().__annotations__  # <- No IDE warning, Yay!
print(attrs)

In PyCharm, the return type is correctly inferred.

[Screenshot: PyCharm showing the inferred return type of get_cls()]


To type hint a dataclass generically: dataclasses are essentially ordinary Python classes under the hood, with auto-generated methods and some extra class attributes added to the mix, so you could describe them with a typing.Protocol as shown below:

from __future__ import annotations
from dataclasses import dataclass, Field
from typing import TYPE_CHECKING, Any, Callable, Iterable, Protocol


if TYPE_CHECKING:
    # TYPE_CHECKING is False at runtime, so this block is skipped and nothing prints
    print('Oh YEAH !!')

    class DataClass(Protocol):
        __dict__: dict[str, Any]
        __doc__: str | None
        # if using `@dataclass(slots=True)`
        __slots__: str | Iterable[str]
        __annotations__: dict[str, str | type]
        __dataclass_fields__: dict[str, Field]
        # the actual class definition is marked as private, and here I define
        # it as a forward reference, as I don't want to encourage
        # importing private or "unexported" members.
        __dataclass_params__: '_DataclassParams'
        __post_init__: Callable | None


@dataclass
class Fruit:
    color: str
    taste: str


# noinspection PyTypeChecker
def get_cls() -> type[DataClass]:
    return Fruit


attrs = get_cls().__annotations__  # <- No IDE warning, Yay!
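
A narrower variant, offered as a sketch rather than a drop-in fix: declare only __dataclass_fields__ and mark it as a ClassVar. The names DataclassLike and field_names are made up for illustration, and whether a given checker or IDE accepts type[Fruit] where type[DataclassLike] is expected without a suppression comment is something to verify with your own tooling.

```python
from dataclasses import dataclass, fields
from typing import Any, ClassVar, Protocol


class DataclassLike(Protocol):
    # Hypothetical protocol name. The only member is the attribute that
    # the @dataclass decorator sets on every class it processes.
    __dataclass_fields__: ClassVar[dict[str, Any]]


@dataclass
class Fruit:
    color: str
    taste: str


def field_names(cls: type[DataclassLike]) -> list[str]:
    # Hypothetical helper: works for any dataclass, not just Fruit.
    return [f.name for f in fields(cls)]


print(field_names(Fruit))
```

The trade-off is the same as with the larger protocol: the hint says "some dataclass", so Fruit-specific attributes are no longer visible through it.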

Costs to class def

To address the comments: there does appear to be a non-negligible runtime cost associated with class definitions, which is why I wrap the definition in an if block above.

The following code compares the performance with both approaches, to confirm this suspicion:

from __future__ import annotations
from dataclasses import dataclass, Field
from timeit import timeit
from typing import TYPE_CHECKING, Any, Callable, Iterable, Protocol


n = 100_000

print('class def:  ', timeit("""
class DataClass(Protocol):
    __dict__: dict[str, Any]
    __doc__: str | None
    __slots__: str | Iterable[str]
    __annotations__: dict[str, str | type]
    __dataclass_fields__: dict[str, Field]
    __dataclass_params__: '_DataclassParams'
    __post_init__: Callable | None
""", globals=globals(), number=n))

print('if <bool>:  ', timeit("""
if TYPE_CHECKING:
    class DataClass(Protocol):
        __dict__: dict[str, Any]
        __doc__: str | None
        __slots__: str | Iterable[str]
        __annotations__: dict[str, str | type]
        __dataclass_fields__: dict[str, Field]
        __dataclass_params__: '_DataclassParams'
        __post_init__: Callable | None
""", globals=globals(), number=n))

Results, on Mac M1 running Python 3.10:

class def:   0.7453760829521343
if <bool>:   0.0009954579873010516

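Hence, it appears to be much faster overall to wrap a class definition (when used purely for type hinting purposes) with an if block as above. In this run that works out to roughly 7.5 microseconds saved per execution of the class body (0.745 s over 100,000 runs), and a module-level class body normally runs only once per import.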

Answered By: rv.kvetch

While the provided solutions do work, I just want to add a bit of context.

IMHO your annotation is not wrong. It is just not strict enough and not all that useful.

Fruit is a class. And technically speaking a class is a callable because type (the class of all classes) implements the __call__ method. In fact, that method is executed every time you create an instance of a class; even before the class’ __init__ method. (For details refer to the "Callable types" subsection in this section of the data model docs.)
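
To make the order of calls visible, here is a small sketch with a custom metaclass that records when its __call__ runs. LoggingMeta and the events list are hypothetical names, and Fruit is written as a plain class here (not a dataclass) so that __init__ can log as well:

```python
events = []


class LoggingMeta(type):
    # Overrides type.__call__, which runs whenever the class is called.
    def __call__(cls, *args, **kwargs):
        events.append("metaclass __call__")
        return super().__call__(*args, **kwargs)


class Fruit(metaclass=LoggingMeta):
    def __init__(self, color: str, taste: str) -> None:
        events.append("__init__")
        self.color = color
        self.taste = taste


fruit = Fruit("red", "sweet")

print(callable(Fruit))  # True
print(events)           # ['metaclass __call__', '__init__']
```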

One problem with your annotation, however, is that Callable is a generic type, so you should specify its type arguments. In this case you have a few options, depending on how narrow you want your annotation to be. The simplest one that would still be correct here is the "catch-all" callable:

def get_cls() -> Callable[..., Any]:
    return Fruit

But since you know that calling the class Fruit returns an instance of that class, you might as well write this:

def get_cls() -> Callable[..., Fruit]:
    return Fruit

Finally, if you know which arguments will be allowed for instantiating a Fruit (namely the color and taste attributes you defined on the dataclass), you could narrow it down even further:

def get_cls() -> Callable[[str, str], Fruit]:
    return Fruit

Technically, all of those are correct. (Try it with mypy --strict.)

However, even that last annotation is not particularly useful since Fruit is not just any Callable returning a Fruit instance, it is the class Fruit itself. Therefore the most sensible annotation is (as @2e0byo pointed out) this one:

def get_cls() -> type[Fruit]:
    return Fruit

That is what I would do as well.
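
A short sketch of the practical difference, assuming a classmethod default() that is not part of the original Fruit. Both calls succeed at runtime because annotations are not enforced, but only the type[Fruit] hint tells a checker that class-level members exist; with Callable[..., Fruit] I would expect a checker such as mypy to report the attribute access.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Fruit:
    color: str
    taste: str

    @classmethod
    def default(cls) -> Fruit:  # hypothetical helper, not in the question
        return cls("red", "sweet")


def get_callable() -> Callable[..., Fruit]:
    return Fruit


def get_cls() -> type[Fruit]:
    return Fruit


# Statically, a bare callable has no `default` attribute, so a checker
# is expected to flag this line even though it runs fine.
a = get_callable().default()

# Statically fine: the checker knows this is the Fruit class itself.
b = get_cls().default()

print(a == b)
```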


I disagree with @rv.kvetch that removing the annotation is a solution (in any situation).

His DataClass protocol is an interesting proposal. However I would advise against it in this case for a few reasons:

  1. It might give you all the magic attributes that make up any dataclass, but annotating with it makes you lose all information about the actual, specific class you return from get_cls, namely Fruit. In practical terms this means no auto-suggestions by the IDE for Fruit-specific attributes/methods.
  2. You still have to place a type checker exception/ignore in get_cls because in the eyes of any static type checker type[Fruit] is not a subtype of type[DataClass]. The built-in dataclass protocol is a hack that is carried by specially tailored plugins for mypy, PyCharm etc. and those do not cover this kind of structural subtyping.
  3. Even the forward reference to _DataclassParams is still a problem because it will never be resolved, unless you (surprise, surprise) import that protected member from the depths of the dataclasses package. Thus, this is not a stable annotation.
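
The unresolved forward reference from point 3 can also be observed at runtime. This is a sketch that assumes a trimmed-down copy of the protocol, defined outside the TYPE_CHECKING block so that it exists when the script runs:

```python
from typing import Any, Protocol, get_type_hints


class DataClass(Protocol):
    __dataclass_fields__: dict[str, Any]
    # Forward reference to a name that is never imported in this module.
    __dataclass_params__: '_DataclassParams'


try:
    get_type_hints(DataClass)
except NameError as exc:
    # Resolving the string annotation fails because the name is unknown.
    unresolved = str(exc)
    print(unresolved)
```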

So from a type safety standpoint, there are two big errors in that code (the subtyping and the unresolved reference) and two minor ones: the non-parameterized generic annotations for __dataclass_fields__ (Field is generic) and __post_init__ (Callable is generic).

Still, I like protocols. Python is a protocol-oriented language. The approach is interesting.

Answered By: Daniil Fajnberg