Why is Python dataclass a decorator and not a base class?

Question:

Why does Python implement dataclasses.dataclass as a class decorator and not as a base class? I think it would be at least clearer from the conceptual point of view to have it as a base class: the __init__ method seems to be the only thing a dataclass decorator adds to a class, and adding methods and attributes is what any straightforward base class in usually intended to do. Why to implement a decorator that intrinsically modifies a class? Base classes are just meant for this.
Also, having a "Dataclass" base class would make easier for users to modify its behaviour in case any particular working mechanism is needed, one would just have to overwrite base class’ methods when inheriting the dataclass.

Since it clearly has been made this way for some reason I’m trying to figure out why. The only thing that comes to my mind might be some performance-related thing, I think inheriting a class should be slower that just passing a class through a function, however I’m not sure dataclasses are meant to be highly performant – nor the Python language itself – and in any case for that we have named tuples.

Asked By: Giuppox

||

Answers:

Dataclasses were introduced in PEP 557, which describes some of the design considerations for this feature, including rejected ideas. However, there is no mention of any rejected alternatives to using a decorator, such as using a base class instead. So it seems we cannot give a definitive answer for why a decorator was used rather than a base class. Perhaps using a base class simply wasn’t considered as an option.

That said, it is not correct to say that the @dataclass decorator only adds the __init__ method. The signature of the dataclass function shows that there are several other features provided, which you can opt in or out of by passing boolean flags:

  • __eq__, enabled by default
  • __repr__, enabled by default
  • Ordering methods such as __lt__, opt-in
  • __hash__, opt-in
  • Immutability through the __setattr__ and __delattr__ methods, opt-in

In principle, these methods could be implemented in a generic way in a base class. However, there is a potential for greater performance by generating code for these methods once the field specifications are known; this way, you don’t have to look at those field specifications on every method call. (These methods are generated when the dataclass is declared, by building Python code as a string and calling exec, see here.) So, one big advantage of using a decorator is that you can look at the field specifications just once and then generate code for that specific dataclass.

By analogy, implementing these in a generic way in a base class is like interpreting, whereas generating code with a decorator is like compiling. (Of course the resulting generated code is still interpreted by the CPython interpreter, which is why this is only an analogy.)

Or, putting it a different way, it makes less sense to inherit methods such as __eq__, __lt__ and __hash__ from a base class, because these methods don’t do the same thing for every dataclass, except in a much more generic sense.

Answered By: kaya3
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.