Why are attributes defined outside __init__ in popular packages like SQLAlchemy or Pydantic?

Question:

I’m modifying an app, trying to use Pydantic for my application models and SQLAlchemy for my database models.

I have existing classes where I defined attributes inside the __init__ method, as I was taught to do:

import pandas as pd

class Measure:
    def __init__(
        self,
        t_received: int,
        mac_address: str,
        data: pd.DataFrame,
        battery_V: float = 0
    ):
        self.t_received = t_received
        self.mac_address = mac_address
        self.data = data
        self.battery_V = battery_V

In both Pydantic and SQLAlchemy, following the docs, I have to define those attributes outside the __init__ method, for example in Pydantic:

import pandas as pd
import pydantic

class Measure(pydantic.BaseModel):
    t_received: int
    mac_address: str
    data: pd.DataFrame
    battery_V: float

Why is that the case? Isn’t this bad practice? Is there any impact on other methods (classmethods, staticmethods, properties, …) of that class?

Note that this is also very inconvenient, because when I instantiate an object of that class, I don’t get any suggestions for what parameters the constructor expects!

Asked By: user11696358


Answers:

Defining attributes of a class directly in the class namespace is totally acceptable and is not specific to the packages you mentioned. Since the class namespace is (among other things) essentially a blueprint for instances of that class, defining attributes there can actually be useful when you want to, for example, provide all public attributes with type annotations in a single place in a consistent manner.

Consider also that a public attribute does not necessarily need to be reflected by a parameter in the constructor of the class. For example, this is entirely reasonable:

class Foo:
    a: list[int]
    b: str

    def __init__(self, b: str) -> None:
        self.a = []
        self.b = b

In other words, just because something is a public attribute, that does not mean it should have to be provided by the user upon initialization. To say nothing of protected/private attributes.

What is special about Pydantic (to take your example) is that the metaclass of BaseModel, as well as the class itself, does a whole lot of magic with the attributes defined in the class namespace. Pydantic refers to a model’s attributes as "fields", and one bit of that magic is that special checks are performed during initialization based on the fields you defined in the class namespace. For example, the constructor must receive keyword arguments that correspond to the non-optional fields you defined.

from pydantic import BaseModel


class MyModel(BaseModel):
    field_a: str
    field_b: int = 1


obj = MyModel(
    field_a="spam",  # required
    field_b=2,       # optional
    field_c=3.14,    # unexpected/ignored
)

If I were to omit field_a during construction of a MyModel instance, an error would be raised. Likewise, if I tried to pass field_b="eggs", an error would be raised, because "eggs" cannot be validated as an int.
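For illustration, here is a minimal sketch of both failure modes; the exact error messages differ between Pydantic versions, but a ValidationError is raised in both cases:

from pydantic import BaseModel, ValidationError


class MyModel(BaseModel):
    field_a: str
    field_b: int = 1


try:
    MyModel(field_b=2)  # field_a is missing
except ValidationError as e:
    print(e)  # reports that field_a is required

try:
    MyModel(field_a="spam", field_b="eggs")  # "eggs" is not a valid int
except ValidationError as e:
    print(e)  # reports a type error for field_b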

So the fact that you don’t write your own __init__ method is a feature Pydantic provides you. You only define the fields and an appropriate constructor is "magically" there for you already.

As for the drawback you mentioned, where you don’t get any auto-suggestions: that is true by default for all IDEs, because static type checkers cannot see through such a dynamically generated constructor to infer which arguments are expected. Currently this is solved via extensions, such as the mypy plugin and the PyCharm plugin. The @dataclass_transform decorator from PEP 681 (typing.dataclass_transform since Python 3.11) standardizes this mechanism for similar packages and thus improves support by static type checkers.
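As a rough sketch of how a library author can use it (the runtime generation of the constructor is omitted here; the decorator itself only carries information for static type checkers, and is available from typing_extensions on older Python versions):

from typing import dataclass_transform


@dataclass_transform()
class ModelBase:
    # Hypothetical base class: at runtime a real library would synthesize an
    # __init__ from the annotations (e.g. via a metaclass or __init_subclass__);
    # the decorator merely tells type checkers that subclasses get a
    # constructor derived from their field annotations.
    pass


class Measure(ModelBase):
    t_received: int
    mac_address: str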

It is also worth noting that even the standard library’s dataclasses only work via special extensions in type checkers.

To your other question: there is naturally some impact on the methods of such classes (by design), though the specifics are not always obvious. You should, of course, not simply write your own __init__ method without carefully calling the superclass’s __init__ inside it. Also, @property setters currently don’t work as you would expect (though it is debatable whether it even makes sense to use properties on Pydantic models).
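For example, here is a minimal sketch of a custom __init__ on a Pydantic model that still delegates to the generated constructor (this pattern works in both Pydantic v1 and v2):

from pydantic import BaseModel


class Measure(BaseModel):
    t_received: int
    mac_address: str

    def __init__(self, **data) -> None:
        # Delegate to the generated constructor so that validation and
        # field assignment still happen, then do any extra setup.
        super().__init__(**data)
        print(f"Received measure from {self.mac_address}")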

To wrap up: not only is this approach not bad practice, it is a great way to reduce boilerplate code, and it is extremely common these days, as evidenced by the fact that hugely popular and established packages (the aforementioned Pydantic, as well as SQLAlchemy, Django and others) use this pattern to some extent.

Answered By: Daniil Fajnberg

Pydantic has its own (rewriting) magic, but SQLAlchemy is a bit easier to explain.

An SQLAlchemy model looks like this:

>>> from sqlalchemy import Column, Integer, String
>>> from sqlalchemy.orm import declarative_base
>>> Base = declarative_base()
>>> class User(Base):
...     __tablename__ = "users"
...
...     id = Column(Integer, primary_key=True)
...     name = Column(String)

Column here plays the role of a descriptor (in actual SQLAlchemy, the declarative machinery replaces each Column with an InstrumentedAttribute, which is the real descriptor, but the principle is the same). A descriptor is a class that overrides the __get__ and/or __set__ methods. In practice, this means the class can control how data is accessed and stored.

For example, this assignment would now use the __set__ method from Column:

class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True)
    name = Column(String)

user = User()
user.name = 'John'

This is roughly equivalent to User.name.__set__(user, 'John'): when assigning, Python looks up name on the class (following the MRO), finds a data descriptor there, and therefore uses Column’s __set__ method instead of writing to the instance dictionary directly. In a simplified version, Column looks something like this:

class Column:
    def __init__(self, field=""):
        self.field = field

    def __get__(self, obj, objtype=None):
        return obj.__dict__.get(self.field)

    def __set__(self, obj, val):
        # validate_field stands in for whatever validation the column type does
        if validate_field(val):
            obj.__dict__[self.field] = val
        else:
            print('not a valid value')

(This is similar to using @property. A descriptor is essentially a reusable @property.)
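To illustrate that comparison, here is a rough property-based equivalent: the validation logic is the same, but with @property it has to be repeated for every attribute, whereas a descriptor like the Column above can be reused for id, name, and so on (validate_field is again just a placeholder):

class User:
    def __init__(self):
        self._name = None

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, val):
        if validate_field(val):
            self._name = val
        else:
            print('not a valid value')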

Answered By: Alex