Dataclass-style object with mutable and immutable properties?

Question:

I have been playing around with dataclasses whose property names are loaded dynamically from a file, and I am unable to find a way to create both ‘frozen’ and ‘non-frozen’ properties. I believe dataclasses only allow you to set all properties to frozen or non-frozen.

As of now, I create a frozen dataclass and add a mutable class as one of the properties, which I can change as I go, but I am not very happy with the readability of this approach.

Is there another pythonic dataclass people would recommend without needing to implement a class with the ability to set mutable/immutable properties?

import dataclasses

class ModifiableConfig:
    """There is stuff in here but you get the picture."""
    ...

config_dataclass = dataclasses.make_dataclass(
    'c',
    [(x, type(v), v) for x, v in config.items()] + [('var', object, ModifiableConfig())],
    frozen=True
)
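
For a concrete picture of this workaround, here is a minimal self-contained sketch. The `config` dict, the `retries` attribute and the class name `ConfigClass` are hypothetical stand-ins for whatever is loaded from the file; `default_factory` is used so that each instance gets its own mutable part.

```python
import dataclasses


class ModifiableConfig:
    """Hypothetical stand-in for the mutable part of the configuration."""

    def __init__(self):
        self.retries = 0


# Hypothetical values; in the question these come from a file.
config = {'host': 'localhost', 'port': 8080}

ConfigClass = dataclasses.make_dataclass(
    'ConfigClass',
    [(name, type(value), value) for name, value in config.items()]
    + [('var', ModifiableConfig,
        dataclasses.field(default_factory=ModifiableConfig))],
    frozen=True,
)

cfg = ConfigClass()
cfg.var.retries += 1  # allowed: only the reference `var` is frozen

try:
    cfg.port = 9090  # rejected: top-level fields are frozen
except dataclasses.FrozenInstanceError:
    blocked = True
else:
    blocked = False
```

The frozen flag protects the binding of `var`, not the object it points to, which is why the inner object stays editable.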

However, I would prefer the ability to choose which attributes are frozen and which are not, which would make the additional class inside the dataclass unnecessary. It may look like this:

config_dataclass_modifiable = dataclasses.make_dataclass(
            'c', [(x, type(x), v, True if 'modifiable' in x else False) for x, v in config.items()])

Notice the “True if ‘modifiable’ in x else False”, I’m not saying this is how I would do it in the end but hopefully this helps understand my question better.

Asked By: JMB


Answers:

The normal approach to tuning attribute handling is writing a custom __setattr__ method, which lets you override the default behavior for attribute assignments. Unfortunately, dataclasses also hooks into that method to enforce the frozen logic, so defining your own __setattr__ on a frozen dataclass fails with TypeError: Cannot overwrite attribute __setattr__ in class ModifiableConfig as soon as the decorator runs.
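
A minimal sketch that reproduces the conflict; the class name `Locked` is only an illustration. The error is raised at class-definition time, before any instance exists.

```python
from dataclasses import dataclass

try:
    @dataclass(frozen=True)
    class Locked:  # hypothetical example class
        x: int

        # A user-defined __setattr__ collides with the one that
        # frozen=True wants to install.
        def __setattr__(self, key, value):
            object.__setattr__(self, key, value)
except TypeError as exc:
    message = str(exc)
else:
    message = None

print(message)
```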

As a consequence, there is no straightforward and simple solution to your problem that I can see. Your approach of delegating the mutable parts of a class to an inner object or dictionary is, in my opinion, not bad or un-pythonic at all. But if you’re fine with dropping frozen from your requirements list and only want a partly-mutable dataclass, you can try this bootleg semi-frozen recipe, which extends the dataclass decorator with a flag semi that you can switch on to get the behavior you described:

from dataclasses import dataclass as dc
from traceback import format_stack

def dataclass(_cls=None, *, init=True, repr=True, eq=True, order=False,
              unsafe_hash=False, frozen=False, semi=False):

    def wrap(cls):
        # sanity checks for new kw
        if semi:
            if frozen:
                raise AttributeError("Either semi or frozen, not both.")
            if cls.__setattr__ != cls.mro()[1].__setattr__:
                raise AttributeError("No touching setattr when using semi!")

        # run original dataclass decorator
        dc(cls, init=init, repr=repr, eq=eq, order=order,
           unsafe_hash=unsafe_hash, frozen=frozen)

        # add semi-frozen logic
        if semi:
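            def __setattr__(self, key, value):
                if key in self.__slots__:
                    # name of the function that triggered this assignment,
                    # parsed from the caller's formatted stack entry
                    caller = format_stack()[-2].rsplit('in ', 1)[1].strip()
                    # only the generated __init__ may set slot attributes
                    if caller != '__init__':
                        raise TypeError(f"Attribute '{key}' is immutable!")
                object.__setattr__(self, key, value)
            cls.__setattr__ = __setattr__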

        return cls

    # Handle being called with or without parens
    if _cls is None:
        return wrap
    return wrap(_cls)

I’m being brief here and don’t address some potential edge cases. There are better ways to handle the wrapping so that the internals are more consistent, but they would blow up this already complicated snippet even more.

Given this new dataclass decorator, you can use it like this to define a dataclass with some immutable attributes and some mutable ones:

>>> @dataclass(semi=True)
... class Foo:
...     # put immutable attributes and __dict__ into slots 
...     __slots__ = ('__dict__', 'x', 'y')
...     x: int
...     y: int
...     z: int
...
>>> f = Foo(1, 2, 3)
>>> f        # prints Foo(x=1, y=2, z=3)
>>> f.z = 4  # will work
>>> f.x = 4  # raises TypeError: Attribute 'x' is immutable!

You don’t have to use __slots__ to separate the mutable from the immutable part, but it is convenient for a few reasons (such as being a meta-attribute that isn’t part of the default dataclass repr) and felt intuitive to me.
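
As one possible variation (not the recipe above, and with its own trade-offs), the stack inspection can be swapped for a write-once check: an unset slot raises AttributeError on access, so hasattr tells first assignment apart from later ones. The decorator name `semi_frozen` and the class `Point` are hypothetical, and the sketch assumes `__slots__` is a tuple of names.

```python
from dataclasses import dataclass


def semi_frozen(cls):
    """Hypothetical decorator: slot attributes become write-once."""
    cls = dataclass(cls)
    write_once = frozenset(cls.__slots__) - {'__dict__'}

    def __setattr__(self, key, value):
        # An unset slot raises AttributeError on access, so hasattr() is
        # False during __init__ and True for every later assignment.
        if key in write_once and hasattr(self, key):
            raise TypeError(f"Attribute '{key}' is immutable!")
        object.__setattr__(self, key, value)

    cls.__setattr__ = __setattr__
    return cls


@semi_frozen
class Point:
    # immutable attributes and __dict__ go into slots, as above
    __slots__ = ('__dict__', 'x', 'y')
    x: int
    y: int
    z: int


p = Point(1, 2, 3)
p.z = 4  # works: z lives in __dict__

try:
    p.x = 4  # rejected: x is a slot that is already set
except TypeError as exc:
    error = str(exc)
```

The difference in behavior is that even __init__ could not assign a slot twice here, whereas the stack-based check lets __init__ do anything.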

Answered By: Arne

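In the top answer above, the code breaks if Foo is a subclass of another class whenever the final assignment is delegated through super. To fix this, the line:

super(type(self), self).__setattr__(key, value)

should read:

super(cls, self).__setattr__(key, value)

Here cls is the class passed to wrap. type(self) is always the most-derived class, so on a subclass instance the first form resolves to the very same __setattr__ and recurses forever; naming cls makes super actually traverse upward. The listing as shown above calls object.__setattr__ directly, which does not recurse either.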
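
The recursion can be reproduced without dataclasses at all. This is a minimal sketch with hypothetical class names; naming the defining class explicitly is one way to avoid the loop.

```python
class Broken:
    def __setattr__(self, key, value):
        # type(self) is the most-derived class, so for a subclass instance
        # this lookup lands on Broken.__setattr__ again.
        super(type(self), self).__setattr__(key, value)


class BrokenChild(Broken):
    pass


class Fixed:
    def __setattr__(self, key, value):
        # Naming the defining class pins the lookup to start after Fixed.
        super(Fixed, self).__setattr__(key, value)


class FixedChild(Fixed):
    pass


try:
    BrokenChild().x = 1
except RecursionError:
    recursed = True
else:
    recursed = False

child = FixedChild()
child.x = 1  # works
```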

Answered By: Coert van Gemeren

I found quite a simple way of doing this that keeps the code reasonably clean:

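from dataclasses import dataclass, field

@dataclass
class Person:
    name: str
    id: int
    _id: int = field(init=False, repr=False)

    @property
    def id(self) -> int:
        return self._id

    @id.setter
    def id(self, id: int) -> None:
        # The first assignment (from __init__) succeeds; later ones are rejected.
        # hasattr also covers falsy values such as 0, which `if self._id` misses.
        if hasattr(self, '_id'):
            raise AttributeError('This field is immutable!')
        self._id = id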

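Basically, id becomes an interface, and I override the setter so that it throws an exception when _id already exists. You can always create a dedicated exception class for this purpose, something like ImmutableException. One caveat: the property object replaces the class attribute id, so dataclasses treats it as that field's default; always pass id explicitly.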
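
A self-contained sketch of the same pattern with a dedicated exception class. The name `ImmutableFieldError` and the sample values are hypothetical; subclassing AttributeError is a choice made here, not something the approach requires.

```python
from dataclasses import dataclass, field


class ImmutableFieldError(AttributeError):
    """Hypothetical exception for writes to a set-once field."""


@dataclass
class Person:
    name: str
    id: int
    _id: int = field(init=False, repr=False)

    @property
    def id(self) -> int:
        return self._id

    @id.setter
    def id(self, value: int) -> None:
        # _id does not exist until __init__ assigns id for the first time.
        if hasattr(self, '_id'):
            raise ImmutableFieldError("'id' can only be set once")
        self._id = value


p = Person('Ada', 7)
p.name = 'Grace'  # ordinary field, stays mutable

try:
    p.id = 8  # second assignment is rejected
except ImmutableFieldError as exc:
    error = exc
```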

Answered By: Juan Urrego

Since dataclasses adds new arguments to @dataclass(...) in newer Python versions, such as kw_only in Python 3.10, using a decorator to wrap the @dataclass decorator might not be an ideal option moving forward.

One alternative is to use a descriptor. While the solution below does not work when slots=True is passed to the @dataclass decorator, it appears to work well enough in the general case.

Here is an implementation of a simple descriptor class Frozen, which raises an error if an attribute is set more than once – i.e. outside of __init__():

class Frozen:
    __slots__ = ('private_name', )

    def __set_name__(self, owner, name):
        self.private_name = '_' + name

    def __get__(self, obj, objtype=None):
        value = getattr(obj, self.private_name)
        return value

    def __set__(self, obj, value):
        if hasattr(obj, self.private_name):
            msg = f'Attribute `{self.private_name[1:]}` is immutable!'
            raise TypeError(msg) from None

        setattr(obj, self.private_name, value)

Usage:

from dataclasses import dataclass


@dataclass
class Foo:
    # optional: define __slots__ to reduce memory usage
    __slots__ = ('_x', '_y', 'z')

    x: int = Frozen()
    y: int = Frozen()
    z: int


f = Foo(1, 2, 3)
print(f)

f.z = 4  # will work
f.z = 5  # will work

f.x = 4  # raises an error -> TypeError: Attribute `x` is immutable!

For a version of Frozen that allows a default value for a field, see my post here, which shows how to set it up.
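
The linked post is not reproduced here. As a rough sketch of one way it could be done, the following relies on documented dataclasses behavior: the decorator probes a descriptor for a default by calling __get__ with obj=None, and treats an AttributeError as "no default". The names `FrozenWithDefault`, `_NO_DEFAULT` and `Settings` are hypothetical.

```python
from dataclasses import dataclass

# Sentinel so that None can still be used as a real default.
_NO_DEFAULT = object()


class FrozenWithDefault:
    __slots__ = ('private_name', 'default')

    def __init__(self, default=_NO_DEFAULT):
        self.default = default

    def __set_name__(self, owner, name):
        self.private_name = '_' + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Class-level access: this is how @dataclass looks for a default.
            if self.default is _NO_DEFAULT:
                raise AttributeError(self.private_name[1:])
            return self.default
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if hasattr(obj, self.private_name):
            msg = f'Attribute `{self.private_name[1:]}` is immutable!'
            raise TypeError(msg)
        setattr(obj, self.private_name, value)


@dataclass
class Settings:
    name: str
    retries: int = FrozenWithDefault(3)  # frozen, defaults to 3


s = Settings('job')
t = Settings('job', 5)

try:
    s.retries = 4
except TypeError as exc:
    error = str(exc)
```

As with any dataclass, fields that have a default must come after fields that do not.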

Timings

If you are curious, I have also timed the descriptor approach above against the custom __setattr__() approach outlined in the top answer.

Here is my sample code with the timeit module:

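from timeit import timeit

# `dc` (the standard dataclass decorator) and `dataclass` (the semi-frozen
# wrapper) are the names defined in the top answer; `Frozen` is the class above.

@dc
class Foo: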
    # uncomment if you truly want to add __slots__:
    # __slots__ = ('_x', '_y', 'z')

    x: int = Frozen()
    y: int = Frozen()
    z: int


@dataclass(semi=True)
class Foo2:
    # put immutable attributes and __dict__ into slots
    __slots__ = ('__dict__', 'x', 'y')
    x: int
    y: int
    z: int


n = 100_000

print('Foo.__init__() -> descriptor: ', timeit('Foo(1, 2, 3)', number=n, globals=globals()))
print('Foo.__init__() -> setattr:    ', timeit('Foo2(1, 2, 3)', number=n, globals=globals()))

f1 = Foo(1, 2, 3)
f2 = Foo2(1, 2, 3)

print('foo.z -> descriptor: ', timeit('f1.z', number=n, globals=globals()))
print('foo.z -> setattr:    ', timeit('f2.z', number=n, globals=globals()))

Results, on my Mac M1:

Foo.__init__() -> descriptor:  0.0345854579936713
Foo.__init__() -> setattr:     3.2137108749884646
foo.z -> descriptor:  0.003795791999436915
foo.z -> setattr:     0.002478832990163937

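In this run, creating a new Foo instance is much faster with the descriptor approach (roughly 90x), most likely because the custom __setattr__ calls format_stack() for every slot attribute assigned in __init__. Note that the second pair of timings measures reading the z attribute rather than calling __setattr__(). There the custom setattr approach is slightly faster, but z is not a slot in Foo2, so __slots__ is unlikely to explain the gap, and the difference is small enough that it may not be significant.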

Answered By: rv.kvetch