Dataclass-style object with mutable and immutable properties?
Question:
I have been playing around with dataclasses dynamically loaded with property names from a file, and I am unable to find a way to create both ‘frozen’ and ‘non-frozen’ properties. I believe dataclasses only allow you to set all properties to frozen or non-frozen.
As of now, I create a frozen dataclass and add a mutable class as one of the properties, which I can change as I go, but I am not very happy with the readability of this approach.
Is there another pythonic dataclass approach people would recommend, short of implementing a class with the ability to set mutable/immutable properties myself?
import dataclasses

class ModifiableConfig:
    """There is stuff in here but you get the picture."""
    ...

config_dataclass = dataclasses.make_dataclass(
    'c',
    # type(v), not type(x): x is the key string, so type(x) would always be str
    [(x, type(v), v) for x, v in config.items()]
    + [('var', object, ModifiableConfig())],
    frozen=True
)
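For context, here is a runnable sketch of this workaround (the config contents are made up for illustration): the outer dataclass is frozen, but the inner ModifiableConfig instance stays mutable. It uses type(v) for the field type, since x is always a string key, and a default_factory so instances don't share one inner object:

```python
import dataclasses

class ModifiableConfig:
    """Mutable bag for the parts that may still change."""
    pass

# hypothetical config loaded from a file
config = {'host': 'localhost', 'port': 8080}

C = dataclasses.make_dataclass(
    'c',
    [(x, type(v), v) for x, v in config.items()]
    + [('var', object, dataclasses.field(default_factory=ModifiableConfig))],
    frozen=True,
)

c = C()
c.var.retries = 3  # the inner object stays mutable
print(c.host, c.var.retries)  # localhost 3
try:
    c.port = 9090  # the frozen outer dataclass rejects assignment
except dataclasses.FrozenInstanceError as e:
    print(e)       # cannot assign to field 'port'
```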
However, I would prefer the ability to choose which attributes are frozen and which are not, making the extra class inside the dataclass obsolete. It might look like this:
config_dataclass_modifiable = dataclasses.make_dataclass(
    'c',
    [(x, type(v), v, True if 'modifiable' in x else False) for x, v in config.items()]
)
Notice the “True if ‘modifiable’ in x else False”. I’m not saying this is how I would do it in the end, but hopefully it helps make the question clearer.
Answers:
The normal approach to tuning attribute handling is writing a custom __setattr__ method, which lets you override the default behavior for attribute assignment. Unfortunately, that method is also what dataclasses hooks into to enforce the frozen logic, which effectively locks the method against any further alteration by throwing TypeError: Cannot overwrite attribute __setattr__ in class ModifiableConfig as soon as you try to touch it.
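A minimal demonstration of that conflict (class and field names are arbitrary):

```python
from dataclasses import dataclass

# Trying to define __setattr__ on a frozen dataclass fails at class
# creation time, because @dataclass(frozen=True) needs to install its own.
try:
    @dataclass(frozen=True)
    class Config:
        x: int

        def __setattr__(self, key, value):  # clashes with the frozen machinery
            object.__setattr__(self, key, value)
    err = None
except TypeError as e:
    err = e

print(err)  # Cannot overwrite attribute __setattr__ in class Config
```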
As a consequence, there is no straightforward and simple solution to your problem that I can see. Your approach of delegating the mutable parts of a class to an inner object or dictionary is, in my opinion, not bad or un-pythonic at all. But if you’re fine with dropping frozen from your requirements list and only want a partly-mutable dataclass, you can try this bootleg semi-frozen recipe, which extends the dataclass decorator with a semi flag that you can switch on to get the behavior you described:
from dataclasses import dataclass as dc
from traceback import format_stack


def dataclass(_cls=None, *, init=True, repr=True, eq=True, order=False,
              unsafe_hash=False, frozen=False, semi=False):
    def wrap(cls):
        # sanity checks for the new keyword
        if semi:
            if frozen:
                raise AttributeError("Either semi or frozen, not both.")
            if cls.__setattr__ != cls.mro()[1].__setattr__:
                raise AttributeError("No touching setattr when using semi!")
        # run the original dataclass decorator
        dc(cls, init=init, repr=repr, eq=eq, order=order,
           unsafe_hash=unsafe_hash, frozen=frozen)
        # add the semi-frozen logic
        if semi:
            def __setattr__(self, key, value):
                if key in self.__slots__:
                    # allow assignment only from within __init__
                    caller = format_stack()[-2].rsplit('in ', 1)[1].strip()
                    if caller != '__init__':
                        raise TypeError(f"Attribute '{key}' is immutable!")
                object.__setattr__(self, key, value)
            cls.__setattr__ = __setattr__
        return cls

    # handle being called with or without parentheses
    if _cls is None:
        return wrap
    return wrap(_cls)
I’m being brief and don’t address some potential edge cases here. There are better ways to handle the wrapping so that the internals are more consistent, but that would blow up this already complicated snippet even more.
Given this new dataclass decorator, you can use it like this to define a dataclass with some immutable attributes and some mutable ones:
>>> @dataclass(semi=True)
... class Foo:
...     # put immutable attributes and __dict__ into slots
...     __slots__ = ('__dict__', 'x', 'y')
...     x: int
...     y: int
...     z: int
...
>>> f = Foo(1, 2, 3)
>>> f  # prints Foo(x=1, y=2, z=3)
>>> f.z = 4  # will work
>>> f.x = 4  # raises TypeError: Attribute 'x' is immutable!
You don’t have to use __slots__ to separate the mutable from the immutable part, but it is convenient for a few reasons (such as being a meta-attribute that isn’t part of the default dataclass repr) and felt intuitive to me.
In the top answer above, the code breaks if Foo is a subclass of another class. To fix this, the line:
    super(type(self), self).__setattr__(key, value)
should read:
    super(type(cls), cls).__setattr__(key, value)
That way, super actually traverses upward instead of recursing into an infinite self-reference.
I found quite a simple way of doing this while keeping the code reasonably clean:
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    id: int
    _id: int = field(init=False, repr=False)

    @property
    def id(self):
        return self._id

    @id.setter
    def id(self, id: int) -> None:
        try:
            # note: a falsy initial value (e.g. 0) would slip through this check
            if self._id:
                raise Exception('This field is immutable!')
        except AttributeError:
            self._id = id
Basically, id becomes an interface, and I override the setter to throw an exception once _id already exists. You can always create a dedicated exception class for this purpose, something like ImmutableException.
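A quick sanity check of how the recipe behaves (the values are arbitrary; note that because the property object stands in as the field default, id must be passed explicitly):

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    id: int
    _id: int = field(init=False, repr=False)

    @property
    def id(self):
        return self._id

    @id.setter
    def id(self, id: int) -> None:
        try:
            if self._id:
                raise Exception('This field is immutable!')
        except AttributeError:
            self._id = id


p = Person('Alice', 42)   # hypothetical values
print(p.id)               # 42
try:
    p.id = 99             # second assignment hits the setter guard
except Exception as e:
    print(e)              # This field is immutable!
print(p.id)               # still 42
```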
Since dataclasses adds new arguments to @dataclass(...) in newer Python versions, such as kw_only in Python 3.10, using a decorator to wrap the @dataclass decorator might not be an ideal option moving forward.
One alternative is to use a descriptor-based approach. While the solution below does not work when slots=True is passed to the @dataclass decorator, it does appear to work well enough in the general case.
Here is an implementation of a simple descriptor class Frozen, which raises an error if an attribute is set more than once, i.e. outside of __init__():
class Frozen:
    __slots__ = ('private_name', )

    def __set_name__(self, owner, name):
        self.private_name = '_' + name

    def __get__(self, obj, objtype=None):
        value = getattr(obj, self.private_name)
        return value

    def __set__(self, obj, value):
        if hasattr(obj, self.private_name):
            msg = f'Attribute `{self.private_name[1:]}` is immutable!'
            raise TypeError(msg) from None
        # first assignment (from __init__) goes through
        setattr(obj, self.private_name, value)
Usage:
from dataclasses import dataclass


@dataclass
class Foo:
    # optional: define __slots__ to reduce memory usage
    __slots__ = ('_x', '_y', 'z')
    x: int = Frozen()
    y: int = Frozen()
    z: int


f = Foo(1, 2, 3)
print(f)
f.z = 4  # will work
f.z = 5  # will work
f.x = 4  # raises an error -> TypeError: Attribute `x` is immutable!
For a version of Frozen that allows you to set a default value for a field, see my post here, which shows how to set it up.
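As a rough sketch of how such a variant could work (the name FrozenDefault is hypothetical, and this is not necessarily what the linked post does): returning the default on class-level access lets @dataclass pick it up as the field default, following the descriptor-typed fields pattern from the dataclasses docs.

```python
from dataclasses import dataclass


class FrozenDefault:
    """Hypothetical Frozen variant that also carries a default value."""
    __slots__ = ('private_name', 'default')

    def __init__(self, default=None):
        self.default = default

    def __set_name__(self, owner, name):
        self.private_name = '_' + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            # class-level access: @dataclass reads this as the field default
            return self.default
        return getattr(obj, self.private_name, self.default)

    def __set__(self, obj, value):
        if hasattr(obj, self.private_name):
            raise TypeError(f'Attribute `{self.private_name[1:]}` is immutable!')
        setattr(obj, self.private_name, value)


@dataclass
class Bar:
    x: int = FrozenDefault(1)


b = Bar()    # x falls back to the default
print(b.x)   # 1
b2 = Bar(5)
print(b2.x)  # 5
```

As before, a second assignment such as b2.x = 7 raises a TypeError.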
Timings
If curious, I have also timed the descriptor approach above against the custom __setattr__() approach outlined in the top answer. Here is my sample code using the timeit module (it assumes Frozen, the semi-frozen dataclass decorator, and the dc alias for the original dataclasses.dataclass are already defined as in the answers above):
from timeit import timeit


@dc
class Foo:
    # uncomment if you truly want to add __slots__:
    # __slots__ = ('_x', '_y', 'z')
    x: int = Frozen()
    y: int = Frozen()
    z: int


@dataclass(semi=True)
class Foo2:
    # put immutable attributes and __dict__ into slots
    __slots__ = ('__dict__', 'x', 'y')
    x: int
    y: int
    z: int


n = 100_000

print('Foo.__init__() -> descriptor: ', timeit('Foo(1, 2, 3)', number=n, globals=globals()))
print('Foo.__init__() -> setattr:    ', timeit('Foo2(1, 2, 3)', number=n, globals=globals()))

f1 = Foo(1, 2, 3)
f2 = Foo2(1, 2, 3)

print('foo.z -> descriptor: ', timeit('f1.z', number=n, globals=globals()))
print('foo.z -> setattr:    ', timeit('f2.z', number=n, globals=globals()))
Results, on my Mac M1:
Foo.__init__() -> descriptor: 0.0345854579936713
Foo.__init__() -> setattr: 3.2137108749884646
foo.z -> descriptor: 0.003795791999436915
foo.z -> setattr: 0.002478832990163937
This indicates that creating a new Foo instance is much faster with the descriptor approach (up to 100x), likely because the custom __setattr__ inspects the call stack on every assignment. Plain attribute access (foo.z), however, is slightly faster with the custom __setattr__ approach, presumably because implementing a __slots__ attribute reduces memory overhead and also reduces the average lookup time for instance attributes.