How can dataclasses be made to work better with __slots__?

Question:

It was decided to remove direct support for __slots__ from dataclasses for Python 3.7.

Despite this, __slots__ can still be used with dataclasses:

from dataclasses import dataclass

@dataclass
class C():
    __slots__ = "x"
    x: int

However, because of the way __slots__ works, it isn’t possible to assign a default value to a dataclass field:

from dataclasses import dataclass

@dataclass
class C():
    __slots__ = "x"
    x: int = 1

This results in an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 'x' in __slots__ conflicts with class variable

How can __slots__ and default dataclass fields be made to work together?

Asked By: Rick


Answers:

2021 UPDATE: direct support for __slots__ was added in Python 3.10, via @dataclass(slots=True). I am leaving this answer for posterity and won’t be updating it.

The problem is not unique to dataclasses. ANY conflicting class attribute will stomp all over a slot:

>>> class Failure:
...     __slots__ = tuple("xyz")
...     x=1
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 'x' in __slots__ conflicts with class variable

This is simply how slots work. The error happens because __slots__ creates a class-level descriptor object for each slot name:

>>> class Success:
...     __slots__ = tuple("xyz")
...
>>>
>>> type(Success.x)
<class 'member_descriptor'>

In order to prevent this conflicting variable name error, the class namespace must be altered before the class object is instantiated such that there are not two objects competing for the same member name in the class:

  • the specified (default) value*
  • the slot descriptor (created by the slots machinery)

For this reason, an __init_subclass__ method on a parent class will not be sufficient, nor will a class decorator, because in both cases the class object has already been created by the time these functions have received the class to alter it.

Current option: write a metaclass

Until such time as the slots machinery is altered to allow more flexibility, or the language itself provides an opportunity to alter the class namespace before the class object is instantiated, our only choice is to use a metaclass.

Any metaclass written to solve this problem must, at minimum:

  • remove the conflicting class attributes/members from the namespace
  • instantiate the class object to create the slot descriptors
  • save references to the slot descriptors
  • put the previously removed members and their values back in the class __dict__ (so the dataclass machinery can find them)
  • pass the class object to the dataclass decorator
  • restore the slots descriptors to their respective places
  • also take into account plenty of corner cases (such as what to do if there is a __dict__ slot)
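A minimal sketch of such a metaclass follows. It performs only the basic steps from the list above and deliberately skips the corner cases in the last bullet: no __dict__ or __weakref__ slots, no dataclass() options, and no special handling for inheritance. The name SlottedDataclassMeta is hypothetical.

```python
from dataclasses import dataclass, field


class SlottedDataclassMeta(type):
    """Hypothetical metaclass: lets a slotted class declare dataclass defaults."""

    def __new__(mcls, name, bases, namespace):
        slots = namespace.get("__slots__", ())
        if isinstance(slots, str):
            slots = (slots,)
        # Remove the class attributes that would conflict with a slot.
        defaults = {key: namespace.pop(key) for key in slots if key in namespace}
        # Create the class object; this is where the slot descriptors appear.
        cls = super().__new__(mcls, name, bases, namespace)
        # Save references to the slot descriptors.
        descriptors = {key: cls.__dict__[key] for key in defaults}
        # Put the defaults back so the dataclass machinery can find them.
        for key, value in defaults.items():
            setattr(cls, key, value)
        # Apply the dataclass decorator (without slots=True it modifies cls in place).
        cls = dataclass(cls)
        # Restore the slot descriptors so instances can store their values.
        for key, descriptor in descriptors.items():
            setattr(cls, key, descriptor)
        return cls


class C(metaclass=SlottedDataclassMeta):
    __slots__ = ("x", "y", "z")
    x: int
    y: int = 1
    z: list = field(default_factory=list)
```

With this, C(0) gives C(x=0, y=1, z=[]), C.y is still a member_descriptor, and instances have no __dict__.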

To say the least, this is an extremely complicated endeavor. It would be easier to define the class like the following, without a default value so that the conflict doesn’t occur at all, and then add a default value afterward.

Current option: make alterations after class object instantiation

The unaltered dataclass would look like this:

from dataclasses import dataclass

@dataclass
class C:
    __slots__ = "x"
    x: int

The alteration is straightforward. Change the __init__ signature to reflect the desired default value, and then change the __dataclass_fields__ to reflect the presence of a default value.

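from functools import wraps

def change_init_signature(init):
    # Wrap the generated __init__ in one whose signature carries the default
    @wraps(init)
    def __init__(self, x=1):
        init(self, x)
    return __init__

C.__init__ = change_init_signature(C.__init__)

# Make dataclasses.fields() report the default as well
C.__dataclass_fields__["x"].default = 1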

Test:

>>> C()
C(x=1)
>>> C(2)
C(x=2)
>>> C.x
<member 'x' of 'C' objects>
>>> vars(C())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: vars() argument must have __dict__ attribute

It works!

Current option: a setmember decorator

With some effort, a so-called setmember decorator could be employed to automatically alter the class in the manner above. This would require deviating from the dataclasses API in order to define the default value in a location other than inside the class body, perhaps something like:

@setmember(x=field(default=1))
@dataclass
class C:
    __slots__="x"
    x: int
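A minimal sketch of how such a setmember decorator might work, assuming it is stacked on top of @dataclass as in the example above. setmember is hypothetical (it is not part of dataclasses), and this version only copies default and default_factory from a field(...) argument, ignoring options such as repr=False. Instead of rewriting the __init__ signature by hand, it wraps the generated __init__ and fills in omitted arguments with inspect.Signature.bind_partial.

```python
import inspect
from dataclasses import MISSING, Field, dataclass, field
from functools import wraps


def setmember(**members):
    """Hypothetical decorator: give fields of an existing slotted dataclass defaults."""
    def decorator(cls):
        init = cls.__init__
        signature = inspect.signature(init)
        makers = {}  # field name -> zero-argument callable producing the default
        for name, value in members.items():
            f = cls.__dataclass_fields__[name]
            if isinstance(value, Field):
                f.default, f.default_factory = value.default, value.default_factory
            else:
                f.default = value
            if f.default_factory is not MISSING:
                makers[name] = f.default_factory
            else:
                makers[name] = lambda default=f.default: default

        @wraps(init)
        def __init__(self, *args, **kwargs):
            # Bind what the caller passed, then fill in any omitted fields.
            bound = signature.bind_partial(self, *args, **kwargs)
            for name, make in makers.items():
                if name not in bound.arguments:
                    bound.arguments[name] = make()
            init(*bound.args, **bound.kwargs)

        cls.__init__ = __init__
        return cls
    return decorator


@setmember(x=field(default=1), y=field(default_factory=list))
@dataclass
class C:
    __slots__ = ("x", "y")
    x: int
    y: list
```

Because the slot descriptors are never touched, C.x stays a member_descriptor and C() == C(1, []).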

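The same thing could also be accomplished through an __init_subclass__ method on a parent class:

from dataclasses import field

class SlottedDataclass:
    __slots__ = ()  # without this, instances would still get a __dict__

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__()  # cls.__init_subclass__() would recurse forever
        # make the class changes here

class C(SlottedDataclass, x=field(default=1)):
    __slots__ = "x"
    x: int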
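One possible way to fill in the "make the class changes here" placeholder is sketched below. It is an assumption, not the only option. Because __init_subclass__ runs after the class object (and its slot descriptors) exists, it can temporarily swap each descriptor for its default, run the dataclass decorator, and then put the descriptors back. Class keyword arguments become the defaults.

```python
from dataclasses import dataclass, field


class SlottedDataclass:
    __slots__ = ()  # keep the base slot-only so subclasses get no __dict__

    def __init_subclass__(cls, **defaults):
        super().__init_subclass__()
        # The class already exists here, so its slot descriptors do too.
        saved = {name: cls.__dict__[name] for name in defaults}
        # Temporarily replace each descriptor with its default ...
        for name, value in defaults.items():
            setattr(cls, name, value)
        # ... let the dataclass machinery read them (modifies cls in place) ...
        dataclass(cls)
        # ... and put the slot descriptors back.
        for name, descriptor in saved.items():
            setattr(cls, name, descriptor)


class C(SlottedDataclass, x=field(default=1)):
    __slots__ = ("w", "x")
    w: str
    x: int
```

Here C("a") == C("a", 1), and neither C nor its base gives instances a __dict__.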

Future possibility: change the slots machinery

Another possibility, as mentioned above, would be for the Python language to alter the slots machinery to allow more flexibility. One way of doing this might be to change the slot descriptor itself to store class-level data at the time of class definition.

This could be done, perhaps, by supplying a dict as the __slots__ argument (see below). The class-level data (1 for x, 2 for y) could just be stored on the descriptor itself for retrieval later:

class C:
    __slots__ = {"x": 1, "y": 2}

assert C.x.value == 1
assert C.y.value == 2
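The descriptor-with-a-value part of this idea can be roughly emulated today. The sketch below assumes a dict __slots__, which is already accepted because iterating it yields the slot names. A class decorator reads the values afterwards and wraps each member_descriptor in a descriptor that also carries its value. ValuedSlot and valued_slots are hypothetical names. This does not change the slots machinery, so it still cannot help with conflicting class attributes. Also, since Python 3.8, inspect.getdoc() and help() treat the values of a dict __slots__ as docstrings, so this repurposes those values.

```python
class ValuedSlot:
    """Hypothetical wrapper: a slot's member_descriptor plus a class-level value."""

    def __init__(self, member, value):
        self.member = member
        self.value = value

    def __get__(self, obj, objtype=None):
        if obj is None:  # accessed on the class: return the wrapper itself
            return self
        return self.member.__get__(obj, objtype)

    def __set__(self, obj, value):
        self.member.__set__(obj, value)

    def __delete__(self, obj):
        self.member.__delete__(obj)


def valued_slots(cls):
    """Hypothetical class decorator: wrap each slot named in a dict __slots__."""
    for name, value in cls.__slots__.items():
        setattr(cls, name, ValuedSlot(cls.__dict__[name], value))
    return cls


@valued_slots
class C:
    __slots__ = {"x": 1, "y": 2}  # the keys become the slots


c = C()
c.x = 10  # still stored in the slot, via the wrapped member_descriptor
```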

One difficulty: it may be desired to only have a slot_member.value present on some slots and not others. This could be accommodated by importing a null-slot factory from a new slottools library:

from slottools import nullslot

class C:
    __slots__ = {"x": 1, "y": 2, "z": nullslot()}

assert not hasattr(C.z, "value")

The style of code suggested above would be a deviation from the dataclasses API. However, the slots machinery itself could even be altered to allow for this style of code, with accommodation of the dataclasses API specifically in mind:

class C:
    __slots__ = "x", "y", "z"
    x = 1  # 1 is stored on C.x.value
    y = 2  # 2 is stored on C.y.value

assert C.x.value == 1
assert C.y.value == 2
assert not hasattr(C.z, "value")

Future possibility: "prepare" the class namespace inside the class body

The other possibility is altering, or "preparing" (in the sense of the __prepare__ method of a metaclass), the class namespace.

Currently, there is no opportunity (other than writing a metaclass) to write code that alters the class namespace before the class object is created and the slots machinery goes to work. This could be changed by creating a hook for preparing the class namespace beforehand, and making it so that an error complaining about the conflicting names is only produced after that hook has been run.

This so-called __prepare_slots__ hook could look something like this, which I think is not too bad:

from dataclasses import dataclass, field, prepare_slots

@dataclass
class C:
    __slots__ = ('x',)
    __prepare_slots__ = prepare_slots
    x: int = field(default=1)

The dataclasses.prepare_slots function would simply be a function, similar to the __prepare__ method, that receives the class namespace and alters it before the class is created. For this case in particular, the default dataclass field values would be stored in some other convenient place so that they can be retrieved after the slot descriptor objects have been created.


* Note that the default field value conflicting with the slot might also be created by the dataclass machinery if dataclasses.field is being used.

Answered By: Rick

Following Rick Teachey’s suggestion, I created a slotted_dataclass decorator. Its keyword arguments take anything you would specify after [field]: [type] = in a dataclass without __slots__: both plain default values and field(...) calls. Arguments meant for the regular @dataclass decorator can also be passed, as a dictionary in the first positional argument. So this:

from dataclasses import dataclass, field

@dataclass(frozen=True)
class Test:
    a: dict = field(repr=False)
    b: int = 42
    c: list = field(default_factory=list)

would become:

@slotted_dataclass({'frozen': True}, a=field(repr=False), b=42, c=field(default_factory=list))
class Test:
    __slots__ = ('a', 'b', 'c')
    a: dict
    b: int
    c: list

And here is the source code of this new decorator:

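from dataclasses import dataclass

def slotted_dataclass(dataclass_arguments=None, **kwargs):
    if dataclass_arguments is None:
        dataclass_arguments = {}

    def decorator(cls):
        old_attrs = {}

        # Swap each slot descriptor for the desired default or field(...)
        for key, value in kwargs.items():
            old_attrs[key] = getattr(cls, key)
            setattr(cls, key, value)

        # The dataclass machinery now finds the defaults via getattr
        cls = dataclass(cls, **dataclass_arguments)
        # Put the original slot descriptors back
        for key, value in old_attrs.items():
            setattr(cls, key, value)
        return cls

    return decorator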

Code explanation

The code above takes advantage of the fact that the dataclasses module gets default field values by calling getattr on the class. That makes it possible to deliver our default values by replacing the appropriate entries in the __dict__ of the class (which the code does with the setattr function). The class generated by the @dataclass decorator will then be identical to the one generated by specifying those values after =, as we would if the class didn’t contain __slots__.

But since the __dict__ of the class with __slots__ contains member_descriptor objects:

>>> class C:
...     __slots__ = ('a', 'b', 'c')
...
>>> C.__dict__['a']
<member 'a' of 'C' objects>
>>> type(C.__dict__['a'])
<class 'member_descriptor'>

it is necessary to back up those objects and restore them after the @dataclass decorator does its job, which the code does with the old_attrs dictionary. Otherwise instances would have nowhere to store those attributes.
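To see why the restore step matters, here is the same swap done by hand, without the decorator. It is a minimal illustration, not part of the answer's code. While the plain default sits in the class __dict__, the generated __init__ cannot store anything, because there is no slot descriptor left to write through.

```python
from dataclasses import dataclass


class C:
    __slots__ = ("b",)
    b: int


descriptor = C.__dict__["b"]  # the member_descriptor created for the slot
C.b = 42                      # stand-in default, replacing the descriptor
C = dataclass(C)              # generated __init__ now has b=42 as its default

try:
    C()  # __init__ does self.b = 42, but C.b is now a plain int, not a slot
except AttributeError as exc:
    print("without restoring:", exc)

C.b = descriptor              # put the slot descriptor back
print(C())                    # C(b=42)
```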

Answered By: Anonymouse

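The least involved solution I’ve found for this problem is to write a custom __init__ (passing init=False to the decorator) that assigns values with object.__setattr__. Calling object.__setattr__ directly is needed because frozen=True makes the class’s own __setattr__ raise FrozenInstanceError.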

from dataclasses import dataclass
from typing import Optional

@dataclass(init=False, frozen=True)
class MyDataClass(object):
    __slots__ = (
        "required",
        "defaulted",
    )
    required: object
    defaulted: Optional[object]

    def __init__(
        self,
        required: object,
        defaulted: Optional[object] = None,
    ) -> None:
        super().__init__()
        object.__setattr__(self, "required", required)
        object.__setattr__(self, "defaulted", defaulted)

Answered By: mcguip

As noted already in the answers, data classes from dataclasses cannot generate slots for the simple reason that slots must be defined before a class is created.

In fact, the PEP for data classes explicitly mentions this:

At least for the initial release, __slots__ will not be supported. __slots__ needs to be added at class creation time. The Data Class decorator is called after the class is created, so in order to add __slots__ the decorator would have to create a new class, set __slots__, and return it. Because this behavior is somewhat surprising, the initial version of Data Classes will not support automatically setting __slots__.

I wanted to use slots because I needed to initialise many, many data class instances in another project. I ended up writing my own alternative implementation of data classes which supports this, among a few extra features: dataclassy.

dataclassy uses a metaclass approach which has numerous advantages: it enables decorator inheritance, considerably reduces code complexity and, of course, allows the generation of slots. With dataclassy the following is possible:

from dataclassy import dataclass

@dataclass(slots=True)
class Pet:
    name: str
    age: int
    species: str
    fluffy: bool = True

Printing Pet.__slots__ outputs the expected {'name', 'age', 'species', 'fluffy'}, instances have no __dict__ attribute and the overall memory footprint of the object is therefore lower. These observations indicate that __slots__ has been successfully generated and is effective. Plus, as evidenced, default values work just fine.

Answered By: biqqles

Another solution is to generate the __slots__ attribute inside the class body from the type annotations.
This can look like:

from dataclasses import dataclass

@dataclass
class Client:
    first: str
    last: str
    age_of_signup: int

    __slots__ = slots(__annotations__)

where the slots function (which must be defined before the class) is:

from typing import Dict, FrozenSet

def slots(annotations: Dict[str, object]) -> FrozenSet[str]:
    return frozenset(annotations.keys())

Running that generates a __slots__ attribute that looks like:
frozenset({'first', 'last', 'age_of_signup'})

This takes the annotations above it and makes a set of the specified names. The limitations are that you must repeat the __slots__ = slots(__annotations__) line in every class, it must be positioned below all the annotations, and it does not work for fields with default values (they conflict with the slots exactly as in the question).
On the plus side, __slots__ can never get out of sync with the annotations, so you can add or remove members without maintaining separate lists.

Answered By: TG-Techie

In Python 3.10+ you can use slots=True with a dataclass to make it more memory-efficient:

from dataclasses import dataclass

@dataclass(slots=True)
class Point:
    x: int = 0
    y: int = 0

This way you can set default field values as well.

Answered By: Eugene Yarmash