Make Python dataclass iterable?

Question:

I have a dataclass and I want to iterate over it in a loop to spit out each of the values. I’m able to write a very short __iter__() within it easily enough, but is that what I should be doing? I don’t see anything in the documentation about an ‘iterable’ parameter or anything, but I just feel like there ought to be…

Here is what I have which, again, works fine.

from dataclasses import dataclass

@dataclass
class MyDataClass:
    a: float
    b: float
    c: float

    def __iter__(self):
        for value in self.__dict__.values():
            yield value

thing = MyDataClass(1,2,3)
for i in thing:
    print(i)
# outputs 1,2,3 on separate lines, as expected

Is this the best / most direct way to do this?

Asked By: scotscotmcc


Answers:

Just use dataclasses.asdict to get a dictionary.

In [28]: from dataclasses import asdict
In [29]: [v for v in asdict(MyDataClass(1, 2, 3)).values()]
Out[29]: [1, 2, 3]

You can also get the attribute names alongside the values if you use .items().

In [30]: [(k, v) for k, v in asdict(MyDataClass(1, 2, 3)).items()]
Out[30]: [('a', 1), ('b', 2), ('c', 3)]
Answered By: suvayu

The simplest approach is probably to extract the fields one at a time, following the guidance in the dataclasses.astuple documentation for creating a shallow copy, but omitting the call to tuple (leaving a generator expression, which is a legal iterator for __iter__ to return):

import dataclasses  # at module level; provides dataclasses.fields

# Inside the dataclass body:
def __iter__(self):
    return (getattr(self, field.name) for field in dataclasses.fields(self))

# Or writing it directly as a generator itself instead of returning a genexpr:
def __iter__(self):
    for field in dataclasses.fields(self):
        yield getattr(self, field.name)
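
For context, here is a minimal sketch of the fields-based __iter__ placed in a complete class. The class name Point3D and the extra attribute note are hypothetical, chosen only for illustration. It also shows one practical difference from iterating over self.__dict__: attributes that are not declared fields are skipped.

```python
import dataclasses
from dataclasses import dataclass


@dataclass
class Point3D:  # hypothetical example class
    a: float
    b: float
    c: float

    def __iter__(self):
        # Yield each declared field's current value, in declaration order.
        for field in dataclasses.fields(self):
            yield getattr(self, field.name)


thing = Point3D(1, 2, 3)
thing.note = "not a field"  # extra instance attribute, not a declared field

from_fields = list(thing)
from_dict = list(thing.__dict__.values())

print(from_fields)  # [1, 2, 3]
print(from_dict)    # [1, 2, 3, 'not a field']

a, b, c = thing     # unpacking works because the instance is iterable
```
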

Unfortunately, astuple itself is not suitable, as it recurses, unpacking nested dataclasses and structures. asdict (followed by a .values() call on the result) works, but it eagerly constructs a temporary dict and recursively copies the contents (nested dataclasses come back as dicts), which is relatively heavyweight, both memory-wise and CPU-wise; better to avoid unnecessary O(n) eager work.
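
To illustrate the recursion, here is a small sketch with a nested dataclass. The names Inner and Outer are hypothetical. astuple flattens the nested instance into a plain tuple, while the fields-based iteration hands back the original nested object.

```python
import dataclasses
from dataclasses import dataclass


@dataclass
class Inner:  # hypothetical nested dataclass
    x: int
    y: int


@dataclass
class Outer:  # hypothetical outer dataclass
    name: str
    inner: Inner

    def __iter__(self):
        return (getattr(self, f.name) for f in dataclasses.fields(self))


outer = Outer("demo", Inner(1, 2))

deep = dataclasses.astuple(outer)  # recurses into the nested dataclass
shallow = tuple(outer)             # keeps the nested object as-is

print(deep)     # ('demo', (1, 2))
print(shallow)  # ('demo', Inner(x=1, y=2))
```
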

asdict would be suitable if you want or need to avoid live views: if attributes of the instance are replaced or modified midway through iterating, the asdict result won’t change, since the values are deep copied up front, while the genexpr would reflect the newer values when it reached them. The implementation using asdict is even simpler (if slower, due to the eager deep copy up front):

def __iter__(self):
    yield from dataclasses.asdict(self).values()

# or avoiding a generator function:
def __iter__(self):
    return iter(dataclasses.asdict(self).values())
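
The live-view versus snapshot difference can be seen in a short sketch. The class names Live and Snapshot are hypothetical. Each instance is mutated after iteration has started, and only the fields-based iterator sees the change.

```python
import dataclasses
from dataclasses import dataclass


@dataclass
class Live:  # hypothetical: iterates lazily over current attribute values
    a: int
    b: int

    def __iter__(self):
        return (getattr(self, f.name) for f in dataclasses.fields(self))


@dataclass
class Snapshot:  # hypothetical: copies everything when iter() is called
    a: int
    b: int

    def __iter__(self):
        return iter(dataclasses.asdict(self).values())


live = Live(1, 2)
live_iter = iter(live)
live_first = next(live_iter)
live.b = 99                    # modify mid-iteration
live_second = next(live_iter)  # reads the new value

snap = Snapshot(1, 2)
snap_iter = iter(snap)         # asdict runs here, before the change
snap_first = next(snap_iter)
snap.b = 99                    # modify mid-iteration
snap_second = next(snap_iter)  # still the copied value

print(live_first, live_second)  # 1 99
print(snap_first, snap_second)  # 1 2
```

Note that with the generator-function form (yield from ...), the asdict call is deferred until the first next(), so the snapshot is taken at that point rather than when iter() is called.
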

There is a third option, which is to ditch dataclasses entirely. If you’re okay with making your class behave like an immutable sequence, then you get iterability for free by making it a typing.NamedTuple (or the older, less flexible collections.namedtuple) instead, e.g.:

from typing import NamedTuple

class MyNotADataClass(NamedTuple):
    a: float
    b: float
    c: float

thing = MyNotADataClass(1,2,3)
for i in thing:
    print(i)
# outputs 1,2,3 on separate lines, as expected

and that is iterable automatically (you can also call len on it, index it, or slice it, because it’s an actual subclass of tuple with all the tuple behaviors; it just also exposes its contents via named attributes).
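
A brief sketch of those tuple behaviors, and of the immutability trade-off, using a hypothetical class named Triple:

```python
from typing import NamedTuple


class Triple(NamedTuple):  # hypothetical example class
    a: float
    b: float
    c: float


t = Triple(1, 2, 3)

length = len(t)      # 3
first = t[0]         # 1
tail = t[1:]         # (2, 3), a plain tuple
is_tuple = isinstance(t, tuple)

# Unlike a regular dataclass, fields cannot be reassigned.
try:
    t.a = 10
    immutable = False
except AttributeError:
    immutable = True

print(length, first, tail, is_tuple, immutable)  # 3 1 (2, 3) True True
```
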

Answered By: ShadowRanger