What is field metadata used for in Python dataclasses?

Question:

I have been reading through documentation for python dataclasses and other web pages. The field metadata is read-only and the documentation says:

It is not used at all by Data Classes and is provided as a third-party extension mechanism

I’m confused about how third-party extensions are going to use "a value from a read-only mapping". This seems more like something that previously would have gone into the software documentation?

My intention is to decorate my field so that I know whether a specific value (part of the field) has been computed yet or not. I don’t think this is something that can be achieved through metadata, right?

Asked By: armen

||

Answers:

Some libraries do actually take advantage of metadata. For example, marshmallow-dataclass, a popular dataclass validation library, lets you attach custom validators (and, as far as I know, other marshmallow field options) through the metadata of the fields in a dataclass you define yourself. I’d imagine that other libraries that want to extend the usability of dataclasses can take advantage of it too.
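As a rough illustration of that extension mechanism, here is a minimal sketch of how a third-party helper could read field metadata. The `validate` function and the `'max_length'` key are hypothetical names made up for this example; they are not part of `dataclasses` or of marshmallow-dataclass.

```python
from dataclasses import dataclass, field, fields


@dataclass
class User:
    # 'max_length' is a made-up key that only our hypothetical helper understands
    name: str = field(metadata={'max_length': 5})
    age: int = 0  # no metadata given: the mapping is simply empty


def validate(instance):
    """Hypothetical third-party helper: return a list of error messages."""
    errors = []
    for f in fields(instance):
        limit = f.metadata.get('max_length')  # reading the mapping is always allowed
        value = getattr(instance, f.name)
        if limit is not None and len(value) > limit:
            errors.append(f'{f.name} is longer than {limit} characters')
    return errors


print(validate(User('Ann')))
print(validate(User('Alexander')))
```

The first call prints an empty list and the second reports that `name` is too long. The helper only ever reads the metadata, which is why the read-only mapping is not a problem for this kind of use.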

Yeah, your guess that you can’t (or at least shouldn’t) use metadata to keep track of whether a field has been computed sounds correct to me. There are some hacks that will get it done, but it sounds like you want a class which is more stateful than dataclasses, and the metadata capability of their fields, "ought" to be.

Since the mapping proxy is read-only, you don’t have much ability to update it through the field once the class has been defined (the metadata is attached to the field when the class is created, not when an instance is). The nimblest way to work around the read-only limitation is to have the metadata wrap a dictionary that you also keep a reference to outside of the dataclass. In this case, the read-only wrapper fixes WHICH thing you’re pointing to, NOT what’s in the thing. OR, you could replace the entire metadata attribute of that field yourself at runtime (this is allowed because it’s the contents of the metadata mapping which are read-only, not the metadata attribute itself).

But if you pass the dictionary inline, so that nothing else keeps a reference to it, you can’t change its contents after the class is created.

For example:

from dataclasses import dataclass, field, fields


external_metadata_dict = {'is_initialized': False}
other_external_metadata_dict = {'foo': 1}

@dataclass
class UsesExternalDict:
    variable: int = field(metadata=external_metadata_dict)


example1 = UsesExternalDict(0)
print(f'example1 initial metadata: {fields(example1)[0].metadata}')
example2 = UsesExternalDict(0)
print(f'example2 initial metadata: {fields(example2)[0].metadata}')

# update the thing example1 and example2 both point to, even though their metadata is in a read-only wrapper
external_metadata_dict['is_initialized'] = True
print(f'example1 updated metadata: {fields(example1)[0].metadata}')
print(f'example2 updated metadata: {fields(example2)[0].metadata}')

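# directly replacing the 'metadata' attribute of the Field is also allowed
example3 = UsesExternalDict(0)
print(f'example3 initial metadata: {fields(example3)[0].metadata}')
fields(example3)[0].metadata = other_external_metadata_dict
print(f'example3 updated metadata: {fields(example3)[0].metadata}')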

gives

example1 initial metadata: {'is_initialized': False}
example2 initial metadata: {'is_initialized': False}
example1 updated metadata: {'is_initialized': True}
example2 updated metadata: {'is_initialized': True}
example3 initial metadata: {'is_initialized': True}
example3 updated metadata: {'foo': 1}

But in most programming situations, you’re more likely to pass the dictionary inline, like this:

from dataclasses import dataclass, field, fields


@dataclass
class UsesInternalDict:
    variable: int = field(metadata={'is_initialized': False})

example = UsesInternalDict(0)
print(f'example initial metadata: {fields(example)[0].metadata}')

fields(example)[0].metadata['is_initialized'] = True
print(f'example updated metadata: {fields(example)[0].metadata}')

just gives

example initial metadata: {'is_initialized': False}
Traceback (most recent call last):
  File "/home/sinback/scrap.py", line 11, in <module>
    fields(example)[0].metadata['is_initialized'] = True
TypeError: 'mappingproxy' object does not support item assignment

since you’re trying to write through the mappingproxy, which is the part that really is read-only.

There’s nothing stopping you from writing a method of your dataclass which manually updates the metadata part of one of its fields at runtime, in one of the manners I demonstrated in the first example, like:

@dataclass
class UsesExternalDict:
    variable: int = field(metadata=external_metadata_dict)

    def messwith(self):
        self.__dataclass_fields__['variable'].metadata = other_external_metadata_dict

but… it’s really not stylistically in keeping with how dataclasses are supposed to work. And it’s a really fragile and gross solution even putting aside the style concerns: the Field object lives on the class, so the change shows up for every instance at once, and it leans on the everything-is-a-reference feature of Python, which causes programmers to write bugs all the time. It’ll totally lead to confusion down the line if you’re working with anyone else.
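To see why this is fragile, here is a small sketch showing that field metadata belongs to the class rather than to any one instance. The `Point` class is a hypothetical example, not code from the question.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Point:
    x: int = field(default=0, metadata={'is_initialized': False})


a = Point(1)
b = Point(2)

# Both lookups return the very same Field object, which is stored on the class
same_field = fields(a)[0] is fields(b)[0]
print(same_field)

# Replacing the metadata "for a" therefore also changes what b reports
fields(a)[0].metadata = {'is_initialized': True}
print(fields(b)[0].metadata)
```

This prints `True` and then `{'is_initialized': True}`, so a flag stored this way cannot describe the state of a single instance.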

Another option you could explore is to have the field you’re keeping track of itself be its own class which keeps track of that metadata by itself. To me this feels like the best bet for you, but it’s up to you.
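A minimal sketch of that last idea, assuming the value can be produced by a zero-argument function. The `Lazy` and `Report` classes and their attribute names are hypothetical, invented for this illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


class Lazy:
    """Hypothetical wrapper: holds a value and remembers whether it was computed."""

    def __init__(self, compute: Callable[[], Any]):
        self._compute = compute
        self._value = None
        self.is_computed = False

    def get(self):
        # Compute on first access only, then remember the result
        if not self.is_computed:
            self._value = self._compute()
            self.is_computed = True
        return self._value


@dataclass
class Report:
    # default_factory gives every instance its own Lazy object
    total: Lazy = field(default_factory=lambda: Lazy(lambda: sum(range(10))))


report = Report()
print(report.total.is_computed)
print(report.total.get())
print(report.total.is_computed)
```

This prints `False`, `45`, `True`. The state lives on the instance’s own `Lazy` object, so each `Report` tracks its own value and nothing has to touch the field metadata.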

Answered By: sinback