Using dataclasses.MISSING as optional parameter value with a Python dataclass?
Question:
I want to make the set parameter optional but still allow None to be a valid value. Based on the documentation, it seemed that dataclasses.MISSING could be used as a default value to assist in this.
As shown above, the MISSING value is a sentinel object used to detect if some parameters are provided by the user. This sentinel is used because None is a valid value for some parameters with a distinct meaning. No code should directly use the MISSING value.
But by using this as follows:
import dataclasses
from dataclasses import dataclass, field

@dataclass
class Var:
    get: list
    set: list = dataclasses.MISSING

    def __post_init__(self):
        if self.set is dataclasses.MISSING:
            self.set = self.get

print(Var(get=['Title']))
I am getting an error:
Traceback (most recent call last):
  File "main.py", line 31, in <module>
    print(Var(get=['Title']))
TypeError: __init__() missing 1 required positional argument: 'set'
Answers:
I don’t know if you can use dataclasses.MISSING in this way, so I would simply use a dedicated enum. Since it’s an enum, its member is guaranteed to be identical only to itself, so it should give you what you want:
from dataclasses import dataclass
from enum import Enum

_field_status = Enum("FieldStatus", "UNSET")

@dataclass
class Var:
    get: list
    set: list = _field_status.UNSET

    def __post_init__(self):
        if self.set is _field_status.UNSET:
            self.set = self.get

print(Var(get=[7]))
print(Var(get=[7], set=[8]))
print(Var(get=[7], set=None))
Obviously this prevents the user from setting set to _field_status.UNSET, but presumably they don’t need to do that.
Note that I am slightly confused as to why None is a valid value for something which is hinted as a list, but the principle stands.
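On the type-hint concern: one way to make the annotation match what the field can actually hold is to spell out the enum member with typing.Literal. The following is only a sketch under assumptions: the class-style FieldStatus enum and the MaybeList alias are names made up for illustration, not part of the original answer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Literal, Union


class FieldStatus(Enum):
    # Single member used as the "argument was not provided" marker.
    UNSET = "UNSET"


# Hypothetical alias: a list, None, or the "not provided" marker.
MaybeList = Union[list, None, Literal[FieldStatus.UNSET]]


@dataclass
class Var:
    get: list
    set: MaybeList = FieldStatus.UNSET

    def __post_init__(self):
        # Only an omitted argument falls back to `get`; None is kept as-is.
        if self.set is FieldStatus.UNSET:
            self.set = self.get


omitted = Var(get=[7])
explicit_none = Var(get=[7], set=None)
```

With this annotation, a type checker can see that set may temporarily hold the marker, while callers can still pass None explicitly.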
No code should directly use the MISSING value.
This part above is noted in the docs for a reason. Therefore, we should avoid using (and importing) MISSING in our application code if possible. In this case, MISSING is not at all applicable to our use case.
Assumed usage (avoiding working directly with the MISSING sentinel value, and using dataclasses.field(...) instead):
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Var:
    get: list[str]
    set: Optional[list[str]] = field(default=None)

print(Var(get=['Title']))
# Var(get=['Title'], set=None)
But where is MISSING actually used?
MISSING is a sentinel object that the dataclasses module uses behind the scenes, for its magic.
You can inspect the source code for dataclasses.field and find a clear usage of it there:
def field(*, default=MISSING, default_factory=MISSING, init=True, repr=True,
          hash=None, compare=True, metadata=None):
You’ll see that the declared default value for parameters such as default is default=MISSING instead of default=None. This is done mainly to identify whether the user actually passes in a value for default or default_factory to the factory function field. For example, it’s perfectly valid to pass field(default=None) as we did in the example above; since the declared default is MISSING rather than None, dataclasses is able to detect that a value has been passed for this parameter (a value of None).
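This can be checked with dataclasses.fields(), which exposes the Field object recorded for each attribute. The sketch below is illustrative only; the class names Example and Broken are made up. It also suggests why the code in the question fails: writing dataclasses.MISSING as the default appears to be indistinguishable from declaring no default, so the parameter stays required.

```python
import dataclasses
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Example:
    required: list                                  # no default at all
    optional: Optional[list] = field(default=None)  # explicit default of None


by_name = {f.name: f for f in dataclasses.fields(Example)}

# A field without a default is recorded with the MISSING sentinel...
no_default = by_name["required"].default is dataclasses.MISSING
# ...while an explicit None default is stored as None.
none_default = by_name["optional"].default is None


@dataclass
class Broken:
    get: list
    set: list = dataclasses.MISSING  # treated the same as "no default"


try:
    Broken(get=["Title"])
    raised = False
except TypeError:
    # __init__ still requires `set`, as in the traceback from the question.
    raised = True
```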
How MISSING is declared
If you inspect the source code of the dataclasses module, for example by holding Ctrl (Command on Mac) and left-clicking on the name MISSING anywhere in your code in an IDE that supports it, you can see how MISSING is actually declared:
# A sentinel object to detect if a parameter is supplied or not.  Use
# a class to give it a better repr.
class _MISSING_TYPE:
    pass

MISSING = _MISSING_TYPE()
A Potential Solution
Piggy-backing off how the dataclasses module defines MISSING, you could define your own (empty) class, and then instantiate the class.
I, however, feel that the instantiation step can be skipped in this scenario, since the class object itself can serve as the sentinel. Here’s a one-liner to create a sentinel class / type:
class _UNSET: ... # here the ellipsis (`...`) is essentially the same as `pass`
Usage, then, would be as follows:
from __future__ import annotations

from dataclasses import dataclass

class _UNSET: ...

@dataclass
class Var:
    get: list
    set: list | None = _UNSET

    def __post_init__(self):
        if self.set is _UNSET:
            self.set = self.get

print(Var(get=[7]))
print(Var(get=[7], set=[8]))
print(Var(get=[7], set=None))
This correctly distinguishes between the case where set is omitted in the constructor and the case where a value for set is specified, such as set=None.
The result is also as expected:
Var(get=[7], set=[7])
Var(get=[7], set=[8])
Var(get=[7], set=None)
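If you would rather stay closer to how dataclasses declares MISSING, a variant is to instantiate a small private class and give it a readable repr. This is a minimal sketch, not a definitive implementation; the names _UnsetType and UNSET are hypothetical, and a strict type checker may still flag the default because it does not match the annotation.

```python
from dataclasses import dataclass
from typing import Optional


class _UnsetType:
    """Hypothetical sentinel type; its single instance means 'not provided'."""

    def __repr__(self) -> str:
        return "UNSET"


UNSET = _UnsetType()


@dataclass
class Var:
    get: list
    set: Optional[list] = UNSET  # assumption: type checkers may complain here

    def __post_init__(self):
        # Identity check: only the sentinel instance triggers the fallback.
        if self.set is UNSET:
            self.set = self.get


omitted = Var(get=[7])
explicit_none = Var(get=[7], set=None)
```

The custom repr mainly helps when the sentinel leaks into logs or debugger output, where UNSET is easier to read than a default object repr.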