Pydantic how to access updated values for PrivateAttr
Question:
I have this class
from pydantic import BaseModel, PrivateAttr

class A(BaseModel):
    _value: str = PrivateAttr()

    def __init__(self):
        self._value = "Initial Value"
        super().__init__()

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, some_value: str):
        self._value = some_value

    def clone(self):
        return self.copy(update={"_value": "Updated Value"})
When I instantiate one instance and make a copy with a value update, I get the updated value in the copy but when I access the updated value, I still get the initial old value. How can I get the updated value here?
a = A()
b = a.clone()
print([a, b])
print(a.value)
print(b.value)
For which I get the output
[A(), A(_value='Updated Value')]
Initial Value
Initial Value
Answers:
TL;DR
Private attributes are special and different from fields. If you want to properly assign a new value to a private attribute, you need to do it via regular attribute setting:
    ...
    def clone(self):
        obj = self.copy()
        obj._value = "Updated value"
        return obj
Long story: __slots__, __dict__, descriptors & other shenanigans
This is an interesting edge case.
I analyzed this for quite a while and I have to admit that I am still missing one crucial piece, which probably goes pretty deep into the Python data model internals. But please bear with me for a bit of context.
Private attributes stored in __slots__!
The docstring on the PrivateAttr function[1] tells us a crucial bit of information:
Private attrs are stored in model __slots__.
Looking at the ModelMetaclass, we can see the way this is done during class creation[2].
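To get a feel for what "stored in __slots__" means, here is a minimal sketch in plain Python, with no Pydantic involved. The class names Plain and Slotted are made up for illustration. I am assuming a base class without __slots__, so instances get both a __dict__ and a slot, which is the combination that matters for the rest of this answer.

```python
class Plain:
    """Hypothetical base class without __slots__, so instances have a __dict__."""


class Slotted(Plain):
    # Declaring __slots__ makes Python create one descriptor per name on the class.
    __slots__ = ("_value",)


slot = Slotted.__dict__["_value"]
print(type(slot).__name__)       # member_descriptor (on CPython)
print(hasattr(slot, "__set__"))  # True, so it is a data descriptor

obj = Slotted()
obj._value = "stored in slot"    # goes through the slot descriptor
print(obj.__dict__)              # {} (the value did not land in __dict__)
print(obj._value)                # stored in slot
```

The value assigned through normal attribute setting never shows up in the instance __dict__.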
What happens during copy?
I’ll reduce and adjust your example code a bit to better illustrate the points I am about to make:
from pydantic import BaseModel, PrivateAttr

class A(BaseModel):
    _value: str = PrivateAttr(default="Initial Value")
    id: int

if __name__ == '__main__':
    a1 = A(id=1)
    a2 = a1.copy(update={"id": 2, "_value": "Updated value"})
    print(a1.__dict__)
    print(a2.__dict__)
When we call copy on instance a1, a dictionary mapping field names to values is set up to be assigned to the __dict__ of the new instance a2. In this case, we additionally pass update={"id": 2, "_value": "Updated value"}, which is "merged" into that dictionary first.
The output we get:
{'id': 1}
{'id': 2, '_value': 'Updated value'}
Note that for the original object a1 that you want to copy, the __dict__ actually only contains the id. Its private _value attribute is not stored there; it is (as mentioned before) stored in the __slots__. Yet during creation of the copy a2 you added "_value" to its __dict__.
However, the copy method does one more thing (via the protected _copy_and_set_values method[3]) after the __dict__ is set: It goes over all the private attributes of the "source" instance and copies them over to the newly created instance, but it does so via the regular old __setattr__, and not by directly updating the __dict__.
What this means is that we end up with two distinct values associated with _value on the new object: one, "Updated value", in its __dict__ under the "_value" key, and the other, "Initial Value" (as taken from the source object), stored in the __slots__.
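The sequence described above can be imitated without Pydantic. The following is only a rough sketch: Base, Model and copy_with_update are hypothetical stand-ins that I made up, not Pydantic's actual implementation. It merges the update into the new __dict__ first and then copies the slot values over via setattr.

```python
class Base:
    """Hypothetical stand-in for a model base class; instances have a __dict__."""


class Model(Base):
    __slots__ = ("_value",)  # plays the role of the private attribute

    def __init__(self, id):
        self.id = id                    # no slot named "id", so this lands in __dict__
        self._value = "Initial Value"   # lands in the slot


def copy_with_update(source, update):
    """Hypothetical helper mimicking the two steps described above."""
    new = source.__class__.__new__(source.__class__)
    # Step 1: build the new __dict__ from the source values merged with the update.
    object.__setattr__(new, "__dict__", {**source.__dict__, **update})
    # Step 2: copy the slot-stored attributes via regular attribute setting.
    for name in source.__slots__:
        setattr(new, name, getattr(source, name))
    return new


a1 = Model(id=1)
a2 = copy_with_update(a1, {"id": 2, "_value": "Updated value"})
print(a1.__dict__)  # {'id': 1}
print(a2.__dict__)  # {'id': 2, '_value': 'Updated value'}
print(a2._value)    # Initial Value
```

Both values coexist on a2: one in its __dict__, one in the slot.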
Precedence in attribute lookup
The documentation’s __slots__ notes tell us another crucial piece of information:
__slots__ are implemented at the class level by creating descriptors for each variable name.
Descriptors hold a special place in the attribute lookup logic. What this means is that regular attribute access to a name in the __slots__ will first try to get the value from the slot descriptor and thus never even gets to the __dict__ of the object. There is a really well summarized explanation of this process, touching on the precedence of descriptors, in this answer to a different question:
When does Python fall back onto class __dict__ from instance __dict__?
In practice, this means that appending print(a2._value) to the script from above gives the output Initial Value.
Why exactly the order of attribute lookup is this way is what I don’t fully understand. I assume one would have to check the actual CPython source for that.
But the fact remains that accessing _value will check the slots descriptor and thus ignore what has been copied into the object’s __dict__.
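The precedence itself can be observed in isolation. As far as I can tell from the descriptor documentation, a data descriptor (one that defines __set__ or __delete__) on the class wins over the instance __dict__, while a non-data descriptor (only __get__) loses to it. The classes below are made up purely for illustration.

```python
class DataDescriptor:
    """Defines __get__ and __set__, so it counts as a data descriptor."""

    def __get__(self, obj, objtype=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only in this sketch")


class NonDataDescriptor:
    """Defines only __get__, so it counts as a non-data descriptor."""

    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"


class Demo:
    data = DataDescriptor()
    non_data = NonDataDescriptor()


demo = Demo()
# Write straight into the instance __dict__, bypassing __setattr__.
demo.__dict__["data"] = "from instance __dict__"
demo.__dict__["non_data"] = "from instance __dict__"

print(demo.data)      # from data descriptor
print(demo.non_data)  # from instance __dict__
```

The slot descriptors created for __slots__ names fall into the first category, which matches the behaviour seen with _value.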
Why is the new value in the object representation?
As you may have guessed by now, the __repr__ of a model actually iterates through its __dict__ to grab the key-value-pairs to print. And when there is a key there that is not a field (as is the case with _value), it is always included in the string representation.[4]
This is why doing print(a1); print(a2) will give the following output:
id=1
id=2 _value='Updated value'
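The same mismatch between the printed representation and attribute access can be reproduced with a toy class. This is a simplified sketch and not Pydantic's real __repr__ logic: the hypothetical Base.__str__ below just joins whatever it finds in __dict__.

```python
class Base:
    """Hypothetical base whose string form is built purely from __dict__."""

    def __str__(self):
        return " ".join(f"{key}={value!r}" for key, value in self.__dict__.items())


class Model(Base):
    __slots__ = ("_value",)


m = Model()
m._value = "Initial Value"  # stored in the slot
# Put a second "_value" into __dict__, like the update during copy did.
m.__dict__.update({"id": 2, "_value": "Updated value"})

print(m)         # id=2 _value='Updated value'
print(m._value)  # Initial Value
```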
Side note: Avoid property setters here!
This has nothing to do with the main issue here, but just FYI: Pydantic has problems with property setters. This was discussed here and is still relevant. Depending on your use case, you may want to utilize one of the workarounds other people posted there.
But I would strongly suggest that you consider using a normal field instead of a private attribute, if you want to set it from the outside anyway. Private attributes should probably be for internal use and not mutated from the outside. I struggle to see a use case for this, but you do you.
Conclusion
To set _value on the new object, you should not use the update parameter, but you need to do it the old-fashioned way:
from pydantic import BaseModel, PrivateAttr

class A(BaseModel):
    _value: str = PrivateAttr(default="Initial value")
    id: int

    @property
    def value(self):
        return self._value

    def clone(self):
        obj = self.copy(update={"id": self.id + 1})
        obj._value = "Updated value"
        return obj

if __name__ == '__main__':
    a1 = A(id=1)
    a2 = a1.clone()
    print([a1, a2])
    print(a1.value)
    print(a2.value)
Output:
[A(id=1), A(id=2)]
Initial value
Updated value