Python dataclasses.dataclass reference to variable instead of instance variable
Question:
The default values in the constructors for c1 and c2 should produce new instance variables for a and b. Instead, it looks like c1.a and c2.a are referencing the same variable. Is @dataclass creating a class variable? That does not seem consistent with the intended functionality, and I cannot find anything about class variables in the documentation, so I think this is a bug. Can someone explain how to fix it? Should I report it as a bug on the Python tracker?
I know this issue must be related to mutability, since the b attribute (which is just a float) shows the expected/desired behavior while the a attribute (which is a user-defined, mutable object) behaves like a shared reference.
Thanks!
Code:
from dataclasses import dataclass

@dataclass
class VS:
    v: float  # value
    s: float  # scale factor

    def scaled_value(self):
        return self.v * self.s

@dataclass
class Container:
    a: VS = VS(1, 1)
    b: float = 1

c1 = Container()
c2 = Container()
print(c1)
print(c2)
c1.a.v = -999
c1.b = -999
print(c1)
print(c2)
Output:
Container(a=VS(v=1, s=1), b=1)
Container(a=VS(v=1, s=1), b=1)
Container(a=VS(v=-999, s=1), b=-999)
Container(a=VS(v=-999, s=1), b=1)
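What @dataclass does with a plain default value is, in effect, store it as a class attribute. The sharing seen above can be reproduced with an ordinary class and no dataclasses at all (a minimal sketch of the same mechanism):

```python
class VS:
    def __init__(self, v, s):
        self.v, self.s = v, s

class Container:
    a = VS(1, 1)  # class attribute: ONE object shared by every instance

    def __init__(self):
        self.b = 1  # instance attribute: a fresh value per instance

c1, c2 = Container(), Container()
print(c1.a is c2.a)  # True: both names refer to the single VS object
c1.a.v = -999
print(c2.a.v)        # -999: mutating through c1 is visible through c2
c1.b = -999
print(c2.b)          # 1: rebinding c1.b does not touch c2.b
```

Rebinding c1.b = -999 creates a new instance attribute that shadows nothing shared, which is why the float field appears to behave "by value".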
Answers:
Thanks Eric S for providing an explanation:
c1 and c2 share the same instance of a. This is the mutable default argument problem: https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments
Use a default_factory to create a new VS for each container.
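The linked gotcha is the same one that bites plain functions: a default value is evaluated once, at definition time, not once per call. A minimal illustration:

```python
def append_to(x, acc=[]):  # acc is created ONCE, when the def runs
    acc.append(x)
    return acc

print(append_to(1))  # [1]
print(append_to(2))  # [1, 2] -- the same list object persists across calls
```

A dataclass field default like `a: VS = VS(1, 1)` is evaluated the same way: once, when the class body runs.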
The default_factory does not allow me to have a unique set of default VS values for multiple attributes, since the VS defaults would need to be defined in the VS dataclass itself. For example, if I wanted a to default to VS(1, 1) but wanted another VS attribute to default to VS(1, 2), default_factory=VS does not help me. So I found a workaround, which is to create a dict of keyword entries and pass a deepcopy into my Container() constructor (note that if I do not pass a deep copy, I get the same issue as above). Here is my final code snippet and the output:
Code:
from dataclasses import dataclass, field
from copy import deepcopy

@dataclass
class VS:
    v: float = 1  # value
    s: float = 1  # scale factor

    def scaled_value(self):
        return self.v * self.s

@dataclass
class Container:
    a: VS = field(default_factory=VS)
    b: float = 1

ip = {'a': VS(2, 1), 'b': 1}
c1 = Container(**deepcopy(ip))
c2 = Container(**deepcopy(ip))
print(c1)
print(c2)
c1.a.v = 0
c1.b = 0
print(c1)
print(c2)
Output:
Container(a=VS(v=2, s=1), b=1)
Container(a=VS(v=2, s=1), b=1)
Container(a=VS(v=0, s=1), b=0)
Container(a=VS(v=2, s=1), b=1)
In the OP’s original example, a single VS object is created when the Container class is defined. That one object is then shared across all instances of the Container class. This is a problem because user-defined classes such as VS produce mutable objects, so changing a in any Container object changes a in every other Container object.
You want to generate a new VS object every time a Container is instantiated. The default_factory argument of the field function is a good way to do this, and passing a lambda function allows it all to be done inline.
I added a c member variable to Container with another VS instance to illustrate that the members are independent when done this way.
from dataclasses import dataclass, field

@dataclass
class VS:
    v: float  # value
    s: float  # scale factor

    def scaled_value(self):
        return self.v * self.s

# Use a zero-argument lambda as the default factory function.
@dataclass
class Container:
    a: VS = field(default_factory=lambda: VS(1, 1))
    b: float = 1
    c: VS = field(default_factory=lambda: VS(1, 2))

c1 = Container()
c2 = Container()
print(c1)
print(c2)
c1.a.v = -999
c1.c.s = -999
print(c1)
print(c2)
Output:
Container(a=VS(v=1, s=1), b=1, c=VS(v=1, s=2))
Container(a=VS(v=1, s=1), b=1, c=VS(v=1, s=2))
Container(a=VS(v=-999, s=1), b=1, c=VS(v=1, s=-999))
Container(a=VS(v=1, s=1), b=1, c=VS(v=1, s=2))
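It is worth noting that dataclasses already guard against the most common form of this mistake: a bare list, dict, or set default raises ValueError at class-definition time and points you at default_factory (and, if I recall correctly, newer Python versions extended this check to any unhashable default, which would include instances of an ordinary dataclass like VS). A short sketch of the guard and the factory-based fix:

```python
from dataclasses import dataclass, field

# A bare mutable built-in as a default is rejected outright.
try:
    @dataclass
    class Bad:
        items: list = []  # ValueError: mutable default ... use default_factory
except ValueError as e:
    print(e)

# The factory runs once per instance, so nothing is shared.
@dataclass
class Good:
    items: list = field(default_factory=list)

g1, g2 = Good(), Good()
g1.items.append(1)
print(g1.items)  # [1]
print(g2.items)  # []
```

The original code slipped past this guard only because VS is a user-defined class rather than one of the built-in containers the check looks for.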