Why is dataclass field shared across instances

Question

This is my first time using dataclasses, and I'm not very experienced with Python. The following behaviour conflicts with my understanding so far:

from dataclasses import dataclass

@dataclass
class X:
  x: int = 1
  y: int = 2

@dataclass
class Y:
  c1: X = X(3, 4)
  c2: X = X(5, 6)

n1 = Y()
n2 = Y()

print(id(n1.c1))
print(id(n2.c1))

n1.c1.x = 99999
print(n2)

This prints

140459664164272
140459664164272
Y(c1=X(x=99999, y=4), c2=X(x=5, y=6))

Why does c1 behave like a class variable? What can I do to make n1.c1 and n2.c1 separate objects (so that n1.c1 is not n2.c1)? Do I need to write an __init__ method?

I can get sensible results with this addition to Y:

  def __init__(self):
    self.c1 = X(3, 4)
    self.c2 = X(5, 6)

This prints:

140173334359840
140173335445072
Y(c1=X(x=3, y=4), c2=X(x=5, y=6))

Asked By: perreal

Answer 1

Why does c1 behave like a class variable?

Because you specified default values for them, and those defaults are stored as class attributes. The Mutable Default Values section of the dataclasses documentation says:

Python stores default member variable values in class attributes.
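The same documentation section also describes a guard against this situation. Since Python 3.11, dataclasses rejects any default whose class is unhashable (earlier versions only rejected list, dict and set defaults), and a regular @dataclass such as X (eq=True, frozen=False) gets __hash__ set to None. So on 3.11+ the class Y from the question should fail at definition time with a ValueError that points to default_factory. The small check below simply reports what happens on whichever interpreter runs it:

```python
import sys
from dataclasses import dataclass


@dataclass
class X:
    x: int = 1
    y: int = 2


# A non-frozen dataclass with eq=True gets __hash__ = None, i.e. it is unhashable.
print(X.__hash__ is None)  # True

try:
    @dataclass
    class Y:
        c1: X = X(3, 4)  # an unhashable default instance
except ValueError as exc:
    # Python 3.11+ refuses unhashable defaults and suggests default_factory.
    outcome = f"rejected: {exc}"
else:
    # Older versions accept it, and the default object is then shared.
    outcome = "accepted (shared default)"

print(sys.version_info[:2], outcome)
```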

But look at this:

from dataclasses import dataclass

@dataclass
class X:
    x: int = 1
    y: int = 2

@dataclass
class Y:
    c1: X
    c2: X = X(5, 6)

print("c1" in Y.__dict__)  # False
print("c2" in Y.__dict__)  # True

c1 doesn't have a default value, so it's not in the class's namespace.
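The same distinction is visible through dataclasses.fields(): a field without a default has its default set to the MISSING sentinel, and nothing is left in the class namespace for it. A minimal sketch, using int fields only so that it runs on any Python version (the class name P is illustrative):

```python
from dataclasses import MISSING, dataclass, fields


@dataclass
class P:
    a: int      # no default: the annotation alone creates no class attribute
    b: int = 5  # default: 5 stays in P.__dict__ as a class attribute


for f in fields(P):
    has_default = f.default is not MISSING
    print(f.name, has_default, f.name in P.__dict__)
# a False False
# b True True
```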

When you do define a default value, the default object ends up both in the class's namespace (Y.__dict__) and in each instance's namespace (n1.__dict__). These are not copies: the generated __init__ simply assigns a reference to the same default object to every instance:

from dataclasses import dataclass

@dataclass
class X:
    x: int = 1
    y: int = 2

@dataclass
class Y:
    c1: X = X(3, 4)
    c2: X = X(5, 6)

n1 = Y()
n2 = Y()

print("c1" in Y.__dict__)  # True
print("c1" in n1.__dict__)  # True

print(id(n1.c1)) # 140037361903232
print(id(n2.c1)) # 140037361903232
print(id(Y.c1))  # 140037361903232
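The sharing is easier to see if you picture the generated __init__. Roughly (this is a simplified sketch, not the code dataclasses actually generates), a field with a default becomes a default argument, and Python evaluates a default argument once, when the def statement runs. That is the same mechanism behind the well-known mutable default argument pitfall. Point and Holder are illustrative names:

```python
class Point:
    """A plain mutable class standing in for X."""

    def __init__(self, x=1, y=2):
        self.x = x
        self.y = y


class Holder:
    # Simplified stand-in for what @dataclass generates for c1: Point = Point(3, 4).
    # The default expression Point(3, 4) is evaluated once, when this def runs.
    def __init__(self, c1=Point(3, 4)):
        self.c1 = c1  # stores a reference to the single default object


n1 = Holder()
n2 = Holder()
print(n1.c1 is n2.c1)       # True: both refer to the same default object
n1.c1.x = 99999
print(n2.c1.x)              # 99999
print("c1" in n1.__dict__)  # True: each instance holds its own reference
```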

So if you want each instance to get its own object, you have two options:

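Pass the arguments explicitly when instantiating (not a good option: you have to repeat the values at every call site, and any instance created without arguments still shares the class-level default):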

from dataclasses import dataclass

@dataclass
class X:
    x: int = 1
    y: int = 2

@dataclass
class Y:
    c1: X = X(3, 4)
    c2: X = X(5, 6)

n1 = Y(X(3, 4), X(5, 6))
n2 = Y(X(3, 4), X(5, 6))

print("c1" in Y.__dict__)  # True
print("c1" in n1.__dict__)  # True

print(id(n1.c1)) # 140058585069264
print(id(n2.c1)) # 140058584543104
print(id(Y.c1))  # 140058585065088

Use field() with a default_factory:

from dataclasses import dataclass, field

@dataclass
class X:
    x: int = 1
    y: int = 2

@dataclass
class Y:
    c1: X = field(default_factory=lambda: X(3, 4))
    c2: X = field(default_factory=lambda: X(5, 6))

n1 = Y()
n2 = Y()

print("c1" in Y.__dict__)   # False
print("c1" in n1.__dict__)  # True

print(id(n1.c1))  # 140284815353136
print(id(n2.c1))  # 140284815353712
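To confirm that default_factory is called once per new instance, and only when no value is passed, you can count the calls. The make_c1 factory and the calls counter are illustrative additions:

```python
from dataclasses import dataclass, field


@dataclass
class X:
    x: int = 1
    y: int = 2


calls = 0


def make_c1():
    """Hypothetical factory that records how often dataclasses calls it."""
    global calls
    calls += 1
    return X(3, 4)


@dataclass
class Y:
    c1: X = field(default_factory=make_c1)


n1 = Y()
n2 = Y()
n3 = Y(X(7, 8))        # explicit argument: the factory is not called
print(calls)           # 2
print(n1.c1 is n2.c1)  # False
n1.c1.x = 99999
print(n2.c1)           # X(x=3, y=4)
```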

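In the second option, because I didn't specify the default parameter (you can't specify both default and default_factory), nothing is stored in the class's namespace; instead, the factory is called to create a fresh object for each new instance. Note that field(default=SOMETHING) is just another way of writing = SOMETHING.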
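Both claims in the previous paragraph can be checked directly. In this sketch (A and B are illustrative names, with int defaults so they are hashable), field(default=5) leaves the same class attribute as = 5, and passing both default and default_factory is rejected as soon as field() is called:

```python
from dataclasses import dataclass, field


@dataclass
class A:
    n: int = 5


@dataclass
class B:
    n: int = field(default=5)  # same effect as writing n: int = 5


print(A.__dict__["n"], B.__dict__["n"])  # 5 5
print(A().n == B().n)                    # True

try:
    field(default=0, default_factory=int)  # mixing both is not allowed
except ValueError as exc:
    error = str(exc)
print(error)
```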

Answered By: S.B
