Python: Mutables in class states, what's going on under the hood?
Question:
I’m currently doing an online course on OOP in Python and one of the skill tests was to write a password generator class. Below you’ll find the recommended answer.
import re
import random
from string import ascii_letters, punctuation
from copy import copy

class Password:
    SAMPLES = {
        "letters": list(ascii_letters),
        "numbers": list(range(10)),
        "punctuation": list(punctuation)
    }
    DEFAULT_SETTINGS = {
        "low": 8,
        "mid": 12,
        "high": 16
    }

    @classmethod
    def show_input_universe(cls):
        return cls.SAMPLES

    def _generate_password(self):
        population = self.SAMPLES["letters"]
        length = self.length or self.DEFAULT_SETTINGS.get(self.strength)
        if self.strength == "high":
            population += self.SAMPLES["numbers"] + self.SAMPLES["punctuation"]
        elif self.strength == "mid":
            population += self.SAMPLES["numbers"]
        else:
            pass
        # map(lambda x: str(x), random...)
        self.password = "".join(map(str, random.choices(population, k=int(length))))

    def __init__(self, strength="mid", length=None):  # None so there's no need for a value
        self.strength = strength
        self.length = length
        self._generate_password()
When I use that code to create instances of the class, and thus generate passwords of high or mid strength, the underlying SAMPLES["letters"] list is modified. That leads to low strength passwords sharing all the properties of the other password strengths.
I could even see that effect when calling the show_input_universe method. The other lists were added to the original list.
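A minimal reproduction of the effect described above, using a cut-down, hypothetical `Demo` class (not the course's code) that keeps only the class-level `SAMPLES` dict and the `+=` step:

```python
from string import ascii_letters, punctuation


class Demo:
    # Hypothetical stand-in for Password: same class-level dict, same += step.
    SAMPLES = {
        "letters": list(ascii_letters),
        "numbers": list(range(10)),
        "punctuation": list(punctuation),
    }

    def __init__(self, strength="mid"):
        population = self.SAMPLES["letters"]  # alias, not a copy
        if strength == "high":
            population += self.SAMPLES["numbers"] + self.SAMPLES["punctuation"]
        elif strength == "mid":
            population += self.SAMPLES["numbers"]
        self.population_size = len(population)


print(len(Demo.SAMPLES["letters"]))  # 52 letters before any instance exists
Demo("mid")
print(len(Demo.SAMPLES["letters"]))  # 62: the ten digits were appended
print(Demo("low").population_size)   # 62, so "low" now draws from digits too
```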
But why is that?
I understand that population = self.SAMPLES["letters"] creates the population variable, which stores a pointer to the self.SAMPLES["letters"] list.
But then how exactly does the concatenation work here?
if self.strength == "high":
    population += self.SAMPLES["numbers"] + self.SAMPLES["punctuation"]
The solution to this is the following:
from copy import copy
population = copy(self.SAMPLES["letters"])
That creates a copy of the initial list that is only modified within that particular call, so SAMPLES["letters"] itself stays unchanged.
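A sketch of that fix in context, assuming the rest of the class stays as posted: the method builds its own list first, so the class-level SAMPLES is only read, never changed. The class name `FixedPassword` is hypothetical.

```python
import random
from copy import copy
from string import ascii_letters, punctuation


class FixedPassword:
    SAMPLES = {
        "letters": list(ascii_letters),
        "numbers": list(range(10)),
        "punctuation": list(punctuation),
    }
    DEFAULT_SETTINGS = {"low": 8, "mid": 12, "high": 16}

    def __init__(self, strength="mid", length=None):
        self.strength = strength
        self.length = length
        self._generate_password()

    def _generate_password(self):
        # A fresh list on every call; += now extends this copy only.
        population = copy(self.SAMPLES["letters"])
        length = self.length or self.DEFAULT_SETTINGS.get(self.strength)
        if self.strength == "high":
            population += self.SAMPLES["numbers"] + self.SAMPLES["punctuation"]
        elif self.strength == "mid":
            population += self.SAMPLES["numbers"]
        # Digits are ints, so convert every choice to str before joining.
        self.password = "".join(map(str, random.choices(population, k=int(length))))


FixedPassword("high")
FixedPassword("mid")
low = FixedPassword("low")
print(len(FixedPassword.SAMPLES["letters"]))  # still 52
print(low.password.isalpha())  # True: "low" passwords contain letters only
```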
Answers:
The line:
population += self.SAMPLES["numbers"] + self.SAMPLES["punctuation"]
does not just rebind the population variable; it mutates the object that population currently references, which is self.SAMPLES["letters"]. It’s the same as doing:
population.extend(self.SAMPLES["numbers"] + self.SAMPLES["punctuation"])
or:
self.SAMPLES["letters"].extend(self.SAMPLES["numbers"] + self.SAMPLES["punctuation"])
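A quick check of that equivalence on plain lists (the names `letters`, `alias` and `extra` are only for illustration): `+=` and `extend` both grow the list in place, and every name bound to it sees the change.

```python
letters = ["a", "b"]
alias = letters          # second name for the same list object
extra = [0, 1]

before = id(letters)
alias += extra           # in place, like alias.extend(extra)
print(letters)           # ['a', 'b', 0, 1]
print(id(alias) == before, alias is letters)  # True True

alias.extend(extra)      # same effect, spelled as a method call
print(letters)           # ['a', 'b', 0, 1, 0, 1]
```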
Note that you don’t need to import anything special to make a copy of a list; you can just use the list constructor to make a new list out of any iterable (including another list):
population = list(self.SAMPLES["letters"])
or you can use the built-in list.copy method:
population = self.SAMPLES["letters"].copy()
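Both spellings give a new list with the same contents, so extending it leaves the original alone. A short sketch with a stand-in `samples` dict (hypothetical, mirroring SAMPLES):

```python
samples = {"letters": ["a", "b"], "numbers": [0, 1]}

via_constructor = list(samples["letters"])
via_method = samples["letters"].copy()

via_constructor += samples["numbers"]
via_method += samples["numbers"]

print(samples["letters"])             # ['a', 'b']  (unchanged)
print(via_constructor)                # ['a', 'b', 0, 1]
print(via_method is via_constructor)  # False: two independent lists
```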
The problem is here:
population = self.SAMPLES["letters"]
In Python, variables are references that point to a chunk of data*. A reference in this case is like a sticky note with a name written on it. You can attach several of them to the same chunk of data, and access that same chunk of data by any of the names of the variables that point to it. Assigning something to a different variable never** makes a copy.
Thus, the population variable is just a reference to the exact same list as self.SAMPLES["letters"]. So when you mutate population, you are also mutating self.SAMPLES["letters"]. This is why making a copy fixes the problem: now there are two separate lists, so mutating one does not mutate the other.
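The sticky-note picture can be checked directly with `is`, which compares identity rather than contents. A minimal sketch; the names here are illustrative:

```python
original = [1, 2, 3]
note = original             # a second sticky note on the same list
duplicate = list(original)  # a new list with equal contents

note.append(4)

print(original)               # [1, 2, 3, 4]: changed through `note`
print(note is original)       # True: same object
print(duplicate is original)  # False: separate object
print(duplicate)              # [1, 2, 3]: unaffected
```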
As shown in the other answer, you can also make a copy of the list by calling list() on it, or by slicing it with [:]. All of these are equivalent:
population = list(self.SAMPLES["letters"])
population = self.SAMPLES["letters"][:]
from copy import copy
population = copy(self.SAMPLES["letters"])
*There are some exceptions in specific cases, but this is the correct mental model in general.
**Again, exceptions are possible, but none that are relevant here.
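The copy spellings listed above can be compared side by side. Each returns a new, equal list. They are all shallow copies, which makes no difference here because the elements (single-character strings and small ints) are immutable. A sketch with a hypothetical `letters` list:

```python
from copy import copy

letters = list("abc")

copies = [
    list(letters),
    letters[:],
    copy(letters),
    letters.copy(),
]

for c in copies:
    # Equal contents, but never the same object as the original.
    assert c == letters and c is not letters

copies[0].append("z")
print(letters)    # ['a', 'b', 'c']
print(copies[1])  # ['a', 'b', 'c']
```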
The type of population is list. The list type implements the data model hook __iadd__ for in-place sequence concatenation. This method also returns the existing instance:
>>> population = [0, 1]
>>> new = population.__iadd__([2, 3])
>>> new is population
True
>>> population
[0, 1, 2, 3]
So the augmented assignment statement += mutates the original list, which is one of the values of the dict in the class namespace. It’s in Password.SAMPLES, shared between all instances. Even though population is a local variable inside your method, there is still shared state.
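Looking up `self.SAMPLES` inside a method finds the class attribute, because the instance has no attribute of that name of its own, so every instance reaches the same dict and the same lists. A minimal sketch with a hypothetical `Holder` class:

```python
class Holder:
    SAMPLES = {"letters": ["a", "b"]}  # lives in the class namespace

    def grow(self, extra):
        population = self.SAMPLES["letters"]  # local name, shared object
        population += extra


first, second = Holder(), Holder()
first.grow(["c"])

print(second.SAMPLES["letters"])  # ['a', 'b', 'c']
print(first.SAMPLES is Holder.SAMPLES is second.SAMPLES)  # True
print("SAMPLES" in vars(first))   # False: no per-instance copy exists
```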
Note that if list did not implement __iadd__, your code would work as you expected, because the augmented assignment would fall back to __add__, which concatenates and returns a new instance. Replace population += other with population = population + other to see that behavior.
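The two spellings can be compared directly. `population = population + other` builds a new list, so the shared list is untouched; tuples, which have no `__iadd__`, show the fallback to `__add__` even when `+=` is used. The variable names are illustrative:

```python
shared = [0, 1]

population = shared
population = population + [2, 3]  # __add__: new list, rebinds the local name
print(shared)                     # [0, 1]
print(population is shared)       # False

population = shared
population += [2, 3]              # __iadd__: extends `shared` in place
print(shared)                     # [0, 1, 2, 3]

frozen = (0, 1)
alias = frozen
alias += (2, 3)                   # tuple has no __iadd__, falls back to __add__
print(frozen)                     # (0, 1)
print(alias)                      # (0, 1, 2, 3)
print(hasattr(tuple, "__iadd__"), hasattr(list, "__iadd__"))  # False True
```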