Setting default/empty attributes for user classes in __init__

Question:

When I am creating a new class, should I set all instance attributes in __init__, even if they are None and in fact later assigned values in class methods?

See example below for the attribute results of MyClass:

class MyClass:
    def __init__(self, df):
        self.df = df
        self.results = None

    def results(self, df_results):
        # Imagine some calculations here or something
        self.results = df_results

I have found in other projects that instance attributes can get buried when they only appear in methods and there is a lot going on.

So to an experienced professional programmer what is standard practice for this? Would you define all instance attributes in __init__ for readability?

Asked By: Andy

Answers:

To understand the importance (or not) of initializing attributes in __init__, let’s take a modified version of your class MyClass as an example. The purpose of the class is to compute the grade for a subject, given the student name and score. You may follow along in a Python interpreter.

>>> class MyClass:
...     def __init__(self,name,score):
...         self.name = name
...         self.score = score
...         self.grade = None
...
...     def results(self, subject=None):
...         if self.score >= 70:
...             self.grade = 'A'
...         elif 50 <= self.score < 70:
...             self.grade = 'B'
...         else:
...             self.grade = 'C'
...         return self.grade

This class requires two positional arguments, name and score. Both must be provided to create an instance. Without them, the object x cannot be instantiated and a TypeError is raised:

>>> x = MyClass()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __init__() missing 2 required positional arguments: 'name' and 'score'

At this point, we understand that we must provide the name of the student and a score for a subject as a minimum, but the grade is not important right now because that will be computed later on, in the results method. So, we just use self.grade = None and don’t define it as a positional arg. Let’s initialize a class instance(object):

>>> x = MyClass(name='John', score=70)
>>> x
<__main__.MyClass object at 0x000002491F0AE898>

The <__main__.MyClass object at 0x000002491F0AE898> confirms that the object x was successfully created at the given memory location. Python also provides some useful built-in attributes for inspecting an object. One of them is __dict__, which holds the instance’s attributes:

>>> x.__dict__
{'name': 'John', 'score': 70, 'grade': None}

This gives a dict view of all the initial attributes and their values. Notice that grade has the value None, as assigned in __init__.

Let’s take a moment to understand what __init__ does. There are many answers and online resources available to explain what this method does but I’ll summarize:

Besides __init__, Python has another special method called __new__(). When you create an object like this: x = MyClass(name='John', score=70), Python internally calls __new__() first to create a new instance of the class MyClass and then calls __init__ to initialize the attributes name and score. If Python does not find values for the required positional args during these internal calls, it raises an error, as we’ve seen above. In other words, __init__ initializes the attributes. You can even call it again on an existing object to assign new initial values for name and score:

>>> x.__init__(name='Tim', score=50)
>>> x.__dict__
{'name': 'Tim', 'score': 50, 'grade': None}
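
The two-step creation described above can be made visible by performing the steps by hand. This is only a sketch for illustration (you would not normally call __new__ or __init__ directly), and the variables before and after are names introduced here:

```python
class MyClass:
    def __init__(self, name, score):
        self.name = name
        self.score = score
        self.grade = None

# Step 1: __new__ creates a bare instance; no attributes exist yet.
x = MyClass.__new__(MyClass)
before = dict(x.__dict__)

# Step 2: __init__ fills in the attributes on that instance.
x.__init__(name='John', score=70)
after = dict(x.__dict__)

print(before)  # {}
print(after)   # {'name': 'John', 'score': 70, 'grade': None}
```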

It is also possible to access individual attributes, as shown below. x.grade prints nothing because its value is None, which the interactive interpreter does not echo.

>>> x.name
'Tim'
>>> x.score
50
>>> x.grade
>>>

In the results method, you will notice that subject is a keyword argument with a default value of None. It is a local variable whose scope is this method only. For the purposes of demonstration, I define subject as an argument of this method, but it could have been initialized in __init__ too. What happens if I try to access it through my object?

>>> x.subject
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MyClass' object has no attribute 'subject'

Python raises an AttributeError when it cannot locate an attribute in the object’s namespace (or that of its class). If you do not initialize attributes in __init__, you may run into this error when you access a name that is only local to a method. In this example, defining subject inside __init__ would have avoided the confusion, and it would have been perfectly normal to do so since it is not required for any computation either.
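
A minimal sketch of the same point, using a stripped-down stand-in for the class (the simplified results body is an assumption made for brevity): a method argument never becomes an instance attribute unless it is assigned to self.

```python
class MyClass:
    def __init__(self, name, score):
        self.name = name
        self.score = score
        self.grade = None

    def results(self, subject=None):
        # subject is a local name; it disappears when the method returns
        return subject

x = MyClass(name='Tim', score=50)
returned = x.results(subject='Math')

print(returned)               # Math
print(hasattr(x, 'subject'))  # False
print(sorted(vars(x)))        # ['grade', 'name', 'score']
```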

Now, let’s call results and see what we get:

>>> x.results()
'B'
>>> x.__dict__
{'name': 'Tim', 'score': 50, 'grade': 'B'}

This prints the grade for the score and notice when we view the attributes, the grade has also been updated. Right from the start, we had a clear view of the initial attributes and how their values have changed.

But what about subject? If I want to know how much Tim scored in Math and what the grade was, I can easily access the score and the grade as we’ve seen before, but how do I know the subject? Since the subject variable is local to the results method, we could just return its value. Change the return statement in the results method:

def results(self, subject=None):
    #<---code--->
    return self.grade, subject

Let’s call results() again. We get a tuple with the grade and subject as expected.

>>> x.results(subject='Math')
('B', 'Math')

To access the values in the tuple, let’s assign them to variables. In Python, it is possible to unpack the values of a collection into multiple variables in a single assignment, provided that the number of variables equals the length of the collection. Here, the length is just two, so we can have two variables on the left-hand side:

>>> grade, subject = x.results(subject='Math')
>>> subject
'Math'

So, there we have it, though it needed a few extra lines of code to get the subject. It would be more intuitive to access all of them at once using just the dot operator to access the attributes with x.<attribute>, but this is just an example and you could try it with subject initialized in __init__.
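
For readers who want to try that variation, here is one possible sketch. Moving subject into the __init__ signature is an assumption about how the class might be reshaped, not the original answer’s code:

```python
class MyClass:
    def __init__(self, name, score, subject=None):
        self.name = name
        self.score = score
        self.subject = subject  # stored on the instance, so x.subject works
        self.grade = None

    def results(self):
        if self.score >= 70:
            self.grade = 'A'
        elif 50 <= self.score < 70:
            self.grade = 'B'
        else:
            self.grade = 'C'
        return self.grade

x = MyClass(name='Tim', score=50, subject='Math')
x.results()
print(x.subject, x.score, x.grade)  # Math 50 B
```

Everything is now reachable through the dot operator, with no tuple unpacking needed.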

Next, suppose there are several students (say 3) and we want the names, scores and grades for Math. Except for the subject, each of these must be stored in some collection type, such as a list, that can hold all the names, scores and grades. We could just initialize like this:

>>> x = MyClass(name=['John', 'Tom', 'Sean'], score=[70, 55, 40])
>>> x.name
['John', 'Tom', 'Sean']
>>> x.score
[70, 55, 40]

This seems fine at first sight, but when you (or some other programmer) take another look at the initialization of name, score and grade in __init__, there is no way to tell that they need a collection data type. The variables also have singular names, which suggests that each holds just one value. Programmers should aim to make the intent as clear as possible, by way of descriptive variable naming, type declarations, code comments and so on. With this in mind, let’s change the attribute declarations in __init__. Before we settle for a well-behaved, well-defined declaration, we must take care of how we declare default arguments.


Edit: Problems with mutable default arguments:

Now, there are some ‘gotchas’ that we must be aware of while declaring default args. Consider the following declaration that initializes names and appends a random name on object creation. Recall that lists are mutable objects in Python.

#Not recommended
class MyClass:
    def __init__(self,names=[]):
        self.names = names
        self.names.append('Random_name')

Let’s see what happens when we create objects from this class:

>>> x = MyClass()
>>> x.names
['Random_name']
>>> y = MyClass()
>>> y.names
['Random_name', 'Random_name']

The list continues to grow with every new object creation. The reason is that default values are evaluated only once, when the def statement for __init__ is executed, not each time __init__ is called. Every call that omits names therefore receives the same list object stored on the function, and each append adds to the previous contents. You can verify this yourself: the id remains the same for every object creation.

>>> id(x.names)
2513077313800
>>> id(y.names)
2513077313800
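
Another way to see the sharing, sketched here with the same not-recommended class, is to look at the function’s __defaults__ tuple, where Python keeps the default values. The name shared_default is introduced only for this illustration:

```python
class MyClass:
    def __init__(self, names=[]):  # not recommended: one list shared by all calls
        self.names = names
        self.names.append('Random_name')

# The default list lives on the function object itself.
shared_default = MyClass.__init__.__defaults__[0]
print(shared_default)             # []

x = MyClass()
y = MyClass()
print(x.names is shared_default)  # True
print(y.names is shared_default)  # True
print(shared_default)             # ['Random_name', 'Random_name']
```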

So, what is the correct way of defining default args while also being explicit about the data type the attribute supports? The safest option is to set default args to None and initialize to an empty list when the arg values are None. The following is a recommended way to declare default args:

#Recommended
>>> class MyClass:
...     def __init__(self,names=None):
...         self.names = names if names else []
...         self.names.append('Random_name')

Let’s examine the behavior:

>>> x = MyClass()
>>> x.names
['Random_name']
>>> y = MyClass()
>>> y.names
['Random_name']

Now, this behavior is what we are looking for. The object does not “carry over” old baggage and re-initializes to an empty list whenever no values are passed to names. If we pass some valid names (as a list of course) to the names arg for the y object, Random_name will simply be appended to this list. And again, the x object values will not be affected:

>>> y = MyClass(names=['Viky','Sam'])
>>> y.names
['Viky', 'Sam', 'Random_name']
>>> x.names
['Random_name']
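
One caveat with names if names else []: a non-empty list passed by the caller is stored as-is, so the append also changes the caller’s own list. If that is not wanted, a stricter variant could check for None explicitly and copy the input. This is a sketch under that assumption, not the original answer’s code:

```python
class MyClass:
    def __init__(self, names=None):
        # Copy the input so the instance owns its list.
        self.names = [] if names is None else list(names)
        self.names.append('Random_name')

original = ['Viky', 'Sam']
y = MyClass(names=original)

print(y.names)   # ['Viky', 'Sam', 'Random_name']
print(original)  # ['Viky', 'Sam']
```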

Perhaps the simplest explanation of this concept can be found on the Effbot website. If you’d like to read some excellent answers, see: “Least Astonishment” and the Mutable Default Argument.


Based on the brief discussion on default args, our class declarations will be modified to:

class MyClass:
    def __init__(self,names=None, scores=None):
        self.names = names if names else []
        self.scores = scores if scores else []
        self.grades = []
#<---code------>

This makes more sense: all variables have plural names and are initialized to empty lists on object creation. We get similar results as before:

>>> x.names
['John', 'Tom', 'Sean']
>>> x.grades
[]

grades is an empty list, making it clear that the grades will be computed for multiple students when results() is called. Therefore, our results method should also be modified. The comparisons should now be made between the grade thresholds (70, 50 etc.) and the items in the self.scores list, and as that happens the self.grades list should be updated with the individual grades. Change the results method to:

def results(self, subject=None):
    #Grade calculator 
    for i in self.scores:
        if i >= 70:
            self.grades.append('A')
        elif 50 <= i < 70:
            self.grades.append('B')
        else:
            self.grades.append('C')
    return self.grades, subject

We should now get the grades as a list when we call results():

>>> x.results(subject='Math')
(['A', 'B', 'C'], 'Math')
>>> x.grades
['A', 'B', 'C']
>>> x.names
['John', 'Tom', 'Sean']
>>> x.scores
[70, 55, 40]

This looks good, but imagine the lists were large: figuring out whose score/grade belongs to whom would be an absolute nightmare. This is where it is important to initialize the attributes with a data type that can store all of these items in a way that makes them easily accessible and clearly shows their relationships. The best choice here is a dictionary.
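
If the data already exists as two parallel lists, one simple way to get such a dictionary is to pair the items with zip. This is a small sketch that assumes both lists are in the same order and that names are unique:

```python
names = ['John', 'Tom', 'Sean']
scores = [70, 55, 40]

# Pair each name with the score at the same position.
names_scores = dict(zip(names, scores))

print(names_scores)         # {'John': 70, 'Tom': 55, 'Sean': 40}
print(names_scores['Tom'])  # 55
```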

We can have a dictionary with names and scores defined initially and the results function should put together everything into a new dictionary that has all the scores, grades etc. We should also comment the code properly and explicitly define args in the method wherever possible. Lastly, we may not require self.grades anymore in __init__ because as you will see the grades are not being appended to a list but explicitly assigned. This is totally dependent upon the requirements of the problem.

The final code:

class MyClass:
    """A class that computes the final results for students"""

    def __init__(self, names_scores=None):
        """initialize student names and scores
        :param names_scores: accepts key/value pairs of names/scores
                             E.g.: {'John': 70}"""
        self.names_scores = names_scores if names_scores else {}

    def results(self, subject=None):
        """Assign grades and collect final results into a dictionary.

        :param subject: the subject the scores belong to, E.g.: 'Math'"""
        final_results = {}
        for name, score in self.names_scores.items():
            if score >= 70:
                grade = 'A'
            elif 50 <= score < 70:
                grade = 'B'
            else:
                grade = 'C'
            final_results[name] = [score, subject, grade]
        return final_results

Please note that final_results is just a local variable that collects the results; self.names_scores itself is left unchanged, so results() can safely be called more than once. The purpose is to return a meaningfully named value from the function that clearly conveys the intent.

Let’s give this a final run:

>>> x = MyClass(names_scores={'John':70, 'Tom':50, 'Sean':40})
>>> x.results(subject='Math')  

  {'John': [70, 'Math', 'A'],
 'Tom': [50, 'Math', 'B'],
 'Sean': [40, 'Math', 'C']}

This gives a much clearer view of the results for each student. It is now easy to access the grades/scores for any student:

>>> y = x.results(subject='Math')
>>> y['John']
[70, 'Math', 'A']

Conclusion:

The final code needed some extra work, but it was worth it. The output is more precise and gives clear information about each student’s results. The code is more readable and clearly informs the reader about the intent of creating the class, methods and variables. The following are the key takeaways from this discussion:

  • The variables (attributes) that are expected to be shared amongst class methods should be defined in __init__. In our example, names, scores and possibly subject were required by results(). These attributes could be shared by another method, say average, that computes the average of the scores.
  • The attributes should be initialized with the appropriate data type. This should be decided before venturing into a class-based design for a problem.
  • Care must be taken while declaring attributes with default args. A mutable default arg is shared between calls, so if __init__ mutates the attribute, the change leaks into every later instance that relies on the default. It is safest to declare default args as None and initialize to an empty mutable collection inside __init__ whenever the value is None.
  • The attribute names should be unambiguous and follow PEP 8 guidelines.
  • Some variables should be initialized within the scope of the class method only. These could be, for example, internal variables that are required for computations or variables that don’t need to be shared with other methods.
  • Another compelling reason to define variables in __init__ is to avoid possible AttributeErrors that may occur due to accessing unnamed/out-of-scope attributes. The __dict__ attribute provides a view of the attributes initialized here.
  • While assigning values to attributes (positional args) on class instantiation, the argument names should be given explicitly. For instance:

    x = MyClass('John', 70)  #not explicit
    x = MyClass(name='John', score=70) #explicit
    
  • Finally, the aim should be to communicate the intent as clearly as possible with comments. The class, its methods and attributes should be well commented. For all attributes, a short description along with an example is quite useful for a new programmer who encounters your class and its attributes for the first time.
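
As a small illustration of the first takeaway, here is a sketch of the average method mentioned above. It reuses names_scores from __init__, and returning None for an empty dictionary is an assumption made here:

```python
class MyClass:
    """A class that computes results for students (trimmed for this sketch)."""

    def __init__(self, names_scores=None):
        self.names_scores = names_scores if names_scores else {}

    def average(self):
        """Return the mean score, or None when there are no scores."""
        if not self.names_scores:
            return None
        return sum(self.names_scores.values()) / len(self.names_scores)

x = MyClass(names_scores={'John': 70, 'Tom': 50, 'Sean': 40})
avg = x.average()
print(round(avg, 2))  # 53.33
```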

Answered By: amanb

I think you should avoid both solutions, simply because you should avoid creating uninitialized or partially initialized objects, except in one case I will outline later.

Look at two slightly modified versions of your class, with a setter and a getter:

class MyClass1:
    def __init__(self, df):
        self.df = df
        self.results = None

    def set_results(self, df_results):
        self.results = df_results

    def get_results(self):
        return self.results

And

class MyClass2:
    def __init__(self, df):
        self.df = df

    def set_results(self, df_results):
        self.results = df_results

    def get_results(self):
        return self.results

The only difference between MyClass1 and MyClass2 is that the first one initializes results in the constructor while the second does it in set_results. Here comes the user of your class (usually you, but not always). Everyone knows you can’t trust the user (even if it’s you):

MyClass1("df").get_results()
# returns None

Or

MyClass2("df").get_results()
# Traceback (most recent call last):
# ...
# AttributeError: 'MyClass2' object has no attribute 'results'

You might think that the first case is better because it does not fail, but I do not agree. I would like the program to fail fast in this case, rather than do a long debugging session to find what happened. Hence, the first part of my answer is: do not set the uninitialized fields to None, because you lose a fail-fast hint.

But that’s not the whole answer. Whichever version you choose, you have an issue: the object was used and it shouldn’t have been, because it was not fully initialized. You can add a docstring to get_results: """Always use set_results **BEFORE** this method""". Unfortunately, the user doesn’t read docstrings either.

You have two main reasons for uninitialized fields in your object: 1. you don’t know (for now) the value of the field; 2. you want to avoid an expensive operation (computation, file access, network, …), aka "lazy initialization". Both situations occur in the real world, and both collide with the need to use only fully initialized objects.
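
For the second reason, lazy initialization, the standard library offers functools.cached_property (Python 3.8+), which computes a value on first access and then stores it on the instance. The Report class, the doubling computation and the compute_calls counter below are hypothetical names used only to sketch the idea:

```python
from functools import cached_property


class Report:
    def __init__(self, df):
        self.df = df
        self.compute_calls = 0  # only here to show how often the work runs

    @cached_property
    def results(self):
        # Stand-in for an expensive computation, file access, network call, ...
        self.compute_calls += 1
        return [value * 2 for value in self.df]


report = Report([1, 2, 3])
print(report.compute_calls)  # 0
print(report.results)        # [2, 4, 6]
print(report.results)        # [2, 4, 6]
print(report.compute_calls)  # 1
```

The object is usable right after construction, and there is never a moment where results holds a placeholder such as None.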

Happily, there is a well documented solution to this problem: Design Patterns, and more precisely Creational patterns. In your case, the Factory pattern or the Builder pattern might be the answer. E.g.:

class MyClassBuilder:
    def __init__(self, df):
        self._df = df  # df is known immediately
        # GIVE A DEFAULT VALUE TO OTHER FIELDS to avoid the possibility of a partially uninitialized object.
        # The default value should be either:
        # * a value passed as a parameter of the constructor ;
        # * a sensible value (eg. an empty list, 0, etc.)

    def results(self, df_results):
        self._results = df_results
        return self  # for fluent style

    # ... other field initializers

    def build(self):
        return MyClass(self._df, self._results)  # ... plus the other fields

class MyClass:
    def __init__(self, df, results):  # ... plus the other fields
        self.df = df
        self.results = results
        # ...

    def get_results(self):
        return self.results

    # ... other getters

(You can use a Factory too, but I find the Builder more flexible). Let’s give a second chance to the user:

>>> b = MyClassBuilder("df").build()
Traceback (most recent call last):
...
AttributeError: 'MyClassBuilder' object has no attribute '_results'
>>> b = MyClassBuilder("df")
>>> b.results("r")
<__main__.MyClassBuilder object at ...>
... other fields initialization
>>> x = b.build()
>>> x
<__main__.MyClass object at ...>
>>> x.get_results()
'r'
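
The Factory alternative mentioned above could be sketched as a classmethod that computes everything before the constructor runs. The from_df name and the placeholder computation (doubling each value) are assumptions for illustration, not part of the original answer:

```python
class MyClass:
    def __init__(self, df, results):
        # The constructor only accepts fully known values.
        self.df = df
        self.results = results

    @classmethod
    def from_df(cls, df):
        """Hypothetical factory: derive results first, then build the object."""
        results = [value * 2 for value in df]  # placeholder computation
        return cls(df, results)

    def get_results(self):
        return self.results


x = MyClass.from_df([1, 2, 3])
print(x.get_results())  # [2, 4, 6]
```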

The advantages are clear:

  1. It’s easier to detect and fix a creation failure than a late use failure;
  2. You do not release into the wild an uninitialized (and thus potentially damaging) version of your object.

The presence of uninitialized fields in the Builder is not a contradiction: those fields are uninitialized by design, because the Builder’s role is to initialize them. (Actually, those fields are in a sense foreign fields to the Builder.) This is the case I was talking about in my introduction. They should, in my mind, be set to a default value (if it exists) or left uninitialized to raise an exception if you try to create an incomplete object.

Second part of my answer: use a Creational pattern to ensure the object is correctly initialized.

Side note: I’m very suspicious when I see a class with getters and setters. My rule of thumb is: always try to separate them because when they meet, objects become unstable.

Answered By: jferard

Following considerable research and discussions with experienced programmers please see below what I believe is the most Pythonic solution to this question. I have included the updated code first and then a narrative:

class MyClass:
    def __init__(self, df):
        self.df = df
        self._results = None

    @property
    def results(self):
        if self._results is None:
            raise Exception('_results is None; call generate_results() first')
        return self._results

    def generate_results(self, df_results):
        # Imagine some calculations here or something
        self._results = df_results

Description of what I learnt, changed and why:

  1. All instance attributes should be included in the __init__ (initialiser) method. This is to ensure readability and aid debugging.

  2. The first issue is that you cannot create private attributes in Python. Everything is public, so any partially initialised attributes (such as results being set to None) can be accessed. The convention to indicate a private attribute is to place a leading underscore at the front, so in this case I changed self.results to self._results.

    Keep in mind this is only convention, and self._results can still be directly accessed. However, this is the Pythonic way to handle what are pseudo-private attributes.

  3. The second issue is having a partly initialised attribute which is set to None. Because it is set to None, as @jferard explains in his answer, we have lost a fail-fast hint and added a layer of obfuscation for debugging the code.

    To resolve this we add a getter method. This can be seen above as the function results() which has the @property decorator above.

    This is a function that, when invoked, checks if self._results is None. If so, it raises an exception (the fail-fast hint); otherwise it returns the object. The @property decorator changes the invocation style from a function call to attribute access, so all the user has to write on an instance of MyClass is .results, just like any other attribute.

    (I changed the name of the method that sets the results to generate_results() to avoid confusion and free up .results for the getter method)

  4. If you then have other methods within the class that need to use self._results, but only when properly assigned, you can use self.results, and that way the fail-fast hint is baked in as above.

I recommend also reading @jferard’s answer to this question. He goes into depth about the problems and some of the solutions. The reason I added my answer is that I think for a lot of cases the above is all you need (and the Pythonic way of doing it).
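
To show how the guarded property behaves from the caller’s side, here is a short usage sketch. It swaps the generic Exception for RuntimeError, which is my assumption about a more specific choice, and the error variable is introduced only for the demonstration:

```python
class MyClass:
    def __init__(self, df):
        self.df = df
        self._results = None

    @property
    def results(self):
        if self._results is None:
            # RuntimeError is an assumption; the answer raises a plain Exception.
            raise RuntimeError('results not generated yet; call generate_results() first')
        return self._results

    def generate_results(self, df_results):
        self._results = df_results


obj = MyClass('df')
try:
    obj.results  # fails fast: nothing has been generated yet
except RuntimeError as exc:
    error = str(exc)

obj.generate_results('r')
print(error)        # results not generated yet; call generate_results() first
print(obj.results)  # r
```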

Answered By: Andy

It’s good practice to set sane default values in most applications (this solves errors with possible missing values) – so you only have to worry about data validation.

In Python 3.7+ you can use dataclasses to set default values. A dataclass generates special methods such as __init__ under the hood, so the class is easy to read.
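
A minimal sketch of what that looks like, using a hypothetical Student class (the field names are assumptions chosen to echo the question). Mutable defaults go through field(default_factory=...), so every instance gets its own list:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Student:
    name: str
    score: int
    grade: Optional[str] = None               # filled in later
    tags: list = field(default_factory=list)  # fresh list per instance


a = Student('John', 70)
b = Student('Tom', 55)
a.tags.append('math')

print(a)       # Student(name='John', score=70, grade=None, tags=['math'])
print(b.tags)  # []
```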

It’s also good practice to write & comment your code so it can be easily followed by others.

In an app which reads user config from yaml I used a variation of this answer to solve possible missing configuration values:

from dataclasses import dataclass, fields


class Settings:

    def __init__(self):
        """ read values from the 'Default' dataclass &
            subsequently overwrite with values from YAML.
        """
        # set default values
        self.set_defaults()

        # overwrite defaults with values from yaml
        config = self.get_config()

        # read a dict into class attributes
        for key, value in config.items():
            setattr(self, key, value)

    def set_defaults(self):
        """ sets default application values from dataclass
        """
        for field in fields(self.Default):
            setattr(self, field.name, field.default)

    # nested class with default values
    # dataclasses require python 3.7
    @dataclass
    class Default:
        """ Stores default values for the app.
            Called by main class: 'Settings'
        """
        cache_dir: bool = False
        cleanup: bool = True
        # .....

    def get_config(self):
        """ read config file """
        ...

In the final code I also made the main class a singleton as only one copy of the object needs to exist to store configuration settings. Credit to this answer for inspiration.
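
The singleton part is not shown in the answer. One simple way to get the same effect, sketched here as an assumption rather than the author’s implementation, is a cached accessor function; the get_settings name and the trimmed-down Settings class are hypothetical:

```python
from functools import lru_cache


class Settings:
    """Trimmed-down stand-in for the configuration class."""

    def __init__(self):
        self.cache_dir = False
        self.cleanup = True


@lru_cache(maxsize=None)
def get_settings():
    """Create the Settings object on first use and reuse it afterwards."""
    return Settings()


first = get_settings()
second = get_settings()
print(first is second)  # True
```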

Answered By: Stuart Cardall