What are the best practices for __repr__ with a collection class in Python?

Question:

I have a custom Python class that essentially encapsulates a list of objects of some kind, and I’m wondering how I should implement its __repr__ method. I’m tempted to go with the following:

class MyCollection:
   def __init__(self, objects = []):
      self._objects = []
      self._objects.extend(objects)

   def __repr__(self):
      return f"MyCollection({self._objects})"

This has the advantage of producing valid Python output that fully describes the class instance. However, in my real-world case, the object list can be rather large, and each object may have a large repr of its own (the objects are arrays themselves).

What are the best practices in such situations? Accept that the repr might often be a very long string? Are there potential issues related to this (debugger UI, etc.)? Should I implement some kind of shortening scheme using an ellipsis? If so, is there a good/standard way to achieve this? Or should I skip listing the collection’s content altogether?

Asked By: abey

Answers:

The official documentation outlines this as how you should handle __repr__:

Called by the repr() built-in function to compute the “official”
string representation of an object. If at all possible, this should
look like a valid Python expression that could be used to recreate an
object with the same value (given an appropriate environment). If this
is not possible, a string of the form <…some useful description…>
should be returned. The return value must be a string object. If a
class defines __repr__() but not __str__(), then __repr__() is also
used when an “informal” string representation of instances of that
class is required.

This is typically used for debugging, so it is important that the
representation is information-rich and unambiguous.

Python 3 __repr__ Docs
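
The angle-bracket fallback that the documentation mentions could look roughly like this for a collection. This is only a sketch: the class name SummarizedCollection and the wording of the summary are made up for illustration, not something the docs prescribe.

```python
class SummarizedCollection:
    """Hypothetical collection whose repr is a short description, not an expression."""

    def __init__(self, objects=None):
        # Copy the items so the collection owns its own list.
        self._objects = list(objects) if objects is not None else []

    def __repr__(self):
        # Angle brackets signal that this string cannot be eval()'d back
        # into an equal object; it only summarizes the instance.
        return f"<{type(self).__name__} with {len(self._objects)} items>"


print(repr(SummarizedCollection(range(1001))))  # <SummarizedCollection with 1001 items>
```

This keeps the repr short no matter how large the items are, at the cost of hiding the contents entirely.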

Lists, strings, sets, tuples and dictionaries all print out the entirety of their collection in their __repr__ method.

Your current code follows what the documentation suggests. However, I would suggest changing your __init__ method so it looks more like this:

class MyCollection:
    def __init__(self, objects=None):
        if objects is None:
            objects = []
        # Copy the items so the collection does not alias the caller's list.
        self._objects = list(objects)

    def __repr__(self):
        return f"MyCollection({self._objects})"

You generally want to avoid using mutable objects as default arguments. Because your method copies the items into a fresh list with extend and never mutates the default itself, it still works fine, but Python’s documentation suggests avoiding the pattern anyway.

It is good programming practice to not use mutable objects as default
values. Instead, use None as the default value and inside the
function, check if the parameter is None and create a new
list/dictionary/whatever if it is.

https://docs.python.org/3/faq/programming.html#why-are-default-values-shared-between-objects
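
To see why the FAQ warns about this, here is a small sketch of the pitfall. The function names append_shared and append_fresh are hypothetical, chosen only for this demonstration.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` appends to the same list.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # Using None as a sentinel gives each call its own new list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared(1)
second = append_shared(2)
print(second)           # [1, 2]
print(first is second)  # True

print(append_fresh(1), append_fresh(2))  # [1] [2]
```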

If you’re interested in how another library handles it, the repr for NumPy arrays shows only the first three and the last three items when the array has more than 1,000 elements. It also pads the items so they all take up the same width (in the example below, 1000 is four characters wide, so 0 is padded with three spaces to match).

>>> import numpy as np
>>> repr(np.array([i for i in range(1001)]))
'array([   0,    1,    2, ...,  998,  999, 1000])'

To mimic this NumPy array style, you could implement a __repr__ method like this in your class:

class MyCollection:
    def __init__(self, objects=None):
        if objects is None:
            objects = []
        # Copy the items so the collection does not alias the caller's list.
        self._objects = list(objects)

    def __repr__(self):
        # With 1,000 items or fewer, return the full list.
        if len(self._objects) <= 1000:
            return f"MyCollection({self._objects})"
        # Otherwise keep only the first three and last three items.
        items_to_display = self._objects[:3] + self._objects[-3:]
        # Pad every item to the width of the longest repr among them.
        padding = max(len(repr(item)) for item in items_to_display)
        values = [repr(item).rjust(padding) for item in items_to_display]
        # Insert the '...' between the 3rd and 4th items.
        values.insert(3, '...')
        # Join the pieces with commas to build the final string.
        array_as_string = ', '.join(values)
        return f"MyCollection([{array_as_string}])"

>>> repr(MyCollection([1,2,3,4]))
'MyCollection([1, 2, 3, 4])'

>>> repr(MyCollection([i for i in range(1001)]))
'MyCollection([   0,    1,    2, ...,  998,  999, 1000])'
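
If you would rather not hand-roll the truncation, the standard library’s reprlib module produces abbreviated reprs with size limits. Below is a minimal sketch, assuming a limit of six items is acceptable. The class name ReprlibCollection and the _formatter attribute are hypothetical names for this example. Unlike the NumPy style, reprlib keeps the first items and cuts off the tail.

```python
import reprlib


class ReprlibCollection:
    """Hypothetical variant that delegates truncation to reprlib."""

    # One formatter shared by all instances; maxlist caps how many
    # list items are shown before '...' is appended.
    _formatter = reprlib.Repr()
    _formatter.maxlist = 6

    def __init__(self, objects=None):
        # Copy the items so the collection owns its own list.
        self._objects = list(objects) if objects is not None else []

    def __repr__(self):
        return f"ReprlibCollection({self._formatter.repr(self._objects)})"


print(repr(ReprlibCollection([1, 2, 3, 4])))
# ReprlibCollection([1, 2, 3, 4])
print(repr(ReprlibCollection(range(1001))))
# ReprlibCollection([0, 1, 2, 3, 4, 5, ...])
```

reprlib also shortens the individual items: objects of types it has no special handling for are cut to maxother characters (30 by default), which may help when each element has a large repr of its own.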
          
Answered By: Nala Nkadi