How to implement a good __hash__ function in Python

Question:

When implementing a class with multiple properties (like in the toy example below), what is the best way to handle hashing?

I assume that __eq__ and __hash__ should be consistent, but how do I implement a proper hash function that takes all the properties into account?

class AClass:
    def __init__(self):
        self.a = None
        self.b = None

    def __eq__(self, other):
        return other and self.a == other.a and self.b == other.b

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        return hash((self.a, self.b))

I read in another question that tuples are hashable, so I was wondering whether something like the example above is sensible. Is it?

Asked By: abahgat

Answers:

From the documentation for object.__hash__(self):

The only required property is that objects which compare equal have the same hash value; it is advised to mix together the hash values of the components of the object that also play a part in comparison of objects by packing them into a tuple and hashing the tuple. Example:

def __hash__(self):
    return hash((self.name, self.nick, self.color))

Answered By: S.Lott

__hash__ should return the same value for objects that are equal. It also shouldn’t change over the lifetime of the object; generally you only implement it for immutable objects.
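To illustrate why the hash should not change over an object's lifetime, here is a minimal sketch. The class `Point` and its attribute `x` are hypothetical names used only for this demonstration; the behaviour shown is that of CPython's built-in set.

```python
class Point:
    """Hypothetical mutable class whose hash depends on a mutable attribute."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x

    def __hash__(self):
        return hash(self.x)


p = Point(1)
seen = {p}

found_before = p in seen  # looked up with the hash the set stored

p.x = 2                   # mutating changes the hash of an object already in the set
found_after = p in seen   # looked up with the new hash, which the set never saw
```

After the mutation the set still holds the object, but a membership test no longer finds it, which is why hashing is usually reserved for immutable objects.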

A trivial implementation would be to just return 0. This is always correct, but performs badly.

Your solution, returning the hash of a tuple of properties, is good. But note that you don’t need to include in the tuple every property that you compare in __eq__. If some property usually has the same value for unequal objects, just leave it out. Don’t make the hash computation any more expensive than it needs to be.
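As a rough sketch of hashing only a subset of the compared properties, assume a hypothetical class `Item` where `in_stock` rarely helps to tell objects apart. Equal objects still get equal hashes; unequal objects may collide, which is allowed.

```python
class Item:
    """Hypothetical class: __eq__ compares two fields, __hash__ uses only one."""

    def __init__(self, name, in_stock):
        self.name = name
        self.in_stock = in_stock

    def __eq__(self, other):
        if not isinstance(other, Item):
            return NotImplemented
        return self.name == other.name and self.in_stock == other.in_stock

    def __hash__(self):
        # in_stock is left out: it rarely distinguishes objects, so it adds cost
        # without improving the spread of hash values much.
        return hash(self.name)


a = Item("widget", True)
b = Item("widget", True)
c = Item("widget", False)

unique_items = {a, b, c}  # a and b are equal; c collides with them but is kept
```

The collision between `a` and `c` only costs one extra `__eq__` call during lookup; correctness is unaffected.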

Edit: I would recommend against using XOR to mix hashes in general. When two different properties have the same value, they will have the same hash, and with XOR these will cancel each other out. Tuples use a more complex calculation to mix hashes; see tuplehash in tupleobject.c.
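The cancellation is easy to see in a small sketch. The helper `xor_hash` is a hypothetical function written for this illustration, not something from the standard library.

```python
def xor_hash(a, b):
    """Hypothetical naive mixer: XOR the hashes of two properties."""
    return hash(a) ^ hash(b)


# Equal property values cancel out, whatever the value is.
same_ints = xor_hash(5, 5)
same_strs = xor_hash("x", "x")

# XOR is also symmetric, so swapping the two properties gives the same hash.
swapped_equal = xor_hash(1, 2) == xor_hash(2, 1)
```

Every object whose two properties hold the same value hashes to 0 with this scheme, and (1, 2) is indistinguishable from (2, 1), so both cases pile up in the same bucket.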

Answered By: adw

It’s dangerous to write

def __eq__(self, other):
  return other and self.a == other.a and self.b == other.b

because if your rhs (i.e., other) object evaluates to boolean False, it will never compare as equal to anything!
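A minimal sketch of the problem, assuming a hypothetical class `Falsy` whose instances are falsy because `__len__` returns 0. The `__eq__` body is the one from the question.

```python
class Falsy:
    """Hypothetical class whose instances evaluate to False (empty container)."""

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __len__(self):
        return 0  # makes bool(instance) False

    def __eq__(self, other):
        # Same expression as in the question: short-circuits on a falsy `other`.
        return other and self.a == other.a and self.b == other.b


x = Falsy(1, 2)
y = Falsy(1, 2)

result = (x == y)           # `other and ...` returns `other` itself when it is falsy
looks_equal = bool(result)  # so the comparison is falsy despite matching attributes
```

Note also that the expression does not return a bool here but the right-hand object itself.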

In addition, you might want to check that other is an instance of AClass or one of its subclasses. If it isn’t, you’ll get either an AttributeError or a false positive (if the other class happens to have same-named attributes with matching values). So I would recommend rewriting __eq__ as:

def __eq__(self, other):
  return isinstance(other, self.__class__) and self.a == other.a and self.b == other.b
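Putting this together with the tuple hash from the question, one possible sketch looks like the following. Two things here are assumptions that go beyond the answer: the constructor takes `a` and `b` as arguments for convenience, and `__eq__` returns `NotImplemented` for foreign types (a common Python idiom that lets the other operand try the comparison) instead of `False`.

```python
class AClass:
    def __init__(self, a=None, b=None):
        self.a = a
        self.b = b

    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            # Assumption: defer to the other operand rather than return False.
            return NotImplemented
        return self.a == other.a and self.b == other.b

    def __hash__(self):
        # Hash the same fields that __eq__ compares.
        return hash((self.a, self.b))


first = AClass(1, "x")
second = AClass(1, "x")

same = first == second        # equal attributes, same class
versus_none = first == None   # no AttributeError; falls back to identity, so False
versus_int = first == 0       # falsy right-hand side is handled as well
deduplicated = {first, second}
```

In Python 3 a separate `__ne__` is not needed, since `!=` is derived from `__eq__` by default.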

If by any chance you want an unusually flexible comparison, which compares across unrelated classes as long as attributes match by name, you’d still want to at least avoid AttributeError and check that other doesn’t have any additional attributes. How you do it depends on the situation (since there’s no standard way to find all attributes of an object).

Answered By: max