How can I make a python dataclass hashable?

Question:

Say I have a dataclass in Python 3. I want to be able to hash and order these objects.

I do not want frozen=True. Although I did not say so in the original question, I absolutely want the objects of this dataclass to be mutable. Some of the answers over the years focus on frozen and were not useful for me, though I’m sure they will be useful for someone.

I only want them ordered/hashed on id.

I see in the docs that I can just implement __hash__ and all that, but I’d like to get dataclasses to do the work for me because they are intended to handle this.

from dataclasses import dataclass, field

@dataclass(eq=True, order=True)
class Category:
    id: str = field(compare=True)
    name: str = field(default="set this in post_init", compare=False)

a = sorted(list(set([ Category(id='x'), Category(id='y')])))

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'Category'
Asked By: Brian C.


Answers:

TL;DR

Use frozen=True in conjunction with eq=True (which will make the instances immutable).
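
As a minimal sketch of that TL;DR: the class name FrozenCategory and the default name value below are made up for illustration. Adding frozen=True to the question's decorator should make the set-then-sort line work, but at the cost of the mutability the asker wants to keep.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(eq=True, order=True, frozen=True)
class FrozenCategory:  # hypothetical name, mirrors the question's Category
    id: str
    name: str = field(default="unnamed", compare=False)  # illustrative default


# Hashable and orderable, so instances can go through a set and sorted().
result = sorted({FrozenCategory(id="y"), FrozenCategory(id="x")})
print([c.id for c in result])  # ['x', 'y']

# The trade-off: assigning to any field raises FrozenInstanceError.
try:
    result[0].name = "changed"
except FrozenInstanceError:
    print("instances are immutable")
```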

Long Answer

From the docs:

__hash__() is used by built-in hash(), and when objects are added to hashed collections such as dictionaries and sets. Having a __hash__() implies that instances of the class are immutable. Mutability is a complicated property that depends on the programmer’s intent, the existence and behavior of __eq__(), and the values of the eq and frozen flags in the dataclass() decorator.

By default, dataclass() will not implicitly add a __hash__() method unless it is safe to do so. Neither will it add or change an existing explicitly defined __hash__() method. Setting the class attribute __hash__ = None has a specific meaning to Python, as described in the __hash__() documentation.

If __hash__() is not explicitly defined, or if it is set to None, then dataclass() may add an implicit __hash__() method. Although not recommended, you can force dataclass() to create a __hash__() method with unsafe_hash=True. This might be the case if your class is logically immutable but can nonetheless be mutated. This is a specialized use case and should be considered carefully.

Here are the rules governing implicit creation of a __hash__() method.
Note that you cannot both have an explicit __hash__() method in your
dataclass and set unsafe_hash=True; this will result in a TypeError.

If eq and frozen are both true, by default dataclass() will generate a __hash__() method for you. If eq is true and frozen is false, __hash__() will be set to None, marking it unhashable (which it is, since it is mutable). If eq is false, __hash__() will be left untouched meaning the __hash__() method of the superclass will be used (if the superclass is object, this means it will fall back to id-based hashing).
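
The rules quoted above can be checked with a small sketch. The class names (FrozenPoint, MutablePoint, IdentityPoint, Conflicting) are invented for this example and do not come from the docs.

```python
from dataclasses import dataclass


@dataclass(eq=True, frozen=True)
class FrozenPoint:      # eq and frozen: __hash__ is generated from the fields
    x: int


@dataclass(eq=True)
class MutablePoint:     # eq without frozen: __hash__ is set to None
    x: int


@dataclass(eq=False)
class IdentityPoint:    # eq=False: __hash__ is inherited from object
    x: int


print(hash(FrozenPoint(1)) == hash(FrozenPoint(1)))  # True
print(MutablePoint.__hash__ is None)                 # True
print(IdentityPoint.__hash__ is object.__hash__)     # True

# An explicit __hash__ combined with unsafe_hash=True is rejected
# at class-creation time with a TypeError.
conflict_error = None
try:
    @dataclass(unsafe_hash=True)
    class Conflicting:
        x: int

        def __hash__(self):
            return hash(self.x)
except TypeError as exc:
    conflict_error = exc
    print("TypeError:", exc)
```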

Answered By: DeepSpace

From the docs:

Here are the rules governing implicit creation of a __hash__() method:

[…]

If eq and frozen are both true, by default dataclass() will
generate a __hash__() method for you. If eq is true and frozen
is false, __hash__() will be set to None, marking it unhashable
(which it is, since it is mutable). If eq is false, __hash__()
will be left untouched meaning the __hash__() method of the
superclass will be used (if the superclass is object, this means it
will fall back to id-based hashing).

Since you set eq=True and left frozen at the default (False), your dataclass is unhashable.

You have 3 options:

  • Set frozen=True (in addition to eq=True), which will make your class immutable and hashable.
  • Set unsafe_hash=True, which will create a __hash__ method but leave your class mutable, thus risking problems if an instance of your class is modified while stored in a dict or set:

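    # Assumes Category was declared with unsafe_hash=True.
    cat = Category('foo', 'bar')
    categories = {cat}
    cat.id = 'baz'  # changes the hash while cat is stored in the set

    print(cat in categories)  # False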
    
  • Manually implement a __hash__ method.
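
For the mutable case the question asks about, options 2 and 3 might look like the sketch below. ManualCategory is a hypothetical name used only to keep both variants side by side. In both, name has compare=False, so only id takes part in equality, ordering and hashing.

```python
from dataclasses import dataclass, field


# Option 2: let dataclass() generate __hash__ from the compared fields (only id).
@dataclass(eq=True, order=True, unsafe_hash=True)
class Category:
    id: str = field(compare=True)
    name: str = field(default="set this in post_init", compare=False)


# Option 3: write __hash__ by hand; dataclass() leaves an explicit one alone.
@dataclass(eq=True, order=True)
class ManualCategory:  # hypothetical name
    id: str = field(compare=True)
    name: str = field(default="set this in post_init", compare=False)

    def __hash__(self):
        return hash(self.id)


results = {}
for cls in (Category, ManualCategory):
    # Two instances share id "x", so the set keeps only one of them.
    items = sorted(set([cls(id="y"), cls(id="x"), cls(id="x", name="duplicate id")]))
    items[0].name = "renamed"  # still mutable; name is not part of the hash
    results[cls.__name__] = ([c.id for c in items], items[0] in set(items))
    print(cls.__name__, *results[cls.__name__])
```

Either way, changing id while an instance sits in a set or dict still breaks lookup, so this only stays safe if id is treated as fixed after creation.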
Answered By: Aran-Fey

I’d like to add a special note on the use of unsafe_hash.

You can exclude a field from the generated hash by setting compare=False or hash=False on it. (By default, hash takes its value from compare.)

This might be useful if you store nodes in a graph but want to mark them visited without breaking their hashing (e.g. if they’re in a set of unvisited nodes).

from dataclasses import dataclass, field

@dataclass(unsafe_hash=True)
class node:
    x: int
    visit_count: int = field(default=10, compare=False)  # hash follows compare, so this field is left out of the hash
    # visit_count: int = field(default=10, hash=False)   # also valid: out of the hash, but still used by __eq__
    # visit_count: int = 10   # part of the hash, so mutating it breaks set lookup ("3*" is printed)

s = set()
n = node(1)
s.add(n)
if n in s: print("1* n in s")
n.visit_count = 11
if n in s:
    print("2* n still in s")
else:
    print("3* n is lost to the void because hashing broke.")
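
To make the difference between the two field settings concrete, here is a small sketch. The class names CompareOff and HashOff are invented for illustration. With compare=False the field is ignored by both __eq__ and __hash__; with hash=False it is ignored only by __hash__ and still takes part in equality.

```python
from dataclasses import dataclass, field


@dataclass(unsafe_hash=True)
class CompareOff:
    x: int
    visit_count: int = field(default=0, compare=False)  # out of __eq__ and __hash__


@dataclass(unsafe_hash=True)
class HashOff:
    x: int
    visit_count: int = field(default=0, hash=False)     # out of __hash__ only


print(CompareOff(1, visit_count=0) == CompareOff(1, visit_count=5))  # True
print(HashOff(1, visit_count=0) == HashOff(1, visit_count=5))        # False
print(hash(HashOff(1, 0)) == hash(HashOff(1, 5)))                    # True
```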

This took me hours to figure out… The most useful further reading I found is the Python documentation on dataclasses, specifically the sections on field() and on the arguments to the dataclass() decorator.
https://docs.python.org/3/library/dataclasses.html

Answered By: Leo Ufimtsev