How can I make a python dataclass hashable?
Question:
Say I have a dataclass in Python 3. I want to be able to hash and order these objects.
I do not want frozen=True. I want mutable objects. While I did not have this in the original question, I absolutely want the objects of this dataclass to be mutable. Some of the answers over the years focus on frozen and were not useful for me. I’m sure these answers will be useful for someone.
I only want them ordered/hashed on id.
I see in the docs that I can just implement __hash__ and all that, but I’d like to get dataclasses to do the work for me because they are intended to handle this.
from dataclasses import dataclass, field

@dataclass(eq=True, order=True)
class Category:
    id: str = field(compare=True)
    name: str = field(default="set this in post_init", compare=False)

a = sorted(list(set([Category(id='x'), Category(id='y')])))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'Category'
Answers:
TL;DR
Use frozen=True in conjunction with eq=True (which will make the instances immutable).
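As a minimal sketch of what the TL;DR buys you and what it costs, assuming a hypothetical FrozenCategory class modelled on the question's Category (the class name and the "unnamed" default are made up for illustration):

```python
from dataclasses import dataclass, field, FrozenInstanceError


@dataclass(eq=True, order=True, frozen=True)
class FrozenCategory:
    # Only `id` takes part in comparison, ordering and hashing.
    id: str
    name: str = field(default="unnamed", compare=False)


# Hashable, so instances can go into a set and then be sorted by id.
ordered = sorted({FrozenCategory(id="y"), FrozenCategory(id="x")})
ordered_ids = [c.id for c in ordered]

# The price: assigning to any field raises FrozenInstanceError.
try:
    ordered[0].name = "renamed"
    mutation_allowed = True
except FrozenInstanceError:
    mutation_allowed = False
```

Here ordered_ids comes out sorted by id and mutation_allowed ends up False, which is exactly why this route does not fit if the objects have to stay mutable.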
Long Answer
From the docs:
__hash__() is used by built-in hash(), and when objects are added to hashed collections such as dictionaries and sets. Having a __hash__() implies that instances of the class are immutable. Mutability is a complicated property that depends on the programmer’s intent, the existence and behavior of __eq__(), and the values of the eq and frozen flags in the dataclass() decorator.
By default, dataclass() will not implicitly add a __hash__() method unless it is safe to do so. Neither will it add or change an existing explicitly defined __hash__() method. Setting the class attribute __hash__ = None has a specific meaning to Python, as described in the __hash__() documentation.
If __hash__() is not explicitly defined, or if it is set to None, then dataclass() may add an implicit __hash__() method. Although not recommended, you can force dataclass() to create a __hash__() method with unsafe_hash=True. This might be the case if your class is logically immutable but can nonetheless be mutated. This is a specialized use case and should be considered carefully.
Here are the rules governing implicit creation of a __hash__() method. Note that you cannot both have an explicit __hash__() method in your dataclass and set unsafe_hash=True; this will result in a TypeError.
If eq and frozen are both true, by default dataclass() will generate a __hash__() method for you. If eq is true and frozen is false, __hash__() will be set to None, marking it unhashable (which it is, since it is mutable). If eq is false, __hash__() will be left untouched meaning the __hash__() method of the superclass will be used (if the superclass is object, this means it will fall back to id-based hashing).
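The rules quoted above can be checked directly. The following is only a sketch with made-up class names (EqFrozen, EqMutable, NoEq, Conflicting), one per case in the quote:

```python
from dataclasses import dataclass


@dataclass(eq=True, frozen=True)
class EqFrozen:
    x: int


@dataclass(eq=True)  # frozen defaults to False
class EqMutable:
    x: int


@dataclass(eq=False)
class NoEq:
    x: int


# eq=True, frozen=True: a field-based __hash__ is generated.
frozen_hashes_match = hash(EqFrozen(1)) == hash(EqFrozen(1))

# eq=True, frozen=False: __hash__ is set to None, so hash() would raise TypeError.
mutable_hash_is_none = EqMutable.__hash__ is None

# eq=False: __hash__ is left untouched and inherited from object (identity-based).
inherits_object_hash = NoEq.__hash__ is object.__hash__

# An explicit __hash__ combined with unsafe_hash=True is rejected at class creation.
try:
    @dataclass(unsafe_hash=True)
    class Conflicting:
        x: int

        def __hash__(self):
            return hash(self.x)

    conflict_raised = False
except TypeError:
    conflict_raised = True
```

All four flags end up True, matching the three cases in the quote plus the TypeError note.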
From the docs:
Here are the rules governing implicit creation of a __hash__() method:
[…]
If eq and frozen are both true, by default dataclass() will generate a __hash__() method for you. If eq is true and frozen is false, __hash__() will be set to None, marking it unhashable (which it is, since it is mutable). If eq is false, __hash__() will be left untouched meaning the __hash__() method of the superclass will be used (if the superclass is object, this means it will fall back to id-based hashing).
Since you set eq=True and left frozen at the default (False), your dataclass is unhashable.
You have 3 options:
- Set frozen=True (in addition to eq=True), which will make your class immutable and hashable.
- Set unsafe_hash=True, which will create a __hash__ method but leave your class mutable, thus risking problems if an instance of your class is modified while stored in a dict or set:
    cat = Category('foo', 'bar')
    categories = {cat}
    cat.id = 'baz'
    print(cat in categories)  # False
- Manually implement a __hash__ method.
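For the mutable case the question asks about, the second and third options could look roughly like this. It is a sketch, not the one true way: ManualCategory and the "unnamed" default are made up, and both variants assume you never change id while an instance sits in a set or dict.

```python
from dataclasses import dataclass, field


# Option 2: unsafe_hash=True. `name` is excluded via compare=False,
# so only `id` feeds __eq__, the ordering methods and __hash__.
@dataclass(eq=True, order=True, unsafe_hash=True)
class Category:
    id: str
    name: str = field(default="unnamed", compare=False)


# Option 3: a hand-written __hash__ on id. dataclass() leaves an
# explicitly defined __hash__ alone.
@dataclass(eq=True, order=True)
class ManualCategory:
    id: str
    name: str = field(default="unnamed", compare=False)

    def __hash__(self):
        return hash(self.id)


# Two entries with id "x" collapse into one because name is not compared.
cats = {Category(id="y"), Category(id="x"), Category(id="x", name="dup")}
sorted_ids = [c.id for c in sorted(cats)]

manual = ManualCategory(id="a")
bucket = {manual}
manual.name = "renamed"  # mutating a field that is not hashed is harmless
still_found = manual in bucket
```

sorted_ids holds each id once, in order, and still_found stays True after the rename. Changing id instead would reproduce the lookup failure shown in the list above.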
I’d like to add a special note on the use of unsafe_hash.
You can exclude fields from the generated hash by setting compare=False or hash=False on the field (hash defaults to None, in which case it follows the value of compare).
This might be useful if you store nodes in a graph but want to mark them visited without breaking their hashing (e.g. if they’re in a set of unvisited nodes).
from dataclasses import dataclass, field

@dataclass(unsafe_hash=True)
class node:
    x: int
    visit_count: int = field(default=10, compare=False)  # hash inherits the compare setting, so this is valid.
    # visit_count: int = field(default=False, hash=False)  # also valid. Arguably easier to read, but can break some compare code.
    # visit_count: int = False  # if mutated, hashing breaks. (3* printed)

s = set()
n = node(1)
s.add(n)
if n in s: print("1* n in s")
n.visit_count = 11
if n in s:
    print("2* n still in s")
else:
    print("3* n is lost to the void because hashing broke.")
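To make the "can break some compare code" remark concrete, here is a small sketch contrasting the two field settings. The class names NodeCompareFalse and NodeHashFalse are made up for the comparison:

```python
from dataclasses import dataclass, field


@dataclass(unsafe_hash=True)
class NodeCompareFalse:
    x: int
    visit_count: int = field(default=0, compare=False)


@dataclass(unsafe_hash=True)
class NodeHashFalse:
    x: int
    visit_count: int = field(default=0, hash=False)


# compare=False: visit_count is ignored by both __eq__ and __hash__.
a, b = NodeCompareFalse(1), NodeCompareFalse(1, visit_count=5)
equal_with_compare_false = (a == b)

# hash=False: visit_count is ignored by __hash__ but still used by __eq__.
c, d = NodeHashFalse(1), NodeHashFalse(1, visit_count=5)
same_hash = hash(c) == hash(d)
equal_with_hash_false = (c == d)

# Consequence: a set lookup with a different object that only differs
# in visit_count misses, even though the hashes agree.
found_in_set = d in {c}
```

With compare=False the two nodes are equal. With hash=False they share a hash but are not equal, so found_in_set is False. Looking up the very same object still works in both cases, which is why the original example prints 2* either way.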
This took me hours to figure out… Useful further reading I found is the Python documentation on dataclasses. Specifically, see the documentation for field() and for the dataclass() arguments.
https://docs.python.org/3/library/dataclasses.html