How can I make a dataclass hash the same as a string?

Question:

I want to replace the string keys in my dictionaries with a dataclass so that I can attach metadata to the keys for debugging. However, I still want to be able to look up entries with a plain string. I tried implementing a dataclass with an overridden __hash__ method, but my code does not work as expected:

from dataclasses import dataclass

@dataclass(eq=True, frozen=True)
class Key:
    name: str

    def __hash__(self):
        return hash(self.name)

k = "foo"
foo = Key(name=k)

d = {}
d[foo] = 1

print(d[k])  # KeyError

The two hashes are the same:

print(hash(k) == hash(foo))  # True

So I don’t understand why this doesn’t work.

Asked By: Tom McLean

Answers:

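Two objects having different hashes guarantees that they’re different, but two objects having the same hash doesn’t in itself guarantee that they’re the same (because hash collisions exist). A dict lookup therefore uses the hash only to find candidate entries and then confirms the match with ==, and the __eq__ generated by @dataclass only treats instances of the same class as equal. If you want the Key to be considered equal to a corresponding str, implement that in __eq__: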

    def __eq__(self, other):
        if isinstance(other, Key):
            return self.name == other.name
        if isinstance(other, str):
            return self.name == other
        return False

This fixes the KeyError you’re encountering.
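
To see why the original lookup failed and why the override helps, here is a minimal sketch that puts the two versions side by side. PlainKey is a hypothetical name for the question's original class, kept only for comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlainKey:
    """The question's class (hypothetical name): str-compatible hash, generated __eq__."""
    name: str

    def __hash__(self):
        return hash(self.name)


@dataclass(frozen=True)
class Key:
    """Same hash, plus an __eq__ that also accepts a plain str."""
    name: str

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        if isinstance(other, Key):
            return self.name == other.name
        if isinstance(other, str):
            return self.name == other
        return False


# Equal hashes are not enough: the generated __eq__ only matches PlainKey instances.
plain = PlainKey("foo")
same_hash = hash(plain) == hash("foo")   # hashes agree
plain_equal = plain == "foo"             # but equality does not
plain_found = "foo" in {plain: 1}        # so the dict lookup misses

# With the custom __eq__, hash and equality agree, so the str lookup succeeds.
d = {Key("foo"): 1}
value = d["foo"]
```

Both conditions have to hold for a dict hit: the hash selects the candidate entry, and == confirms it.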

Answered By: Samwise

Adding my notes here from the comments on the answer above, since comments are easy to overlook and may get cleaned up at some point.

  • PyCharm also produces a helpful warning:

    ‘eq’ is ignored if the class already defines ‘__eq__’ method.

    I take this to mean that eq=True can be removed from the @dataclass(...) decorator as well.

  • Technically, you could also remove the last if isinstance(..., str): check as well as the last return statement. I’m not entirely sure what the implications of that would be, however.
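
One reading of the second note is to drop the isinstance(other, str) check and the final return False, and fall through to a direct comparison. The sketch below assumes that reading; ShortKey and AlwaysEqual are hypothetical names used only for illustration. For ordinary values the behavior is unchanged, since comparing a str with a non-string is simply False, but any object whose own __eq__ accepts strings would now compare equal too.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShortKey:
    name: str

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        if isinstance(other, ShortKey):
            return self.name == other.name
        # No str check: defer to whatever `str == other` decides.
        return self.name == other


class AlwaysEqual:
    """Hypothetical object whose __eq__ accepts anything."""

    def __eq__(self, other):
        return True


key = ShortKey("foo")
results = {
    "str": key == "foo",             # matching string
    "int": key == 42,                # unrelated type
    "custom": key == AlwaysEqual(),  # permissive __eq__ on the other side
}
```

Here results["str"] and results["custom"] are True while results["int"] is False, so the shortcut is only as strict as the other operand's own comparison.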

Here, then, is a slightly faster approach (timings with the timeit module below):

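from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    name: str

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        # Compare against other.name if present, otherwise against other itself
        return self.name == getattr(other, 'name', other)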
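
A caveat worth checking before adopting the getattr version: it compares against the name attribute of any object, not just Key. The sketch below assumes the class is decorated with @dataclass(frozen=True), as in the timing script; Tag is a hypothetical unrelated class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    name: str

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        # Use other.name if it exists, otherwise compare against other itself.
        return self.name == getattr(other, 'name', other)


@dataclass(frozen=True)
class Tag:
    """Hypothetical unrelated class that also happens to have a `name` field."""
    name: str


d = {Key("foo"): 1}
found_by_str = d["foo"]                  # str lookup works
equal_to_tag = Key("foo") == Tag("foo")  # duck-typed match on `name`
equal_to_int = Key("foo") == 42          # falls back to "foo" == 42
```

In this sketch equal_to_tag is True, which may or may not be desirable; the isinstance version would return False for the same comparison.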

Timings with timeit

from dataclasses import dataclass
from timeit import timeit


@dataclass(frozen=True)
class Key:
    name: str

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        if isinstance(other, Key):
            return self.name == other.name
        if isinstance(other, str):
            return self.name == other
        return False


class KeyTwo(Key):
    # Overriding __eq__ resets __hash__ to None, so restore the parent's hash
    __hash__ = Key.__hash__

    def __eq__(self, other):
        return self.name == getattr(other, 'name', other)


k = "foo"
foo = Key(name=k)
foo_two = KeyTwo(name=k)

print('__eq__() Timings --')
print('isinstance():  ',  timeit("foo == k", globals=globals()))
print('getattr():     ', timeit("foo_two == k", globals=globals()))

assert foo == foo_two == k

Results on my M1 Mac:

__eq__() Timings --
isinstance():   0.10553250007797033
getattr():      0.08371329202782363

Answered By: rv.kvetch