List unhashable, but tuple hashable?
Question:
In "How to hash lists?" I was told that I should convert to a tuple first, e.g. [1,2,3,4,5] to (1,2,3,4,5).
So the first cannot be hashed, but the second can. Why*?
*I am not really looking for a detailed technical explanation, but rather for an intuition
Answers:
Since a list is mutable, if you modify it you would modify its hash too, which ruins the point of having a hash (like in a set or a dict key).
Edit: I’m surprised this answer regularly gets new upvotes; it was written really quickly. I feel I need to make it better now.
So the set and the dict native data structures are implemented with a hashmap. Data types in Python may have a magic method __hash__() that will be used in hashmap construction and lookups.
Among the built-in types, only the immutable ones (int, str, tuple, …) have a usable __hash__(), and their hash value is based on the data and not the identity of the object.
You can check this with:
>>> a = (0,1)
>>> b = (0,1)
>>> a is b
False # Different objects
>>> hash(a) == hash(b)
True # Same hash
If we follow this logic, mutating the data would change the hash, but then what’s the point of a hash that changes? It defeats the whole purpose of sets, dicts, and other uses of hashes.
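To see the split among the built-in types concretely, here is a minimal sketch. The helper name `is_hashable` is made up for this illustration; everything else is plain built-in behaviour.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper for this demo)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

# Built-in mutable containers opt out of hashing by setting __hash__ to None.
print(list.__hash__ is None)       # True
print(is_hashable([1, 2, 3]))      # False
print(is_hashable((1, 2, 3)))      # True

# Converting the list to a tuple gives something usable as a dict key.
counts = {tuple([1, 2, 3]): "seen"}
print(counts[(1, 2, 3)])           # seen
```

The last two lines are the conversion the question asks about: the tuple has the same contents as the list, but it can never change afterwards.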
Fun fact: if you try the example with ints in the range -5 <= i <= 256, or with some strings, a is b returns True because of micro-optimizations (in CPython at least).
Because lists are mutable and tuples aren’t.
Mainly, because tuples are immutable. Assume the following works:
>>> l = [1, 2, 3]
>>> t = (1, 2, 3)
>>> x = {l: 'a list', t: 'a tuple'}
Now, what happens when you do l.append(4)? You’ve modified the key in your dictionary! From afar! If you’re familiar with how hashing algorithms work, this should frighten you. Tuples, on the other hand, are absolutely immutable. t += (1,) might look like it’s modifying the tuple, but really it’s not: it simply creates a new tuple, leaving your dictionary key unchanged.
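The claim about t += (1,) can be checked directly. A small sketch, where the extra name `original` is only there to keep a second reference to the first tuple object:

```python
t = (1, 2, 3)
d = {t: "a tuple"}
original = t            # second reference to the original tuple object

t += (1,)               # builds a new tuple and rebinds the name t

print(t)                # (1, 2, 3, 1)
print(original)         # (1, 2, 3)
print(t is original)    # False
print(d)                # {(1, 2, 3): 'a tuple'}
print(d[original])      # a tuple
```

The key stored in the dictionary is still the old, untouched tuple; only the name t now points somewhere else.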
You could totally make that work, but I bet you wouldn’t like the effects.
from functools import reduce
from operator import xor

class List(list):
    def __hash__(self):
        return reduce(xor, self)
Now let’s see what happens:
>>> l = List([23,42,99])
>>> hash(l)
94
>>> d = {l: "Hello"}
>>> d[l]
'Hello'
>>> l.append(7)
>>> d
{[23, 42, 99, 7]: 'Hello'}
>>> l
[23, 42, 99, 7]
>>> d[l]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: [23, 42, 99, 7]
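One way to see why the lookup fails, sketched for CPython: the entry was filed under the hash the key had at insertion time, while the lookup uses the hash the key has now. The variable names `hash_at_insert` and `hash_now` are just for this demo.

```python
from functools import reduce
from operator import xor

class List(list):
    def __hash__(self):
        # Same value-based hash as in the answer above: XOR of all elements.
        return reduce(xor, self)

l = List([23, 42, 99])
d = {l: "Hello"}
hash_at_insert = hash(l)

l.append(7)                      # mutate the key while it sits in the dict
hash_now = hash(l)

print(hash_at_insert, hash_now)  # 94 89
print(l in d)                    # False: the lookup uses the new hash
print(any(k is l for k in d))    # True: the object is still stored as a key
```

So the key is still in the dictionary, but it can no longer be found through the normal lookup path.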
edit: So I thought about this some more. You could make the above example work, if you return the list’s id as its hash value:
class List(list):
    def __hash__(self):
        return id(self)
In that case, d[l] will give you 'Hello', but neither d[[23,42,99,7]] nor d[List([23,42,99,7])] will (because you’re creating a new [Ll]ist).
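The trade-off of the id-based version can be sketched like this. The name `twin` is made up here; it stands for any second list with equal contents.

```python
class List(list):
    def __hash__(self):
        # Identity-based hash: stable under mutation, but ignores the contents.
        return id(self)

l = List([23, 42, 99])
d = {l: "Hello"}
l.append(7)

print(d[l])                      # Hello: same object, same hash

twin = List([23, 42, 99, 7])     # equal contents, different object
print(twin == l)                 # True
print(twin in d)                 # False: different id, so different hash
```

Mutation no longer loses the entry, but two equal lists now count as different keys, which is rarely what you want from a dictionary keyed on list contents.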
Not every tuple is hashable. For example, a tuple that contains a list as an element is not:
x = (1,[2,3])
print(type(x))  # <class 'tuple'>
print(hash(x))  # raises TypeError, because the inner list is unhashable
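If you need such a nested value as a key anyway, one possible approach is to convert the inner lists to tuples as well. The helper `freeze` below is hypothetical and only handles lists and tuples; it is a sketch, not a general solution.

```python
def freeze(obj):
    """Recursively convert lists (and tuples containing them) to tuples.

    Hypothetical helper for illustration; it only handles lists and tuples.
    """
    if isinstance(obj, (list, tuple)):
        return tuple(freeze(item) for item in obj)
    return obj

x = (1, [2, 3])
try:
    hash(x)
except TypeError as exc:
    print("not hashable:", type(exc).__name__)   # not hashable: TypeError

frozen = freeze(x)
print(frozen)                                    # (1, (2, 3))
print(hash(frozen) == hash((1, (2, 3))))         # True
```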
The other answers are good: the reason is mutability. If we could use a list (or any other mutable object) as a dict key, we could change the key by mutating it, accidentally or intentionally. That would change the key’s hash value, and we could no longer retrieve the value from the dictionary with that key.
Hash values and hash tables make it easy to look up large amounts of data: each key’s hash is mapped to an index, and the slot at that index stores the actual entry.
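The mapping from hash value to index can be sketched with a toy table. The class `TinyHashTable` is invented for this illustration and uses simple chaining; it is not how CPython lays out its dict.

```python
class TinyHashTable:
    """Toy hash table with separate chaining (illustration only)."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        # Map the (possibly huge) hash value to a bucket index.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)   # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        for existing, value in self.buckets[self._index(key)]:
            if existing == key:
                return value
        raise KeyError(key)

table = TinyHashTable()
table.put((1, 2, 3), "a tuple")
table.put("name", "a string")
print(table.get((1, 2, 3)))     # a tuple
print(table.get("name"))        # a string
```

Because `get` recomputes the index from the key's current hash, a key whose hash changed after `put` would be searched for in the wrong bucket, which is exactly the failure the answers above describe.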