Explaining appearent idiosyncrasy in Python when using tuples as elements in a set

Question:

If we execute the two following lines in Python:

a = (1,)
{a}

it won’t cause any problems. Instead, if we execute the two following lines:

a = ([1],)
{a}

now what we get is "TypeError: unhashable type: ‘list’".

In both cases, we’re building a set consisting of elements whose type is a tuple, which is immutable (in the first case (1,) and in the second case ([1],)). In the second case, however, our tuple contains an object of mutable type, i.e., the list [1].

It seems that the condition that the elements of a set must be immutable is not enough to guarantee that sets are successfully created. What is the exact condition that won’t lead to any error when building a set? What is happening at low level?

Asked By: Bob

||

Answers:

As the error message indicates, the actual criterion for whether something can be in a set is whether it is hashable, as stated in the set docs:

Set elements, like dictionary keys, must be hashable.

However, only immutable object should be hashable, as Python’s docs on __hash__() explain:

If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).

In order for the hash to be representative of the entire contained data, immutable containers like tuples implement hashing by hashing each contained element in turn, with all the elements’ hashes then going into the computation of the container’s hash. Of course this is continued recursively if any elements are themselves immutable containers.

In your case, hashing the tuple will thus lead to an attempted hashing of the contained list, which fails as lists are not hashable:

>>> hash([1, 2])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

As for why set elements must be hashable: It’s because that will allow membership checking to proceed in O(1) time (think hash map lookup). You could make your own set type without this requirement that would just do membership checking by going through the list of elements one by one, of course – that would effectively just be a list with an addition operation that refuses to insert duplicates.

Answered By: smheidrich