# Set "in" operator: uses equality or identity?

## Question:

```python
class A(object):
    def __cmp__(self):
        print '__cmp__'
        return object.__cmp__(self)

    def __eq__(self, rhs):
        print '__eq__'
        return True

a1 = A()
a2 = A()
print a1 in set([a1])
print a1 in set([a2])
```

Why does the first line print True, but the second print False? And why is `__eq__` not called in either case?

I am using Python 2.6.

## Answer:

You need to define `__hash__` too. For example:

```python
class A(object):
    def __hash__(self):
        print '__hash__'
        return 42

    def __cmp__(self, other):
        print '__cmp__'
        return object.__cmp__(self, other)

    def __eq__(self, rhs):
        print '__eq__'
        return True

a1 = A()
a2 = A()
print a1 in set([a1])
print a1 in set([a2])
```

This will work as expected: both lines print True.

As a general rule, whenever you implement `__cmp__` (or `__eq__`), you should also implement `__hash__` so that for all `x` and `y` with `x == y`, `hash(x) == hash(y)`.
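A minimal sketch of that rule in Python 3 (where `__cmp__` no longer exists, so only `__eq__` and `__hash__` matter). `Point` is a hypothetical class; the idea is simply to hash exactly the fields that `__eq__` compares.

```python
class Point:
    """Hypothetical value class: two Points are equal when their coordinates are equal."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash exactly the fields __eq__ compares, so x == y implies hash(x) == hash(y).
        return hash((self.x, self.y))


p1 = Point(1, 2)
p2 = Point(1, 2)   # a distinct object that compares equal to p1
print(p1 is p2)    # False
print(p1 == p2)    # True
print(p1 in {p2})  # True: equal hashes lead the set to call __eq__
```

Hashing a tuple of the compared fields is a common idiom because it reuses the tuple's own hash-combining logic instead of inventing one.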

## Answer:

Sets seem to use hash codes, then identity, before comparing for equality. The following code:

```python
class A(object):
    def __eq__(self, rhs):
        print '__eq__'
        return True
    def __hash__(self):
        print '__hash__'
        return 1

a1 = A()
a2 = A()

print 'set1'
set1 = set([a1])

print 'set2'
set2 = set([a2])

print 'a1 in set1'
print a1 in set1

print 'a1 in set2'
print a1 in set2
```

outputs:

```
set1
__hash__
set2
__hash__
a1 in set1
__hash__
True
a1 in set2
__hash__
__eq__
True
```

What happens seems to be:

1. The hash code is computed when an element is inserted into the set (so that it can be compared with the existing elements).
2. The hash code of the object you’re checking with the `in` operator is computed.
3. Elements of the set with the same hash code are then inspected: first by checking whether each is the same object as the one you’re looking for, and only if it is not, by checking whether it is logically equal (`__eq__`).
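To make the three steps concrete, here is a minimal Python 3 sketch of a toy hash set whose `__contains__` follows the same order: hash first, then identity, then `__eq__`. `ToyHashSet` and `Probe` are hypothetical names, and the bucket layout (separate chaining with a fixed bucket count) is a simplification, not how CPython actually stores set entries.

```python
class ToyHashSet:
    """Hypothetical, simplified hash set that mirrors only the lookup order.

    It uses separate chaining with a fixed number of buckets; CPython's real
    set uses open addressing and resizing, which are left out here.
    """

    def __init__(self, items=(), nbuckets=8):
        self._buckets = [[] for _ in range(nbuckets)]
        for item in items:
            self.add(item)

    def _bucket(self, h):
        return self._buckets[h % len(self._buckets)]

    def _find(self, bucket, item, h):
        for stored_hash, stored in bucket:
            if stored_hash != h:
                continue  # different hash: cannot match, __eq__ is never called
            if stored is item or stored == item:  # identity first, then equality
                return True
        return False

    def add(self, item):
        h = hash(item)  # step 1: the hash is computed on insertion
        bucket = self._bucket(h)
        if not self._find(bucket, item, h):
            bucket.append((h, item))  # keep the hash next to the element

    def __contains__(self, item):
        h = hash(item)  # step 2: hash the object being looked up
        return self._find(self._bucket(h), item, h)  # step 3


class Probe:
    """Hypothetical class mirroring the answer's A: constant hash, always-equal __eq__."""

    eq_calls = 0

    def __eq__(self, other):
        Probe.eq_calls += 1
        return True

    def __hash__(self):
        return 1


a1, a2 = Probe(), Probe()
print(a1 in ToyHashSet([a1]), Probe.eq_calls)  # True 0: identity match
print(a1 in ToyHashSet([a2]), Probe.eq_calls)  # True 1: same hash, so __eq__ ran
```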

## Answer:

Sets and dictionaries gain their speed by using hashing as a fast approximation of full equality checking. If you want to redefine equality, you usually need to redefine the hash algorithm so that it is consistent.
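A small Python 3 sketch of what goes wrong when the two disagree. `Inconsistent` is a hypothetical class that compares by value but explicitly keeps the identity-based `object.__hash__`, which roughly mimics the Python 2 situation where overriding `__eq__` silently kept the default hash.

```python
class Inconsistent:
    """Hypothetical class: value-based __eq__, but identity-based __hash__."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Inconsistent) and self.value == other.value

    # Explicitly keep the default identity hash, so equal objects hash differently.
    __hash__ = object.__hash__


i1, i2 = Inconsistent(1), Inconsistent(1)
print(i1 == i2)      # True
print(i1 in {i2})    # False: the hashes differ, so the set never calls __eq__
print(i1 in [i2])    # True: a list scan relies on == alone
```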

The default hash function uses the identity of the object. That is useless as a fast approximation of full equality, but it at least lets you use an arbitrary class instance as a dictionary key and retrieve the stored value, provided you pass exactly the same object as the key. It also means that if you redefine equality without redefining the hash function, your objects will go into a dictionary or set without any complaint about being unhashable, yet lookups still won’t work the way you expect.
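A short Python 3 illustration of the identity-based default, using the hypothetical classes `IdentityOnly` and `EqOnly`. Note one version difference: the silent behavior described above is Python 2's. In Python 3, a class that defines `__eq__` without `__hash__` gets `__hash__ = None`, so its instances are rejected as unhashable instead.

```python
class IdentityOnly:
    """Hypothetical class that keeps the default (identity-based) __eq__ and __hash__."""


class EqOnly:
    """Hypothetical class that overrides __eq__ but not __hash__."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, EqOnly) and self.value == other.value


k1, k2 = IdentityOnly(), IdentityOnly()
d = {k1: "stored"}
print(d.get(k1))  # stored: the very same object finds its value
print(d.get(k2))  # None: another instance has a different identity

# Python 3 sets __hash__ to None when a class defines __eq__ without __hash__.
print(EqOnly.__hash__)  # None
try:
    {EqOnly(1)}
except TypeError as exc:
    print("TypeError:", exc)
```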

See the official Python docs on `__hash__` for more details.

## Answer:

This is a tangential answer, but your question and my testing made me curious. If you set aside the set, which is the source of your `__hash__` problem, your question is still an interesting one.

Thanks to the help I got on this SO question, I was able to chase the `in` operator through the source code to its root. Near the bottom I found the `PyObject_RichCompareBool` function, which does indeed test for identity (see the “Quick result” comment) before testing for equality.

So, unless I misunderstand how things work, the technical answer to your question is: identity first, then equality, because the identity check is built into the equality test itself. To reiterate, that is not the cause of the behavior you were seeing; it is just the technical answer to your question.

If I misunderstood the source, somebody please set me straight.

```c
int
PyObject_RichCompareBool(PyObject *v, PyObject *w, int op)
{
    PyObject *res;
    int ok;

    /* Quick result when objects are the same.
       Guarantees that identity implies equality. */
    if (v == w) {
        if (op == Py_EQ)
            return 1;
        else if (op == Py_NE)
            return 0;
    }

    res = PyObject_RichCompare(v, w, op);
    if (res == NULL)
        return -1;
    if (PyBool_Check(res))
        ok = (res == Py_True);
    else
        ok = PyObject_IsTrue(res);
    Py_DECREF(res);
    return ok;
}
```
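The “Quick result” shortcut can be observed from Python itself. This minimal Python 3 sketch uses a list, so hashing plays no part; NaN is the classic case because it is not equal to itself, and `NeverEqual` is a hypothetical class whose `__eq__` always returns False and counts its calls.

```python
nan = float("nan")
print(nan == nan)              # False: NaN never compares equal, even to itself
print(nan in [nan])            # True: identity shortcut, == is never evaluated
print(nan in [float("nan")])   # False: a different object, so == decides


class NeverEqual:
    """Hypothetical class whose __eq__ always returns False and counts calls."""

    calls = 0

    def __eq__(self, other):
        NeverEqual.calls += 1
        return False


x = NeverEqual()
print(x in [x])          # True: identity shortcut, __eq__ not called
print(NeverEqual.calls)  # 0
print(x == x)            # False: the == operator itself has no identity shortcut
print(NeverEqual.calls)  # 1
```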

## Answer:

`set.__contains__` makes its checks in the following order:

```python
'Match' if hash(a) == hash(b) and (a is b or a == b) else 'No Match'
```

The relevant C source code is in `Objects/setobject.c::set_lookkey()` and in `Objects/object.c::PyObject_RichCompareBool()`.
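A quick Python 3 check of that rule: `set_match` (a hypothetical helper name) implements the expression literally, and the loop compares it with what `in` actually reports for a few values, including NaN, where identity and equality disagree. It models the observable behavior only, not the probing inside `set_lookkey`.

```python
def set_match(a, b):
    """Sketch of the rule above: does probe `a` match stored element `b`?"""
    return hash(a) == hash(b) and (a is b or a == b)


nan = float("nan")
cases = [
    (1, 1.0),               # equal values of different types with equal hashes
    (1, 2),                 # different values
    ("x", "x"),             # equal strings
    (nan, nan),             # same object: identity wins although nan != nan
    (nan, float("nan")),    # distinct NaN objects: never equal
    ((1, 2), (1, 2)),       # equal tuples
]

for a, b in cases:
    print(repr(a), repr(b), set_match(a, b), a in {b})
```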
