Is python's "set" stable?

Question:

The question arose while answering another SO question (there).

When I iterate several times over a Python set (without changing it between calls), can I assume it will always return elements in the same order? And if not, what is the rationale for changing the order? Is it deterministic, or random? Or implementation-defined?

And when I call the same python program repeatedly (not random, not input dependent), will I get the same ordering for sets?

The underlying question is whether Python set iteration order depends only on the algorithm used to implement sets, or also on the execution context.

Asked By: kriss

Answers:

The definition of a set is unordered, unique elements (“Unordered collections of unique elements”). You should care only about the interface, not the implementation. If you want an ordered enumeration, you should probably put it into a list and sort it.

There are many different implementations of Python. Don’t rely on undocumented behaviour, as your code could break on different Python implementations.
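
A minimal sketch of the suggestion above, assuming the elements are mutually comparable (the variable names are only illustrative). Sorting produces a list whose order depends on the values alone, not on how the set happens to be laid out internally:

```python
# Illustrative data; any set of mutually comparable elements works.
colours = {"red", "green", "blue", "green"}

# sorted() accepts any iterable and always returns a new list.
ordered = sorted(colours)
print(ordered)  # ['blue', 'green', 'red']
```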

Answered By: Joe

There’s no formal guarantee about the stability of sets. However, in the CPython implementation, as long as nothing changes the set, the items will be produced in the same order. Sets are implemented as open-addressing hashtables (with a prime probe), so inserting or removing items can completely change the order (in particular, when that triggers a resize, which reorganizes how the items are laid out in memory). You can also have two identical sets that nonetheless produce the items in a different order, for example:

>>> s1 = {-1, -2}
>>> s2 = {-2, -1}
>>> s1 == s2
True
>>> list(s1), list(s2)
([-1, -2], [-2, -1])

Unless you’re very certain you have the same set and nothing touched it in between the two iterations, it’s best not to rely on it staying the same. Making seemingly irrelevant changes to, say, functions you call in between could produce very hard-to-find bugs.
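
A small sketch of both points, assuming CPython. The numbers are arbitrary, and only the content check is guaranteed by the language; the order comparisons are implementation behaviour:

```python
s = set(range(0, 40, 3))

# Two passes over an untouched set: the same order in CPython.
first_pass = list(s)
second_pass = list(s)
unchanged_same_order = first_pass == second_pass

# Grow the set far enough to force a resize, then remove the extras again.
before = list(s)
extra = range(1000, 2000)
s.update(extra)
s.difference_update(extra)
after = list(s)

same_contents = set(before) == set(after)  # always True
same_order = before == after               # not guaranteed either way

print(unchanged_same_order, same_contents, same_order)
```

The set holds exactly the same elements before and after, yet nothing promises that same_order is True, which is why code that depends on it can break after seemingly unrelated changes.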

Answered By: Thomas Wouters

It’s definitely implementation-defined. The specification of a set says only that:

Being an unordered collection, sets do not record element position or order of insertion.

Why not use OrderedDict to create your own OrderedSet class?
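
A rough sketch of what such a class could look like. The name OrderedSet and its methods are hypothetical, not a standard library type; it stores the elements as OrderedDict keys and gets the rest of the set interface from collections.abc.MutableSet:

```python
from collections import OrderedDict
from collections.abc import MutableSet


class OrderedSet(MutableSet):
    """Hypothetical minimal set that iterates in insertion order."""

    def __init__(self, iterable=()):
        self._items = OrderedDict()
        for item in iterable:
            self.add(item)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, item):
        # Re-adding an existing key keeps its original position.
        self._items[item] = None

    def discard(self, item):
        self._items.pop(item, None)

    def __repr__(self):
        return "OrderedSet(%r)" % list(self._items)


s = OrderedSet([3, 1, 3, 2])
print(list(s))  # [3, 1, 2]
```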

Answered By: Josh Lee

As pointed out, this is strictly an implementation detail.

But as long as you don’t change the structure between calls, there should be no reason for a read-only operation (= iteration) to change with time: no sane implementation does that. Even randomized (= non-deterministic) data structures that can be used to implement sets (e.g. skip lists) don’t change the reading order when no changes occur.

So, being rational, you can safely rely on this behaviour.

(I’m aware that certain GCs may reorder memory in a background thread but even this reordering will not be noticeable on the level of data structures, unless a bug occurs.)

Answered By: Konrad Rudolph

And when I call the same python program repeatedly (not random, not input dependent), will I get the same ordering for sets?

I can answer this part of the question now after a quick experiment. Using the following code:

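class Foo(object):
    """Simple int wrapper with no __eq__ or __hash__ defined."""

    def __init__(self, val):
        self.val = val

    def __repr__(self):
        return str(self.val)


x = set()
for y in range(500):
    x.add(Foo(y))
# Show the last ten elements in iteration order.
print(list(x)[-10:])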

I can trigger the behaviour that I was asking about in the other question. If I run this repeatedly, the output changes, but not on every run. It seems to be “weakly random” in that it changes slowly. This is certainly implementation-dependent, so I should say that I’m running the MacPorts Python 2.6 on Snow Leopard. While the program will output the same answer for long stretches of time, doing something that affects the system entropy pool (writing to the disk mostly works) will sometimes kick it into a different output.

The class Foo is just a simple int wrapper as experiments show that this doesn’t happen with sets of ints. I think that the problem is caused by the lack of __eq__ and __hash__ members for the object, although I would dearly love to know the underlying explanation / ways to avoid it. Also useful would be some way to reproduce / repeat a “bad” run. Does anyone know what seed it uses, or how I could set that seed?
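
One likely explanation, offered as an assumption rather than a confirmed diagnosis of the runs above: a class that defines neither __eq__ nor __hash__ inherits the defaults from object, and in CPython the default hash is derived from the object's identity (its memory address), which can differ from one run to the next. A sketch of one way to avoid this is to define both methods in terms of val, so the hash no longer depends on where the object lives:

```python
class Foo(object):
    """Int wrapper whose equality and hash depend only on val."""

    def __init__(self, val):
        self.val = val

    def __repr__(self):
        return str(self.val)

    def __eq__(self, other):
        return isinstance(other, Foo) and self.val == other.val

    def __hash__(self):
        # Delegate to the wrapped int instead of the object's identity.
        return hash(self.val)


x = set(Foo(y) for y in range(500))
last_ten = list(x)[-10:]
print(last_ten)
```

With this version the set is built from int hashes inserted in a fixed order, so in CPython the printed slice should no longer vary between runs. The language itself still makes no promise about the order.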

Answered By: Andrew

A set or frozenset is inherently an unordered collection. Internally, sets are based on a hash table, and the order of keys depends both on the insertion order and on the hash algorithm. In CPython (aka standard Python), integers smaller than the machine word size (32-bit or 64-bit) hash to themselves, but text strings, bytes strings, and datetime objects hash to integers that vary randomly between runs; you can control that by setting the PYTHONHASHSEED environment variable.

From the __hash__ docs:

Note

By default, the __hash__() values of str, bytes and datetime
objects are “salted” with an unpredictable random value. Although they
remain constant within an individual Python process, they are not
predictable between repeated invocations of Python.

This is intended to provide protection against a denial-of-service
caused by carefully-chosen inputs that exploit the worst case
performance of a dict insertion, O(n^2) complexity. See
http://www.ocert.org/advisories/ocert-2011-003.html for details.

Changing hash values affects the iteration order of dicts, sets and
other mappings. Python has never made guarantees about this ordering
(and it typically varies between 32-bit and 64-bit builds).

See also PYTHONHASHSEED.

The results of hashing objects of other classes depend on the details of the class’s __hash__ method.
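
To see the effect of PYTHONHASHSEED without restarting the interpreter by hand, here is a sketch that launches child interpreters with a chosen seed. The helper name set_order is made up for this example; it relies only on subprocess.check_output and sys.executable:

```python
import os
import subprocess
import sys


def set_order(seed):
    """Return the iteration order of set('abcdefgh') in a child interpreter."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    code = "print(''.join(set('abcdefgh')))"
    out = subprocess.check_output([sys.executable, "-c", code], env=env)
    return out.decode().strip()


run_a = set_order(0)  # seed 0 disables hash randomization
run_b = set_order(0)  # same seed: same string hashes, same order
run_c = set_order(1)  # different seed: the order may or may not differ

print(run_a, run_b, run_c)
```

Two runs with the same seed should print the same order, because the string hashes are then identical. A different seed is free to produce another order.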

The upshot of all this is that you can have two sets containing identical strings but when you convert them to lists they can compare unequal. Or they may not. 😉 Here’s some code that demonstrates this. On some runs, it will just loop, not printing anything, but on other runs it will quickly find a set that uses a different order to the original.

from random import seed, shuffle

seed(42)

data = list('abcdefgh')
a = frozenset(data)
la = list(a)
print(''.join(la), a)

while True:
    shuffle(data)
    lb = list(frozenset(data))
    if lb != la:
        print(''.join(data), ''.join(lb))
        break

Typical output:

dachbgef frozenset({'d', 'a', 'c', 'h', 'b', 'g', 'e', 'f'})
deghcfab dahcbgef

Answered By: PM 2Ring

The answer is simply NO.

Python set iteration order is NOT stable.

I did a simple experiment to show this.

The code:

import random
random.seed(1)

x=[]
class aaa(object):
    def __init__(self,a,b):
        self.a=a
        self.b=b

for i in range(5):
    x.append(aaa(random.choice('asf'),random.randint(1,4000)))

for j in x:
    print(j.a,j.b)

print('====')
for j in set(x):
    print(j.a,j.b)

Run this twice and you will get the following:

First time result:

a 2332
a 1045
a 2030
s 1935
f 1555
====
a 2030
a 2332
f 1555
a 1045
s 1935

Process finished with exit code 0

Second time result:

a 2332
a 1045
a 2030
s 1935
f 1555
====
s 1935
a 2332
a 1045
f 1555
a 2030

Process finished with exit code 0

The reason is explained in comments in this answer.

However, there are some ways to make it stable:
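
Two possible approaches, offered as a sketch rather than as the definitive fix: sort the set with an explicit key, or skip the set entirely and deduplicate with dict.fromkeys, which keeps first-seen order (this assumes Python 3.7 or later, where dicts preserve insertion order).

```python
import random

random.seed(1)


class aaa(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b


x = [aaa(random.choice('asf'), random.randint(1, 4000)) for i in range(5)]

# Option 1: impose an order that depends only on the attribute values.
by_key = sorted(set(x), key=lambda obj: (obj.a, obj.b))

# Option 2: deduplicate while keeping the order of first appearance.
in_insertion_order = list(dict.fromkeys(x))

for j in by_key:
    print(j.a, j.b)
print('====')
for j in in_insertion_order:
    print(j.a, j.b)
```

Neither option depends on the objects' hash values, so both print the same sequence on every run with the same random seed.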

Answered By: Statham