Right way to initialize an OrderedDict using its constructor such that it retains order of initial data?

Question:

What’s the correct way to initialize an ordered dictionary (OD) so that it retains the order of initial data?

from collections import OrderedDict

# Obviously wrong because regular dict loses order
d = OrderedDict({'b':2, 'a':1}) 

# An OD is represented by a list of tuples, so would this work?
d = OrderedDict([('b',2), ('a', 1)])

# What about using a list comprehension, will 'd' preserve the order of 'l'
l = ['b', 'a', 'c', 'aa']
d = OrderedDict([(i,i) for i in l])

Question:

  • Will an OrderedDict preserve the order of a list of tuples, or tuple of tuples or tuple of lists or list of lists etc. passed at the time of initialization (2nd & 3rd example above)?

  • How does one go about verifying if OrderedDict actually maintains an order? Since a dict has an unpredictable order, what if my test vectors luckily have the same initial order as the unpredictable order of a dict? For example, if instead of d = OrderedDict({'b':2, 'a':1}) I write d = OrderedDict({'a':1, 'b':2}), I can wrongly conclude that the order is preserved. In this case, I found out that a dict is ordered alphabetically, but that may not be always true. What’s a reliable way to use a counterexample to verify whether a data structure preserves order or not, short of trying test vectors repeatedly until one breaks?

P.S. I’ll just leave this here for reference: “The OrderedDict constructor and update() method both accept keyword arguments, but their order is lost because Python’s function call semantics pass-in keyword arguments using a regular unordered dictionary”

P.P.S : Hopefully, in future, OrderedDict will preserve the order of kwargs also (example 1): http://bugs.python.org/issue16991

Asked By: click

||

Answers:

# An OD is represented by a list of tuples, so would this work?
d = OrderedDict([('b', 2), ('a', 1)])

Yes, that will work. By definition, a list is always ordered the way it is represented. This goes for list-comprehension too, the list generated is in the same way the data was provided (i.e. source from a list it will be deterministic, sourced from a set or dict not so much).

How does one go about verifying if OrderedDict actually maintains an order. Since a dict has an unpredictable order, what if my test vectors luckily has the same initial order as the unpredictable order of a dict?. For example, if instead of d = OrderedDict({'b':2, 'a':1}) I write d = OrderedDict({'a':1, 'b':2}), I can wrongly conclude that the order is preserved. In this case, I found out that a dict is order alphabetically, but that may not be always true. i.e. what’s a reliable way to use a counter example to verify if a data structure preserves order or not short of trying test vectors repeatedly until one breaks.

You keep your source list of 2-tuple around for reference, and use that as your test data for your test cases when you do unit tests. Iterate through them and ensure the order is maintained.

Answered By: metatoaster

The OrderedDict will preserve any order that it has access to. The only way to pass ordered data to it to initialize is to pass a list (or, more generally, an iterable) of key-value pairs, as in your last two examples. As the documentation you linked to says, the OrderedDict does not have access to any order when you pass in keyword arguments or a dict argument, since any order there is removed before the OrderedDict constructor sees it.

Note that using a list comprehension in your last example doesn’t change anything. There’s no difference between OrderedDict([(i,i) for i in l]) and OrderedDict([('b', 'b'), ('a', 'a'), ('c', 'c'), ('aa', 'aa')]). The list comprehension is evaluated and creates the list and it is passed in; OrderedDict knows nothing about how it was created.

Answered By: BrenBarn

It is also possible (and a little more efficient) to use a generator expression:

d = OrderedDict((i, i) for i in l)

Obviously, the benefit is negligible in this trivial case for l, but if l corresponds to an iterator or was yielding results from a generator, e.g. used to parse and iterate through a large file, then the difference could be very substantial (e.g. avoiding to load the entire contents onto memory). For example:

def mygen(filepath):
    with open(filepath, 'r') as f:
        for line in f:
            yield [int(field) for field line.split()]

d = OrderedDict((i, sum(numbers)) for i, numbers in enumerate(mygen(filepath)))
Answered By: Ataxias