Can I count on order being preserved in a Python tuple?

Question:

I’ve got a list of datetimes from which I want to construct time segments. In other words, turn [t0, t1, ... tn] into [(t0,t1),(t1,t2),...,(tn-1, tn)]. I’ve done it this way:

# start by sorting list of datetimes
mdtimes.sort()
# construct tuples which represent possible start and end dates

# left edges
dtg0 = [x for x in mdtimes]
dtg0.pop()

# right edges
dtg1 = [x for x in mdtimes]
dtg1.reverse()
dtg1.pop()
dtg1.sort()

dtsegs = zip(dtg0,dtg1)

Questions…

  1. Can I count on tn-1 < tn for any (tn-1,tn) after I’ve created them this way? (Is ordering preserved?)
  2. Is it good practice to copy the original mdtimes list with list comprehensions? If not how should it be done?
  3. The purpose for constructing these tuples is to iterate over them and segment a data set with tn-1 and tn. Is this a reasonable approach? i.e.

    datasegment = [x for x in bigdata if ( (x['datetime'] > tleft) and (x['datetime'] < tright))] 
    

Thanks

Asked By: Pete


Answers:

Instead of dtg0 = [x for x in mdtimes], dtg0 = mdtimes[:] would do, since you are just copying one list into another. Note: starting with Python 3.3, you can also write newlist = oldlist.copy().

As for order, zip’s order is well defined, and both lists and tuples are ordered collections, so you should have no problem here.
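
A small sketch of both points, using made-up timestamps (the sample values and the names by_slice and by_copy are illustrative, not from the question): slicing and list.copy() each give an independent shallow copy, and zip pairs elements strictly by position.

```python
from datetime import datetime

# Hypothetical sample data standing in for the asker's mdtimes list.
mdtimes = [datetime(2011, 1, d) for d in (3, 1, 2)]
mdtimes.sort()

# Two equivalent ways to make a shallow copy of a list.
by_slice = mdtimes[:]
by_copy = mdtimes.copy()  # available from Python 3.3

# zip pairs items by position, so each tuple keeps (left, right) order.
dtsegs = list(zip(mdtimes[:-1], mdtimes[1:]))

for tleft, tright in dtsegs:
    print(tleft.date(), "->", tright.date())
```

Because the input is sorted before zipping, the left element of every pair is no later than the right one.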

Answered By: Eli Bendersky

  1. Tuple order is the order in which you insert values into the tuple. They’re not going to be sorted, as I think you’re asking. zip will, again, retain the order in which you inserted the values.

  2. It’s an acceptable method, but I have two alternative suggestions: use the copy module, or use dtg1 = mdtimes[:].

  3. Sounds reasonable.

Answered By: moinudin

You can achieve the same with zip:

>>> l = ["t0", "t1", "t2", "t3", "t4", "t5", "t6"]
>>> list(zip(l, l[1:]))
[('t0', 't1'), ('t1', 't2'), ('t2', 't3'), ('t3', 't4'), ('t4', 't5'), ('t5', 't6')]

Answered By: Paweł Nadolski

Both list and tuple are ordered.

import itertools

dtg0, dtg1 = itertools.tee(mdtimes)
next(dtg0)
dtsegs = zip(dtg0, dtg1)

Answered By: Ignacio Vazquez-Abrams

I’m no expert, but aren’t you quadrupling your memory requirements by copying the list and then making a new list of pairs taken from two lists? Why not just do the following:

dtsegs = [(mdtimes[i], mdtimes[i+1]) for i in range(len(mdtimes)-1)]

Dunno how “Pythonic” that is, though.

EDIT: Actually, looking at what you need to do with this list of tuples, you could just do this [i] and [i+1] stuff directly at that level and not even create this new structure at all. I don’t know how many dates you’re dealing with, though – if it’s some small number I suppose it doesn’t really matter.
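
For illustration, here is a rough sketch of doing the [i] and [i+1] indexing directly in the segmenting loop, without building the list of tuples first. Integers stand in for datetimes, and the sample values and the segment_sizes name are assumptions made up for this example.

```python
# Hypothetical stand-ins for the question's sorted mdtimes and bigdata;
# integers play the role of datetimes.
mdtimes = [0, 10, 20, 30]
bigdata = [{"datetime": t} for t in (1, 5, 12, 25, 29)]

segment_sizes = []
for i in range(len(mdtimes) - 1):
    # Take the left and right edges straight from the sorted list.
    tleft, tright = mdtimes[i], mdtimes[i + 1]
    datasegment = [x for x in bigdata if tleft < x["datetime"] < tright]
    segment_sizes.append(len(datasegment))

print(segment_sizes)
```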

For what it’s worth, a couple of the other answerers here seem to be misunderstanding your question, though I can’t comment on their posts since I don’t have enough reputation yet 🙂 Ignacio Vazquez-Abrams’s solution seems the best to me, though his “next(dtg0)” should probably be “next(dtg1)” (?)

Answered By: flamingspinach

Turning (x1, x2, x3, …) into [(x1, x2), (x2, x3), …] is called a pairwise combination, and it’s so common a pattern that the itertools documentation provides a recipe:

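from itertools import tee

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one element
    return zip(a, b)  # use itertools.izip on Python 2

for ta, tb in pairwise(mdtimes):
    ...

Answered By: tokland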
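
As a side note to the recipe above: from Python 3.10 the standard library ships it as itertools.pairwise. The sketch below prefers the built-in and falls back to the recipe on older versions; the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta

try:
    from itertools import pairwise  # standard library, Python 3.10+
except ImportError:
    from itertools import tee

    def pairwise(iterable):
        # Fallback: the same tee/zip recipe, for older Python 3 versions.
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

# Hypothetical sample timestamps, one hour apart.
start = datetime(2011, 1, 1)
mdtimes = sorted(start + timedelta(hours=h) for h in range(4))

# Overlapping (left, right) segments: (t0, t1), (t1, t2), (t2, t3).
dtsegs = list(pairwise(mdtimes))
```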

This is an answer to the question “Is this a reasonable approach?” (which appears to have been ignored by all).

Summary: You may want/need to lift your gaze from making a pairwise thingy out of mdtimes to the encompassing problem (segmenting bigdata).

Detail:

The desired use of the result is expressed as:

datasegment = [x for x in bigdata if ( (x['datetime'] > tleft) and (x['datetime'] < tright))] 

which is better expressed as:

datasegment = [x for x in bigdata if tleft < x['datetime'] < tright] 

Note that as that stands, it will not include any cases where the timestamp is exactly equal to one of the boundary points, so let’s change it to:

datasegment = [x for x in bigdata if tleft <= x['datetime'] < tright]

But that’s going to appear in a loop:

for tleft, tright in dtsegs:
    datasegment = [x for x in bigdata if tleft <= x['datetime'] < tright]
    do_something_with(datasegment)

Whoops! That’s going to take time proportional to len(bigdata) * len(dtsegs) … what are likely values of len(bigdata) and len(dtsegs)?

If bigdata is sorted, what you want to do can be done in time proportional to N, where N = len(bigdata). If bigdata is not already sorted, it can be sorted in time proportional to N * log(N).
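
One possible way to exploit sorted data, sketched with the standard bisect module. This is an assumption about how it could be done, not necessarily the approach the answer has in mind. Integers stand in for datetimes, and the sample values and the keys and segments names are made up for the example.

```python
from bisect import bisect_left

# Hypothetical stand-ins: integers play the role of datetimes.
mdtimes = [10, 20, 30]
bigdata = [{"datetime": t} for t in (25, 5, 10, 19, 20, 30, 12)]

# Sort once, in O(N log N), and keep a parallel list of sort keys.
bigdata.sort(key=lambda x: x["datetime"])
keys = [x["datetime"] for x in bigdata]

segments = []
for tleft, tright in zip(mdtimes, mdtimes[1:]):
    # Binary search for the half-open range tleft <= t < tright.
    lo = bisect_left(keys, tleft)
    hi = bisect_left(keys, tright)
    segments.append(bigdata[lo:hi])

print([[x["datetime"] for x in seg] for seg in segments])
```

Each segment costs two binary searches plus a slice, so after the sort no record is compared against every segment.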

You might like to ask another question …

It’s also worth pointing out that any items in bigdata that have a timestamp < min(mdtimes) or >= max(mdtimes) will not be included in any data segment … is this intentional?
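
A quick way to check for such records, again with integers standing in for datetimes and made-up sample values (the dropped name is hypothetical):

```python
# Hypothetical stand-ins: integers play the role of datetimes.
mdtimes = [10, 20, 30]
bigdata = [{"datetime": t} for t in (25, 5, 10, 19, 20, 30, 12)]

first, last = min(mdtimes), max(mdtimes)

# Records outside [first, last) fall into no half-open segment.
dropped = [x for x in bigdata if not (first <= x["datetime"] < last)]

print([x["datetime"] for x in dropped])
```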

Answered By: John Machin