Can I count on order being preserved in a Python tuple?
Question:
I’ve got a list of datetimes from which I want to construct time segments. In other words, turn [t0, t1, ... tn]
into [(t0,t1),(t1,t2),...,(tn-1, tn)]
. I’ve done it this way:
# start by sorting list of datetimes
mdtimes.sort()
# construct tuples which represent possible start and end dates
# left edges
dtg0 = [x for x in mdtimes]
dtg0.pop()
# right edges
dtg1 = [x for x in mdtimes]
dtg1.reverse()
dtg1.pop()
dtg1.sort()
dtsegs = zip(dtg0,dtg1)
Questions…
- Can I count on tn-1 < tn for any (tn-1,tn) after I’ve created them this way? (Is ordering preserved?)
- Is it good practice to copy the original
mdtimes
list with list comprehensions? If not how should it be done?
-
The purpose for constructing these tuples is to iterate over them and segment a data set with tn-1
and tn
. Is this a reasonable approach? i.e.
datasegment = [x for x in bigdata if ( (x['datetime'] > tleft) and (x['datetime'] < tright))]
Thanks
Answers:
Instead of: dtg0 = [x for x in mdtimes]
, dtg0 = mdtimes[:]
would do, since you just copy one list into another. Note: starting with Python 3.3, you can just say newlist = oldlist.copy()
As for order, zip
‘s order is well defined, and both lists and tuples are ordered collections, so you should have no problem here.
-
Tuple order is as you insert values into the tuple. They’re not going to be sorted as I think you’re asking. zip
will again, retain the order you inserted the values in.
-
It’s an acceptable method, but I have 2 alternate suggestions: Use the copy module, or use dtg1 = mdtimes[:]
.
-
Sounds reasonable.
You can achieve the same with zip
:
>>> l = ["t0", "t1", "t2", "t3", "t4", "t5", "t6"]
>>> zip(l[::2], l[1::2])
[('t0', 't1'), ('t2', 't3'), ('t4', 't5')]
Both list
and tuple
are ordered.
dtg0, dtg1 = itertools.tee(mdtimes)
next(dtg0)
dtsegs = zip(dtg0, dtg1)
I’m no expert, but aren’t you quadrupling your memory requirements by copying the list and then making a new list of pairs taken from two lists? Why not just do the following:
dtsegs = [(dtg0[i], dtg0[i+1]) for i in range(len(dtg0)-1)]
Dunno how “Pythonic” that is, though.
EDIT: Actually, looking at what you need to do with this list of tuples, you could just do this [i] and [i+1] stuff directly at that level and not even create this new structure at all. I don’t know how many dates you’re dealing with, though – if it’s some small number I suppose it doesn’t really matter.
For what it’s worth, a couple of the other answerers here seem to be misunderstanding your question, though I can’t comment on their posts since I don’t have enough reputation yet 🙂 Ignacio Vazquez-Abrams’s solution seems the best to me, though his “next(dtg0)” should probably be “next(dtg1)” (?)
Turning (x1, x2, x3, …) into [(x1, x2), (x2, x3), …] is called a pairwise combination, and it’s so common a pattern that the itertools documentation provides a recipe:
def pairwise(iterable):
"s -> (s0,s1), (s1,s2), (s2, s3), ..."
a, b = tee(iterable)
next(b, None)
return izip(a, b)
for ta, tb in pairwise(mdtimes):
....
This is an answer to the question “Is this a reasonable approach?” (which appears to have been ignored by all).
Summary: You may want/need to lift your gaze from making a pairwise thingy out of mdtimes
to the encompassing problem (segmenting bigdata
).
Detail:
The desired use of the result is expressed as:
datasegment = [x for x in bigdata if ( (x['datetime'] > tleft) and (x['datetime'] < tright))]
which is better expressed as:
datasegment = [x for x in bigdata if tleft < x['datetime'] < tright]
Note that as that stands, it will not include any cases where the timestamp is exactly equal to one of the boundary points, so let’s change it to:
datasegment = [x for x in bigdata if tleft <= x['datetime'] < tright]
But that’s going to appear in a loop:
for tleft, tright in dtsegs:
datasegment = [x for x in bigdata if tleft <= x['datetime'] < tright]
do_something_with(datasegment)
Whoops! That’s going to take time proportional to len(bigdata) * len(dtsegs)
… what are likely values of len(bigdata)
and len(dtsegs)
?
If bigdata
is sorted, what you want to do can be done in time proportional to N
, where N = len(bigdata)
. If bigdata
is not already sorted, it can be sorted in time proportional to N * log(N)
.
You might like to ask another question …
It’s also worth pointing out that any items in bigdata
that have a timestamp < min(mdtimes) or >= max(mdtimes) will not be included in any data segment … is this intentional?
I’ve got a list of datetimes from which I want to construct time segments. In other words, turn [t0, t1, ... tn]
into [(t0,t1),(t1,t2),...,(tn-1, tn)]
. I’ve done it this way:
# start by sorting list of datetimes
mdtimes.sort()
# construct tuples which represent possible start and end dates
# left edges
dtg0 = [x for x in mdtimes]
dtg0.pop()
# right edges
dtg1 = [x for x in mdtimes]
dtg1.reverse()
dtg1.pop()
dtg1.sort()
dtsegs = zip(dtg0,dtg1)
Questions…
- Can I count on tn-1 < tn for any (tn-1,tn) after I’ve created them this way? (Is ordering preserved?)
- Is it good practice to copy the original
mdtimes
list with list comprehensions? If not how should it be done? -
The purpose for constructing these tuples is to iterate over them and segment a data set with
tn-1
andtn
. Is this a reasonable approach? i.e.datasegment = [x for x in bigdata if ( (x['datetime'] > tleft) and (x['datetime'] < tright))]
Thanks
Instead of: dtg0 = [x for x in mdtimes]
, dtg0 = mdtimes[:]
would do, since you just copy one list into another. Note: starting with Python 3.3, you can just say newlist = oldlist.copy()
As for order, zip
‘s order is well defined, and both lists and tuples are ordered collections, so you should have no problem here.
-
Tuple order is as you insert values into the tuple. They’re not going to be sorted as I think you’re asking.
zip
will again, retain the order you inserted the values in. -
It’s an acceptable method, but I have 2 alternate suggestions: Use the copy module, or use
dtg1 = mdtimes[:]
. -
Sounds reasonable.
You can achieve the same with zip
:
>>> l = ["t0", "t1", "t2", "t3", "t4", "t5", "t6"]
>>> zip(l[::2], l[1::2])
[('t0', 't1'), ('t2', 't3'), ('t4', 't5')]
Both list
and tuple
are ordered.
dtg0, dtg1 = itertools.tee(mdtimes)
next(dtg0)
dtsegs = zip(dtg0, dtg1)
I’m no expert, but aren’t you quadrupling your memory requirements by copying the list and then making a new list of pairs taken from two lists? Why not just do the following:
dtsegs = [(dtg0[i], dtg0[i+1]) for i in range(len(dtg0)-1)]
Dunno how “Pythonic” that is, though.
EDIT: Actually, looking at what you need to do with this list of tuples, you could just do this [i] and [i+1] stuff directly at that level and not even create this new structure at all. I don’t know how many dates you’re dealing with, though – if it’s some small number I suppose it doesn’t really matter.
For what it’s worth, a couple of the other answerers here seem to be misunderstanding your question, though I can’t comment on their posts since I don’t have enough reputation yet 🙂 Ignacio Vazquez-Abrams’s solution seems the best to me, though his “next(dtg0)” should probably be “next(dtg1)” (?)
Turning (x1, x2, x3, …) into [(x1, x2), (x2, x3), …] is called a pairwise combination, and it’s so common a pattern that the itertools documentation provides a recipe:
def pairwise(iterable):
"s -> (s0,s1), (s1,s2), (s2, s3), ..."
a, b = tee(iterable)
next(b, None)
return izip(a, b)
for ta, tb in pairwise(mdtimes):
....
This is an answer to the question “Is this a reasonable approach?” (which appears to have been ignored by all).
Summary: You may want/need to lift your gaze from making a pairwise thingy out of mdtimes
to the encompassing problem (segmenting bigdata
).
Detail:
The desired use of the result is expressed as:
datasegment = [x for x in bigdata if ( (x['datetime'] > tleft) and (x['datetime'] < tright))]
which is better expressed as:
datasegment = [x for x in bigdata if tleft < x['datetime'] < tright]
Note that as that stands, it will not include any cases where the timestamp is exactly equal to one of the boundary points, so let’s change it to:
datasegment = [x for x in bigdata if tleft <= x['datetime'] < tright]
But that’s going to appear in a loop:
for tleft, tright in dtsegs:
datasegment = [x for x in bigdata if tleft <= x['datetime'] < tright]
do_something_with(datasegment)
Whoops! That’s going to take time proportional to len(bigdata) * len(dtsegs)
… what are likely values of len(bigdata)
and len(dtsegs)
?
If bigdata
is sorted, what you want to do can be done in time proportional to N
, where N = len(bigdata)
. If bigdata
is not already sorted, it can be sorted in time proportional to N * log(N)
.
You might like to ask another question …
It’s also worth pointing out that any items in bigdata
that have a timestamp < min(mdtimes) or >= max(mdtimes) will not be included in any data segment … is this intentional?