Expand list of tuples

Question:

I have a big list of tuples (20-30 million entries). Each sub-tuple has three pieces of information (start, end, value) and is supposed to only represent one position in a sequence. However, the program generating this data has concatenated some of the values of adjacent positions if the value is the same. Thus they look a like this:

[..., (71, 72, -0.2998250126838684),
      (72, 73, -0.2858070135116577),
      (73, 74, -0.29049500823020935),
      (74, 75, -0.3044680058956146),
      (75, 76, -0.28386199474334717),
      (76, 80, -0.27730199694633484), # PROBLEM: end - start > 1
      (80, 81, -0.2726449966430664),
      (81, 82, -0.26151400804519653),
      (82, 84, -0.2679719924926758),  # PROBLEM: end - start > 1
      (84, 85, -0.24273300170898438),
      (85, 86, -0.23799900710582733),
      (86, 87, -0.24745100736618042),
      (87, 88, -0.2568419873714447),
      (88, 89, -0.2585819959640503), ...]

To fix this I would like to create new entries in the list which separate these tuples which represent multiple positions into new tuples which represent just one position.
I thus desire an output like this:

(..., (71, 72, -0.2998250126838684),
      (72, 73, -0.2858070135116577),
      (73, 74, -0.29049500823020935),
      (74, 75, -0.3044680058956146),
      (75, 76, -0.28386199474334717),
      (76, 77, -0.27730199694633484), # New
      (77, 78, -0.27730199694633484), # New
      (78, 79, -0.27730199694633484), # New
      (79, 80, -0.27730199694633484), # New
      (80, 81, -0.2726449966430664),
      (81, 82, -0.26151400804519653),
      (82, 83, -0.2679719924926758),  # New
      (83, 84, -0.2679719924926758),  # New
      (84, 85, -0.24273300170898438),
      (85, 86, -0.23799900710582733),
      (86, 87, -0.24745100736618042),
      (87, 88, -0.2568419873714447),
      (88, 89, -0.2585819959640503), ...)

To do this, with my list of tuples called bwList I have done the following:

replacementTuple = ()

for t in bwList:
    if t[1] - t[0] == 1:
        replacementTuple = replacementTuple + (t,)
    else:
        numNewTuples = t[1] - t[0]
        st, ed = range(t[0], t[1]), range(t[0] + 1, t[1] + 1)
        for m in range(numNewTuples):
            replacementTuple = replacementTuple + ((st[m], ed[m], t[2]) ,) 

Here the output is a tuple of tuples, not a list. I don’t mind too much either way.

This approach appears to work, but is very slow!

Is there a way to speed this up?

Asked By: G_T

||

Answers:

Use a list instead of a tuple for the result. Then you can append to it in place, rather than making a copy of the tuple every time you add to it, which makes your algorithm O(n^2).

You don’t need the if statement — the case where end = start+1 is just a limiting case, and the loop will handle it properly.

replacementList = []

for start, end, value in bwList:
    for i in range(start, end):
        replacementList.append((i, i+1, value))

You could also use a generator to construct the tuple.

replacementTuple = tuple((i, i+1, value) for start, end, value in bwList for i in range(start, end))
Answered By: Barmar

Dramatically simpler (and likely faster, mostly by virtue of amortized O(1) appends) would be to build a list that expands each tuple (possibly to the same tuple just once, but without special-casing):

replacement_list = [(newstart, newstart+1, value)
                    for start, end, value in bwList
                    for newstart in range(start, end)]

You can convert that to a tuple at the end if you like, but building it as a list first will save a lot of trouble.

Answered By: ShadowRanger
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.