Merging a list of time-range tuples that have overlapping time-ranges

Question:

I have a list of tuples where each tuple is a (start-time, end-time). I am trying to merge all overlapping time ranges and return a list of distinct time ranges.
For example

[(1, 5), (2, 4), (3, 6)] --->  [(1,6)]
[(1, 3), (2, 4), (5, 8)] --->  [(1, 4), (5,8)]

Here is how I implemented it.

# Algorithm
# initialranges: [(a,b), (c,d), (e,f), ...]
# First we sort each tuple then whole list.
# This will ensure that a<b, c<d, e<f ... and a < c < e ... 
# BUT the order of b, d, f ... is still random
# Now we have only 3 possibilities
#================================================
# b<c<d: a-------b           Ans: [(a,b),(c,d)]
#                  c---d
# c<=b<d: a-------b          Ans: [(a,d)]
#               c---d
# c<d<b: a-------b           Ans: [(a,b)]
#         c---d
#================================================
def mergeoverlapping(initialranges):
    i = sorted(set([tuple(sorted(x)) for x in initialranges]))

    # initialize final ranges to [(a,b)]
    f = [i[0]]
    for c, d in i[1:]:
        a, b = f[-1]
        if c<=b<d:
            f[-1] = a, d
        elif b<c<d:
            f.append((c,d))
        else:
            # else case included for clarity. Since 
            # we already sorted the tuples and the list
            # only remaining possibility is c<d<b
            # in which case we can silently pass
            pass
    return f

I am trying to figure out if

  1. Is the a an built-in function in some python module that can do this more efficiently? or
  2. Is there a more pythonic way of accomplishing the same goal?

Your help is appreciated. Thanks!

Asked By: Praveen Gollakota

||

Answers:

Sort tuples then list, if t1.right>=t2.left => merge
and restart with the new list, …

–>

def f(l, sort = True):
    if sort:
        sl = sorted(tuple(sorted(i)) for i in l)
    else:
        sl = l
    if len(sl) > 1:
        if sl[0][1] >= sl[1][0]:
            sl[0] = (sl[0][0], sl[1][1])
            del sl[1]
            if len(sl) < len(l):
                return f(sl, False)
    return sl
Answered By: manji

A few ways to make it more efficient, Pythonic:

  1. Eliminate the set() construction, since the algorithm should prune out duplicates during in the main loop.
  2. If you just need to iterate over the results, use yield to generate the values.
  3. Reduce construction of intermediate objects, for example: move the tuple() call to the point where the final values are produced, saving you from having to construct and throw away extra tuples, and reuse a list saved for storing the current time range for comparison.

Code:

def merge(times):
    saved = list(times[0])
    for st, en in sorted([sorted(t) for t in times]):
        if st <= saved[1]:
            saved[1] = max(saved[1], en)
        else:
            yield tuple(saved)
            saved[0] = st
            saved[1] = en
    yield tuple(saved)

data = [
    [(1, 5), (2, 4), (3, 6)],
    [(1, 3), (2, 4), (5, 8)]
    ]

for times in data:
    print list(merge(times))
Answered By: samplebias

The sort part: use standard sorting, it compares tuples the right way already.

sorted_tuples = sorted(initial_ranges)

The merge part. It eliminates duplicate ranges, too, so no need for a set. Suppose you have current_tuple and next_tuple.

c_start, c_end = current_tuple
n_start, n_end = next_tuple
if n_start <= c_end: 
  merged_tuple = min(c_start, n_start), max(c_end, n_end)

I hope the logic is clear enough.

To peek next tuple, you can use indexed access to sorted tuples; it’s a wholly known sequence anyway.

Answered By: 9000

Sort all boundaries then take all pairs where a boundary end is followed by a boundary start.

def mergeOverlapping(initialranges):
    def allBoundaries():
        for r in initialranges:
            yield r[0], True
            yield r[1], False

    def getBoundaries(boundaries):
        yield boundaries[0][0]
        for i in range(1, len(boundaries) - 1):
            if not boundaries[i][1] and boundaries[i + 1][1]:
                yield boundaries[i][0]
                yield boundaries[i + 1][0]
        yield boundaries[-1][0]

    return getBoundaries(sorted(allBoundaries()))

Hm, not that beautiful but was fun to write at least!

EDIT: Years later, after an upvote, I realised my code was wrong! This is the new version just for fun:

def mergeOverlapping(initialRanges):
    def allBoundaries():
        for r in initialRanges:
            yield r[0], -1
            yield r[1], 1

    def getBoundaries(boundaries):
        openrange = 0
        for value, boundary in boundaries:
            if not openrange:
                yield value
            openrange += boundary
            if not openrange:
                yield value

    def outputAsRanges(b):
        while b:
            yield (b.next(), b.next())

    return outputAsRanges(getBoundaries(sorted(allBoundaries())))

Basically I mark the boundaries with -1 or 1 and then sort them by value and only output them when the balance between open and closed braces is zero.

Answered By: UncleZeiv

Late, but might help someone looking for this. I had a similar problem but with dictionaries. Given a list of time ranges, I wanted to find overlaps and merge them when possible. A little modification to @samplebias answer led me to this:

Merge function:

def merge_range(ranges: list, start_key: str, end_key: str):
    ranges = sorted(ranges, key=lambda x: x[start_key])
    saved = dict(ranges[0])

    for range_set in sorted(ranges, key=lambda x: x[start_key]):
        if range_set[start_key] <= saved[end_key]:
            saved[end_key] = max(saved[end_key], range_set[end_key])
        else:
            yield dict(saved)
            saved[start_key] = range_set[start_key]
            saved[end_key] = range_set[end_key]
    yield dict(saved)

Data:

data = [
    {'start_time': '09:00:00', 'end_time': '11:30:00'},
    {'start_time': '15:00:00', 'end_time': '15:30:00'},
    {'start_time': '11:00:00', 'end_time': '14:30:00'},
    {'start_time': '09:30:00', 'end_time': '14:00:00'}
]

Execution:

print(list(merge_range(ranges=data, start_key='start_time', end_key='end_time')))

Output:

[
    {'start_time': '09:00:00', 'end_time': '14:30:00'},
    {'start_time': '15:00:00', 'end_time': '15:30:00'}
]
Answered By: Nirmal

When using Python 3.7, following the suggestion given by “RuntimeError: generator raised StopIteration” every time I try to run app, the method outputAsRanges from @UncleZeiv should be:

    def outputAsRanges(b):
        while b:
            try:
                yield (next(b), next(b))
            except StopIteration:
                return
Answered By: cesc
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.