How to optimize splitting overlapping ranges?

Question:

This is a Python script I wrote to split overlapping ranges into unique ranges (latest iteration). It produces correct output and outperforms the version in the answer; I verified its output against the correct method’s output and against the output of a brute-force approach.

Imagine an infinite line of numbered boxes. Each box can hold only one object at a time: whatever was last put into it. All boxes are initially empty. The input is a list of triplets; the first two elements of each triplet are integers (the first no greater than the second), and each triplet is an instruction to put its third element into a range of boxes. The triplet (0, 10, 'A') means "put 'A' into boxes 0 to 10 (inclusive)", so after executing it, boxes 0 to 10 each contain an instance of 'A'. Empty boxes are to be ignored. The task is to describe the state of the boxes after executing all instructions, using the fewest possible triplets. There is a narrow case and a general case:
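
To make the box semantics concrete, here is a minimal sketch using a plain dictionary (purely illustrative; it reproduces Case 1 below):

# Later instructions overwrite earlier ones, box by box.
boxes = {}
for start, end, data in [(0, 10, 'A'), (0, 5, 'B')]:
    for n in range(start, end + 1):
        boxes[n] = data
# Boxes 0..5 now hold 'B' and boxes 6..10 hold 'A',
# so the compact description is [(0, 5, 'B'), (6, 10, 'A')].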

Narrow case: Given triplets (s1, e1, d1) and (s2, e2, d2), s1 < s2 < e1 < e2 is always False (all pairings in the input conform to this). There are four sub-cases:

  • Case 1: s1 = s2 and e2 < e1:

    Whichever ends first wins; the boxes always hold the value from the winner. Given [(0, 10, 'A'), (0, 5, 'B')], after execution the state of the boxes is [(0, 5, 'B'), (6, 10, 'A')].

  • Case 2: s1 < s2 and e2 < e1:

    Whichever ends first wins. Example: [(0, 10, 'A'), (5, 7, 'B')] -> [(0, 4, 'A'), (5, 7, 'B'), (8, 10, 'A')].

  • Case 3: s1 < s2 and e2 = e1:

    Same rule as above, but here it is a tie. Ties are broken in favour of whichever starts later: [(0, 10, 'A'), (6, 10, 'B')] -> [(0, 5, 'A'), (6, 10, 'B')].

  • Case 4: s1 = s2 and e1 = e2:

    This case is special. In case of a true tie, whichever comes later in the input wins (e.g. [(0, 10, 'A'), (0, 10, 'B')] -> [(0, 10, 'B')]). My code doesn’t guarantee that the objects in the boxes come from the latest instruction in cases like this (but it does guarantee each box holds only one object).

Additional rules:

  • If there are no updates, do nothing.

    [(0, 10, 'A'), (5, 6, 'A')] -> [(0, 10, 'A')]

  • If there are gaps, leave them as is (those boxes are empty):

    [(0, 10, 'A'), (15, 20, 'B')] -> [(0, 10, 'A'), (15, 20, 'B')]

  • If e1 + 1 = s2 and d1 = d2, join them (to keep the number of triplets minimal).

    [(0, 10, 'A'), (11, 20, 'A')] -> [(0, 20, 'A')]

  • The general case is when the primary condition of the narrow case doesn’t hold, i.e. when there are true intersections. Whichever starts later wins:

    [(0, 10, 'A'), (5, 20, 'B')] -> [(0, 4, 'A'), (5, 20, 'B')]

The brute force approach is correct but slow, and it can handle the general case:

def brute_force_discretize(ranges):
    # Paint every box individually; each assignment overwrites whatever was
    # in that box before.
    numbers = {}
    ranges.sort(key=lambda x: (x[0], -x[1]))
    for start, end, data in ranges:
        numbers |= {n: data for n in range(start, end + 1)}
    # Collapse runs of consecutive boxes holding the same value into triplets.
    numbers = list(numbers.items())
    l = len(numbers)
    i = 0
    output = []
    while i < l:
        di = 0
        curn, curv = numbers[i]
        while i < l and curn + di == numbers[i][0] and curv == numbers[i][1]:
            i += 1
            di += 1
        output.append((curn, numbers[i - 1][0], curv))
    return output
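
Since the brute force is correct by definition, the rules above can be turned into a quick sanity check against it (the expected outputs are exactly the ones listed in the problem statement):

# Sanity checks for the additional rules and Case 2.
assert brute_force_discretize([(0, 10, 'A'), (5, 7, 'B')]) == [(0, 4, 'A'), (5, 7, 'B'), (8, 10, 'A')]
assert brute_force_discretize([(0, 10, 'A'), (5, 6, 'A')]) == [(0, 10, 'A')]
assert brute_force_discretize([(0, 10, 'A'), (15, 20, 'B')]) == [(0, 10, 'A'), (15, 20, 'B')]
assert brute_force_discretize([(0, 10, 'A'), (11, 20, 'A')]) == [(0, 20, 'A')]
assert brute_force_discretize([(0, 10, 'A'), (5, 20, 'B')]) == [(0, 4, 'A'), (5, 20, 'B')]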

The smart implementation is performant but only handles the narrow case:

from typing import Any, List, Tuple


def get_nodes(ranges: List[Tuple[int, int, Any]]) -> List[Tuple[int, int, Any]]:
    # Turn each range into a start node and an end node; sorting puts starts
    # before ends at equal positions (False < True).
    nodes = []
    for ini, fin, data in ranges:
        nodes.extend([(ini, False, data), (fin, True, data)])
    return sorted(nodes)


def merge_ranges(data: List[List[int | Any]], range: List[int | Any]) -> None:
    # Append the new range, or extend the previous output range when it
    # carries the same data and is adjacent or overlapping.
    if not data or range[2] != (last := data[-1])[2] or range[0] > last[1] + 1:
        data.append(range)
    else:
        last[1] = range[1]


def discretize_narrow(ranges):
    nodes = get_nodes(ranges)
    output = []
    stack = []    # values that became active, innermost last
    actions = []  # for each start node: did it push a new active value?
    for node, end, data in nodes:
        if not end:
            action = False
            if not stack or data != stack[-1]:
                # A new value takes over: close off what the previous one covered.
                if stack and start < node:
                    merge_ranges(output, [start, node - 1, stack[-1]])
                stack.append(data)
                start = node
                action = True
            actions.append(action)
        elif actions.pop(-1):
            # The matching start pushed a value: emit its segment and move on.
            if start <= node:
                merge_ranges(output, [start, node, stack.pop(-1)])
                start = node + 1
            else:
                stack.pop(-1)
    return output
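
Before timing it, discretize_narrow can be cross-checked against the brute force on a narrow-case input (it returns lists, hence the tuple conversion; the same input appears as a manual test case further down):

data = [(0, 10, 'A'), (0, 1, 'B'), (2, 5, 'C'), (3, 4, 'C'), (6, 7, 'C'),
        (8, 8, 'D'), (110, 150, 'E'), (250, 300, 'C'), (256, 270, 'D'),
        (295, 300, 'E'), (500, 600, 'F')]
assert list(map(tuple, discretize_narrow(data))) == brute_force_discretize(list(data))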

A full script that generates narrow cases is on the linked GitHub page. Performance:

In [518]: sample = make_sample(2048, 65536, 16)

In [519]: %timeit descretize(sample)
4.46 ms ± 32.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [520]: %timeit discretize_narrow(sample)
3.13 ms ± 34.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [521]: list(map(tuple, discretize_narrow(sample))) == descretize(sample)
Out[521]: True

How can I make this faster? The inputs are sorted in ascending order, but my code assumes they aren’t. My idea: split the top-level data into non-overlapping chunks (taking advantage of the input being sorted), accumulate triplets on a stack while they overlap, push the stack to the discretizer and clear it once the overlaps end, then join the intermediate results. That should be faster, but I can’t get it working.
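
A minimal sketch of that chunking idea, assuming the input is already sorted by start and that the narrow-case precondition holds within each chunk (so discretize_narrow can be reused unchanged); merge_ranges takes care of joining results across chunk boundaries:

def discretize_chunked(ranges):
    # Sketch only: relies on `ranges` being sorted by start.
    output = []
    chunk = []
    max_end = float("-inf")
    for start, end, data in ranges:
        # Once a range starts past everything seen so far, no later range can
        # overlap the current chunk, so the chunk can be flushed.
        if chunk and start > max_end:
            for piece in discretize_narrow(chunk):
                merge_ranges(output, piece)
            chunk = []
        chunk.append((start, end, data))
        max_end = max(max_end, end)
    if chunk:
        for piece in discretize_narrow(chunk):
            merge_ranges(output, piece)
    return output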

And how can I handle the general case? I need to know when ranges end (four states), but I can’t get it to work correctly:

def get_quadruples(ranges):
    # Like get_nodes, but with a tie-breaker: among start nodes at the same
    # position, ranges with the larger end come first (-fin); among end nodes
    # at the same position, ranges with the smaller start come first (ini).
    nodes = []
    for ini, fin, data in ranges:
        nodes.extend([(ini, False, -fin, data), (fin, True, ini, data)])
    return sorted(nodes)
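
For reference, this is the node ordering it produces on the Case 2 example (starts sort before ends at the same position because False < True):

print(get_quadruples([(0, 10, 'A'), (5, 7, 'B')]))
# [(0, False, -10, 'A'), (5, False, -7, 'B'), (7, True, 5, 'B'), (10, True, 0, 'A')]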

I can’t post this on Code Review because it is a "how" question (I haven’t achieved my goals yet). An answer on Code Review uses an Interval Tree; the timings of its merge_rows below demonstrate how slow that is:

In [531]: %timeit discretize_narrow([(0, 10, 'A'), (0, 1, 'B'), (2, 5, 'C'), (3, 4, 'C'), (6, 7, 'C'), (8, 8, 'D'), (110, 150, 'E'), (250, 300, 'C'), (256, 270, 'D'), (295, 300, 'E'), (500, 600, 'F')])
14.3 µs ± 42.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [532]: %timeit merge_rows([(0, 10, 'A'), (0, 1, 'B'), (2, 5, 'C'), (3, 4, 'C'), (6, 7, 'C'), (8, 8, 'D'), (110, 150, 'E'), (250, 300, 'C'), (256, 270, 'D'), (295, 300, 'E'), (500, 600, 'F')])
891 µs ± 12.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [533]: data = [(0, 10, 'A'), (0, 1, 'B'), (2, 5, 'C'), (3, 4, 'C'), (6, 7, 'C'), (8, 8, 'D'), (110, 150, 'E'), (250, 300, 'C'), (256, 270, 'D'), (295, 300, 'E'), (500, 600, 'F')]

In [534]: merge_rows(data) == discretize_narrow(data)
Out[534]: True

In [535]: sample = make_sample(256, 65536, 16)

In [536]: merge_rows(sample) == discretize_narrow(sample)
Out[536]: True

In [537]: %timeit discretize_narrow(sample)
401 µs ± 3.33 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [538]: %time result = merge_rows(sample)
CPU times: total: 78.1 ms
Wall time: 56 ms

It is not efficient, but I don’t think I can improve it. Test cases can be generated using the code on the linked GitHub page. The logic is the same as a dictionary update, and the brute-force implementation uses dictionary updates to process the ranges (so it is correct by definition). Correctness of the smart approach was verified against the brute-force output.

Manual test cases:

In [539]: discretize_narrow([(0, 10, 'A'), (0, 1, 'B'), (2, 5, 'C'), (3, 4, 'C'), (6, 7, 'C'), (8, 8, 'D'), (110, 150, 'E'), (250, 300, 'C'), (256, 270, 'D'), (295, 300, 'E'), (500, 600, 'F')])
Out[539]:
[[0, 1, 'B'],
 [2, 7, 'C'],
 [8, 8, 'D'],
 [9, 10, 'A'],
 [110, 150, 'E'],
 [250, 255, 'C'],
 [256, 270, 'D'],
 [271, 294, 'C'],
 [295, 300, 'E'],
 [500, 600, 'F']]

In [540]: discretize_narrow([(0, 100, 'A'), (10, 25, 'B'), (15, 25, 'C'), (20, 25, 'D'), (30, 50, 'E'), (40, 50, 'F'), (60, 80, 'G'), (150, 180, 'H')])
Out[540]:
[[0, 9, 'A'],
 [10, 14, 'B'],
 [15, 19, 'C'],
 [20, 25, 'D'],
 [26, 29, 'A'],
 [30, 39, 'E'],
 [40, 50, 'F'],
 [51, 59, 'A'],
 [60, 80, 'G'],
 [81, 100, 'A'],
 [150, 180, 'H']]

Machine generated general case (and correct output):

In [542]: ranges = []

In [543]: for _ in range(20):
     ...:     start = random.randrange(100)
     ...:     end = random.randrange(100)
     ...:     if start > end:
     ...:         start, end = end, start
     ...:     ranges.append([start, end, random.randrange(5)])

In [544]: ranges.sort()

In [545]: ranges
Out[545]:
[[0, 31, 0],
 [1, 47, 1],
 [1, 67, 0],
 [10, 68, 0],
 [15, 17, 2],
 [18, 39, 0],
 [19, 73, 3],
 [25, 32, 0],
 [26, 33, 1],
 [26, 72, 2],
 [26, 80, 2],
 [28, 28, 1],
 [29, 31, 4],
 [30, 78, 2],
 [36, 47, 0],
 [36, 59, 4],
 [44, 67, 3],
 [52, 61, 4],
 [58, 88, 1],
 [64, 92, 1]]

In [546]: brute_force_discretize(ranges)
Out[546]:
[(0, 0, 0),
 (1, 9, 1),
 (10, 14, 0),
 (15, 17, 2),
 (18, 18, 0),
 (19, 24, 3),
 (25, 25, 0),
 (26, 28, 1),
 (29, 29, 4),
 (30, 35, 2),
 (36, 43, 0),
 (44, 51, 3),
 (52, 57, 4),
 (58, 92, 1)]

Function that makes generic cases:

def make_generic_case(num, lim, dat):
    ranges = []

    for _ in range(num):
        start = random.randrange(lim)
        end = random.randrange(lim)
        if start > end:
            start, end = end, start
        ranges.append([start, end, random.randrange(dat)])

    ranges.sort()
    return ranges

Sample inputs where the code from the existing answer disagrees with the correct result:

Test case:

[
    [0, 31, 0],
    [1, 47, 1],
    [1, 67, 0],
    [10, 68, 0],
    [15, 17, 2],
    [18, 39, 0],
    [19, 73, 3],
    [25, 32, 0],
    [26, 33, 1],
    [26, 72, 2],
    [26, 80, 2],
    [28, 28, 1],
    [29, 31, 4],
    [30, 78, 2],
    [36, 47, 0],
    [36, 59, 4],
    [44, 67, 3],
    [52, 61, 4],
    [58, 88, 1],
    [64, 92, 1]
]

Output:

[
    (0, 0, 0),
    (1, 9, 1),
    (10, 14, 0),
    (15, 17, 2),
    (18, 18, 0),
    (19, 24, 3),
    (25, 25, 0),
    (26, 28, 1),
    (29, 29, 4),
    (30, 35, 2),
    (36, 43, 0),
    (44, 51, 3),
    (52, 57, 4),
    (58, 88, 1)
]

The output is almost the same as the correct one, except for the last range: it should be (58, 92, 1) instead of (58, 88, 1).

Another case:

[
    [4, 104, 4],
    [22, 463, 2],
    [24, 947, 2],
    [36, 710, 1],
    [37, 183, 1],
    [39, 698, 7],
    [51, 438, 4],
    [60, 450, 7],
    [120, 383, 2],
    [130, 193, 7],
    [160, 562, 5],
    [179, 443, 6],
    [186, 559, 6],
    [217, 765, 2],
    [221, 635, 2],
    [240, 515, 3],
    [263, 843, 3],
    [274, 759, 6],
    [288, 389, 5],
    [296, 298, 6],
    [333, 1007, 1],
    [345, 386, 5],
    [356, 885, 3],
    [377, 435, 5],
    [407, 942, 7],
    [423, 436, 1],
    [484, 926, 5],
    [496, 829, 0],
    [559, 870, 5],
    [610, 628, 1],
    [651, 787, 4],
    [735, 927, 1],
    [765, 1002, 1]
]

Output:

[(4, 21, 4),
 (22, 35, 2),
 (36, 38, 1),
 (39, 50, 7),
 (51, 59, 4),
 (60, 119, 7),
 (120, 129, 2),
 (130, 159, 7),
 (160, 178, 5),
 (179, 216, 6),
 (217, 239, 2),
 (240, 273, 3),
 (274, 287, 6),
 (288, 295, 5),
 (296, 298, 6),
 (299, 332, 5),
 (333, 344, 1),
 (345, 355, 5),
 (356, 376, 3),
 (377, 406, 5),
 (407, 422, 7),
 (423, 436, 1),
 (437, 483, 7),
 (484, 495, 5),
 (496, 558, 0),
 (559, 609, 5),
 (610, 628, 1),
 (629, 650, 5),
 (651, 734, 4),
 (735, 927, 1),
 (928, 942, 7),
 (943, 1007, 1)]

Again, almost correct; the last three ranges are wrong ((735, 927, 1), (928, 942, 7), (943, 1007, 1) should be a single (735, 1007, 1)). On the other inputs I tested the proposed solution is correct, but on some edge cases it isn’t. This algorithm fails the narrow case:

def discretize_gen(ranges):
    nodes = get_nodes(ranges)
    stack = []
    for (n1, e1, d1), (n2, e2, _) in zip(nodes, nodes[1:]):
        # Open or close the range that the first node belongs to.
        if e1:
            stack.remove(d1)
        else:
            stack.append(d1)
        # The boxes lying between the two nodes get the most recently
        # opened value still on the stack.
        start = n1 + e1
        end = n2 - (not e2)
        if start <= end and stack:
            yield start, end, stack[-1]

Use it like list(merge(discretize_gen(ranges))); the merge function can be found in the answer below. It fails for many inputs. To pinpoint the failures, I compare its output against the brute force:

def compare_results(ranges):
    correct = brute_force_discretize(ranges)
    correct_set = set(correct)
    output = list(merge(discretize_gen(ranges)))
    output_set = set(output)
    errors = output_set ^ correct_set
    indices = [(i, correct.index(e)) for i, e in enumerate(output) if e not in errors]
    indices.append((len(output), None))
    comparison = [(a, b) if c - a == 1 else (slice(a, c), slice(b, d)) for (a, b), (c, d) in zip(indices, indices[1:])]
    result = {}
    for a, b in comparison:
        key = output[a]
        val = correct[b]
        if isinstance(key, list):
            key = tuple(key)
        result[key] = val
    return result
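
For the failing input listed below it is invoked like this; the dict it returns (keys are my segments, values the correct ones) is the comparison shown after the input:

mismatches = compare_results([(73, 104, 3), (75, 98, 0), (78, 79, 3), (83, 85, 3), (88, 90, 2)])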

Where it fails:

[(73, 104, 3), (75, 98, 0), (78, 79, 3), (83, 85, 3), (88, 90, 2)]

Comparison:

{(73, 74, 3): (73, 74, 3),
 ((75, 77, 0), (78, 87, 3)): [(75, 77, 0),
  (78, 79, 3),
  (80, 82, 0),
  (83, 85, 3),
  (86, 87, 0)],
 ((88, 90, 2), (91, 104, 3)): [(88, 90, 2), (91, 98, 0), (99, 104, 3)]}

The answer has a score of 11 and is accepted, yet it doesn’t work. How can it be fixed?

Answers:

Here is an approach that is somewhat like the one you have in discretize_narrow, and seems to give a slight performance improvement (10-20%) when I timed it with make_sample(2048, 65536, 16) inputs:

def solve(ranges):
    if not ranges:
        return []
        
    def disjoint_segments(ranges):
        ranges.sort(key=lambda x: (x[0], -x[1]))
        # In order to get an extra loop iteration, add one dummy range
        ranges.append((float("inf"), None, None))  
        stack_end = []  # use separate stacks for ends, and data
        stack_data = []
        current = ranges[0][0]
        for start, end, data in ranges:
            # flush data from stack up to start - 1.
            while stack_end and stack_end[-1] < start:
                end2 = stack_end.pop()
                data2 = stack_data.pop()
                if current <= end2:
                    yield current, end2, data2
                    current = end2 + 1
            if stack_end and current < start:
                yield current, start - 1, stack_data[-1]
            # stack the current tuple's info
            current = start
            if not stack_end or stack_data[-1] != data or end > stack_end[-1]:
                stack_end.append(end)
                stack_data.append(data)

    def merge(segments):
        start, end, data = next(segments)  # keep one segment buffered
        for start2, end2, data2 in segments:
            if end + 1 == start2 and data == data2:  # adjacent & same data
                end = end2  # merge
            else:
                yield start, end, data
                start, end, data = start2, end2, data2
        yield start, end, data  # flush the buffer
    
    return list(merge(disjoint_segments(ranges)))

The above function should also produce correct results for the "generic" case.
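
One way to sanity-check that claim, assuming brute_force_discretize and make_generic_case from the question are in scope (both functions sort their input, and solve appends a sentinel to it, so copies are passed):

case = make_generic_case(200, 1000, 5)
assert solve([list(r) for r in case]) == brute_force_discretize([tuple(r) for r in case])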

Answered By: trincot