Sorting points on multiple lines

Question:

Given that we have two lines on a graph (I just noticed that I inverted the numbers on the Y axis, this was a mistake, it should go from 11-1)

Two lines on a graph

And we only care about whole number X axis intersections

Two lines on a graph with intersections

We need to order these points from highest Y value to lowest Y value regardless of their position on the X axis (Note I did these pictures by hand so they may not line up perfectly).

Two lines on a graph with intersections and ordering

I have a couple of questions:

1) I have to assume this is a known problem, but does it have a particular name?

2) Is there a known optimal solution when dealing with tens of billions (or hundreds of millions) of lines? Our current process of manually calculating each point and then comparing it to a giant list requires hours of processing. Even though we may have a hundred million lines, we typically only want the top 100 or 50,000 results; some of the lines are so far “below” other lines that calculating their points is unnecessary.

Asked By: samwise

Answers:

  1. It’s not a really complicated thing, just a "normal" sorting problem.

  2. Usually sorting requires a large amount of computing time. But your case is one where you don’t need to use complex sorting techniques.

The values on both graphs are growing or falling constantly; there are no "jumps". You can use this to your advantage. The basic algorithm:

  • identify if a graph is growing or falling.
  • write a generator that generates the values: from left to right if rising, from right to left if falling.
  • get the first value from both graphs
  • insert the lower one into the result list
  • get a new value from the graph that had the lower value
  • repeat the last two steps until one generator is "empty"
  • append the leftover items from the other generator.
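
The steps above can be sketched roughly as follows. This is only an illustration, assuming each line is described by a starting value `y0` at x = 0, a per-step change `dy`, and `n` whole-number X positions (all three names are made up for the example):

```python
from typing import Iterator, List


def line_values(y0: int, dy: int, n: int) -> Iterator[int]:
    """Yield the n points of one line in ascending Y order.

    y0, dy and n are hypothetical parameters: value at x = 0,
    change per X step, and number of whole-number X positions.
    """
    if dy >= 0:
        xs = range(n)              # rising: walk left to right
    else:
        xs = range(n - 1, -1, -1)  # falling: walk right to left
    for x in xs:
        yield y0 + dy * x


def merge_two(a: Iterator[int], b: Iterator[int]) -> List[int]:
    """Merge two ascending generators into one ascending list."""
    done = object()  # sentinel marking an exhausted generator
    result = []
    va, vb = next(a, done), next(b, done)
    while va is not done and vb is not done:
        if va <= vb:
            result.append(va)      # take the lower value
            va = next(a, done)     # and refill from the same generator
        else:
            result.append(vb)
            vb = next(b, done)
    # One generator is empty: append the leftovers of the other.
    if va is not done:
        result.append(va)
        result.extend(a)
    if vb is not done:
        result.append(vb)
        result.extend(b)
    return result


merged = merge_two(line_values(1, 2, 4), line_values(10, -3, 4))
```

The result is ordered from lowest to highest, so reverse it if you need highest first. For more than two lines, the standard library's `heapq.merge` does the same kind of merge over any number of sorted iterables.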
Answered By: Klaus D.
  1. Your data structure is a set of tuples

    lines = {(y0, Δy0), (y1, Δy1), ...}
    
  2. You need only the ntop points, hence build a set containing only
    the top ntop yi values, with a single pass over the data

    top_points = choose(lines, ntop)
    

    EDIT: to choose the ntop points we have to keep track of the smallest
    one, and this is useful information, so let’s also return this value
    from choose; we also need to initialize decremented

    top_points, smallest = choose(lines, ntop)
    decremented = top_points
    

    and start a loop…

    while True:
    
  3. Generate a set of decremented values, considering only the points
    that are still above smallest, and stop when none are left

        decremented = {(y-Δy, Δy) for y, Δy in decremented if y>smallest}
        if not decremented: break
    
  4. Generate a set of candidates

        candidates = top_points.union(decremented)
    
  5. generate a new set of top points

        new_top_points, smallest = choose(candidates, ntop)
    

    The following step is no longer necessary

  6. check if new_top_points == top_points

        if new_top_points == top_points: break
        top_points = new_top_points
    

    of course we are in a loop…
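
As a side note, the same "only the top ntop points" result can also be obtained with a different technique than the set-based loop above: a max-heap that always hands out the currently highest point and then pushes that line's next point. This is only a sketch under assumptions not stated in the answer: every Δy is positive (each line's points only go down from y0), every line has the same number of points `npoints`, and the function name `lazy_top_points` is made up.

```python
import heapq
from typing import Iterable, List, Tuple


def lazy_top_points(lines: Iterable[Tuple[int, int]],
                    ntop: int,
                    npoints: int) -> List[int]:
    """Return the ntop highest Y values over all lines, highest first.

    lines   -- (y0, dy) tuples; dy is assumed to be > 0
    npoints -- assumed number of points per line (at least 1)
    """
    # heapq is a min-heap, so store negated Y values to pop the largest.
    heap = [(-y0, dy, npoints - 1) for y0, dy in lines]
    heapq.heapify(heap)  # O(number of lines)

    out = []
    while heap and len(out) < ntop:
        neg_y, dy, remaining = heapq.heappop(heap)
        out.append(-neg_y)
        if remaining > 0:
            # Next point of the same line: y - dy, i.e. neg_y + dy.
            heapq.heappush(heap, (neg_y + dy, dy, remaining - 1))
    return out


print(lazy_top_points([(10, 3), (9, 1), (4, 1)], ntop=5, npoints=3))
```

Only the points that actually make it into the result (plus one pending point per line) are ever computed, so lines that sit far below the others cost nothing beyond their first value.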

The difficult part is the choose function, but I think that this answer to the question "How can I sort 1 million numbers, and only print the top 10 in Python?" could help you.
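
One possible shape for choose, as a minimal sketch: it assumes lines is any iterable of (y, Δy) tuples (written `dy` in the code) and leans on `heapq.nlargest` from the standard library, returning the pair (top points, smallest) described in the EDIT above.

```python
import heapq
from typing import Iterable, Optional, Set, Tuple

Point = Tuple[int, int]  # (y, dy)


def choose(lines: Iterable[Point], ntop: int) -> Tuple[Set[Point], Optional[int]]:
    """Return the ntop tuples with the largest y and the smallest y among them."""
    # nlargest makes a single pass and keeps only ntop items in memory;
    # its result is sorted from largest to smallest y.
    top = heapq.nlargest(ntop, lines, key=lambda point: point[0])
    smallest = top[-1][0] if top else None
    return set(top), smallest


top_points, smallest = choose({(10, 1), (7, 2), (3, 1), (9, 3)}, 2)
```

Note that storing the result in a set collapses lines that share the same (y, Δy) pair; if duplicates matter, returning the list from `nlargest` directly would avoid that.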

Answered By: gboffi