Sorting points on multiple lines
Question:
Given that we have two lines on a graph (I just noticed that I inverted the numbers on the Y axis; this was a mistake, it should run from 11 down to 1),
and we only care about whole-number X axis intersections,
we need to order these points from highest Y value to lowest Y value, regardless of their position on the X axis. (Note: I drew these pictures by hand, so they may not line up perfectly.)
I have a couple of questions:
1) I have to assume this is a known problem, but does it have a particular name?
2) Is there a known optimal solution when dealing with tens of billions (or hundreds of millions) of lines? Our current process of manually calculating each point and then comparing it to a giant list requires hours of processing. Even though we may have a hundred million lines, we typically only want the top 100 or 50,000 results; some of the lines are so far “below” others that calculating their points is unnecessary.
Answers:
-
It’s not really a complicated thing, just a "normal" sorting problem.
-
Usually sorting requires a large amount of computing time, but your case is one where you don’t need complex sorting techniques.
The values on both graphs are growing or falling constantly; there are no "jumps". You can use this to your advantage. The basic algorithm:
- identify if a graph is growing or falling.
- write a generator that generates the values: from left to right if rising, from right to left if falling.
- get the first value from both graphs
- insert the lower one into the result list
- get a new value from the graph that had the lower value
- repeat the last two steps until one generator is "empty"
- append the leftover items from the other generator.
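The steps above can be sketched in Python roughly as follows. This is only an illustration under assumptions the answer does not state: each line is described by a hypothetical (slope, intercept) pair, the whole-number X values come from a range, and the function names points and merge_two are made up for the example.

```python
def points(slope, intercept, xs):
    """Yield a line's y values in ascending order over the whole-number xs."""
    # A rising line is walked left to right, a falling one right to left,
    # so the generated values never decrease.
    ordered = xs if slope >= 0 else reversed(xs)
    for x in ordered:
        yield slope * x + intercept


def merge_two(gen_a, gen_b):
    """Merge two ascending generators into one ascending list."""
    result = []
    a = next(gen_a, None)
    b = next(gen_b, None)
    while a is not None and b is not None:
        # Insert the lower value, then pull a new value from its generator.
        if a <= b:
            result.append(a)
            a = next(gen_a, None)
        else:
            result.append(b)
            b = next(gen_b, None)
    # One generator is empty: append the leftover items from the other.
    if a is not None:
        result.append(a)
        result.extend(gen_a)
    if b is not None:
        result.append(b)
        result.extend(gen_b)
    return result


xs = range(0, 5)  # hypothetical whole-number X positions
merged = merge_two(points(2, 1, xs), points(-1, 10, xs))
```

The result is in ascending order, as in the steps above; reversing it (or flipping the comparison and the walking direction) gives the highest-to-lowest order asked for in the question. For more than two lines, the standard library's heapq.merge performs the same kind of merge over any number of sorted iterables, which may be worth trying before writing a custom loop.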
-
Your data structure is a set of tuples
lines = {(y0, Δy0), (y1, Δy1), ...}
-
You need only the ntop points, hence build a set containing only the top ntop yi values, with a single pass over the data
top_points = choose(lines, ntop)
EDIT: to choose the ntop points we had to keep track of the smallest one, and this is useful information, so let’s also return this value from choose; we also need to initialize decremented
top_points, smallest = choose(lines, ntop)
decremented = top_points
and start a loop…
while True:
-
Generate a set of decremented values
# decremented = {(y-Δy, Δy) for y, Δy in top_points}  (pre-EDIT version, apparently replaced by the line below)
decremented = {(y-Δy, Δy) for y, Δy in decremented if y>smallest}
if not decremented: break
-
Generate a set of candidates
candidates = top_points.union(decremented)
-
generate a new set of top points
new_top_points, smallest = choose(candidates, ntop)
The following is no longer necessary
-
check if new_top_points == top_points
if new_top_points == top_points: break
top_points = new_top_points
of course we are in a loop…
The difficult part is the choose function, but I think that this answer to the question "How can I sort 1 million numbers, and only print the top 10 in Python?" could help you.
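As a rough illustration, choose could be built on heapq.nlargest, and the loop could then be assembled as below. This is one reading of the answer, not a definitive implementation, and it rests on several assumptions: every Δy is positive (so values only fall as a line is stepped), ntop is at least 1, dy stands in for Δy, the name top_points_of is hypothetical, and choose returns minus infinity as the smallest value while fewer than ntop points have been collected (a detail the answer does not specify). Because the points live in a set, two lines with identical (y, Δy) would collapse into one.

```python
import heapq
import math


def choose(points, ntop):
    """Return the ntop points with the largest y, plus the smallest y kept."""
    top = heapq.nlargest(ntop, points, key=lambda p: p[0])
    # Assumption: with fewer than ntop points nothing can be pruned yet,
    # so report -inf as the threshold.
    smallest = top[-1][0] if len(top) == ntop else -math.inf
    return set(top), smallest


def top_points_of(lines, ntop):
    """Collect the ntop highest points generated by (y, dy) lines, dy > 0."""
    top_points, smallest = choose(lines, ntop)
    decremented = top_points
    while True:
        # Step only the points that are still above the current threshold.
        decremented = {(y - dy, dy) for y, dy in decremented if y > smallest}
        if not decremented:
            break
        # Re-select the top points among the old ones and the new candidates.
        top_points, smallest = choose(top_points | decremented, ntop)
    return sorted(top_points, reverse=True)


lines = {(10, 3), (8, 1), (2, 1)}  # made-up example data
result = top_points_of(lines, 4)
```

With this data the first line produces 10, 7, 4, ..., the second 8, 7, 6, ..., and the third 2, 1, ..., so the four highest points are 10, 8, 7 and 7; the third line is never stepped once the threshold rises above its starting value.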