Split sorted list into two lists

Question:

I’m trying to split a sorted integer list into two lists. The first list would have all ints under n and the second all ints over n. Note that n does not have to be in the original list.

I can easily do this with:

under = []
over  = []
for x in sorted_list:
    if x < n:
        under.append(x)
    else:
        over.append(x)

But it just seems like it should be possible to do this in a more elegant way knowing that the list is sorted. takewhile and dropwhile from itertools sound like the solution but then I would be iterating over the list twice.

Functionally, the best I can do is this:

i = 0
while i < len(sorted_list) and sorted_list[i] < n:
    i += 1

under = sorted_list[:i]
over = sorted_list[i:]

But I’m not even sure if it is actually better than just iterating over the list twice and it is definitely not more elegant.

I guess I’m looking for a way to get the list returned by takewhile and the remaining list, perhaps, in a pair.

Asked By: CarlosHSF


Answers:

I would use the following approach: find the index of n, then slice around it to create under and over:

sorted_list = [1,2,4,5,6,7,8]
n=6

idx = sorted_list.index(n)
under = sorted_list[:idx]
over = sorted_list[idx:]

print(under)
print(over)

Output (same as with your code):

[1, 2, 4, 5]
[6, 7, 8]

Edit: I misunderstood the question. Since n does not have to be in the list, here is an adapted solution that finds the index where n would be inserted:

import numpy as np

sorted_list = [1,2,4,5,6,7,8]
n=3

idx = np.searchsorted(sorted_list, n)
under = sorted_list[:idx]
over = sorted_list[idx:]

print(under)
print(over)

Output:

[1, 2]
[4, 5, 6, 7, 8]
Answered By: JANO

The correct solution here is the bisect module. Use bisect.bisect to find the index to the right of n (or the index where it would be inserted if it’s missing), then slice around that point:

import bisect  # at top of file

split_idx = bisect.bisect(sorted_list, n)
under = sorted_list[:split_idx]
over = sorted_list[split_idx:]

Any solution is going to be O(n) overall, since you do have to copy the elements. But comparisons are typically more expensive than simple pointer copies (and the associated reference count updates), and bisect reduces the comparison work on a sorted list to O(log n). On larger inputs it will therefore typically beat iterating and copying element by element until you find the split point.
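
To make the comparison argument concrete, here is a rough sketch that counts comparisons instead of timing anything. CountingInt is a hypothetical wrapper class invented for this illustration (it is not part of bisect or the question), and the list size and target are arbitrary:

```python
import bisect


class CountingInt:
    """Hypothetical wrapper that counts how many '<' comparisons are made."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        CountingInt.comparisons += 1
        return self.value < other.value


data = [CountingInt(v) for v in range(100_000)]
target = CountingInt(75_000)

# Linear scan: one comparison per element up to and including the split point.
CountingInt.comparisons = 0
i = 0
while i < len(data) and data[i] < target:
    i += 1
linear_comparisons = CountingInt.comparisons

# Binary search: roughly log2(len(data)) comparisons.
CountingInt.comparisons = 0
j = bisect.bisect_left(data, target)
bisect_comparisons = CountingInt.comparisons

print(i == j, linear_comparisons, bisect_comparisons)
```

Both approaches find the same split index. The linear scan needs tens of thousands of comparisons here, while bisect needs fewer than twenty. The slicing cost afterwards is the same either way.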

bisect.bisect (equivalent to bisect.bisect_right) puts any elements equal to n in under. Use bisect.bisect_left (which finds the leftmost index of n) instead if you want n to end up in over, which matches the loop in the question.
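
As a minimal sketch of that difference, the two variants can be wrapped in one function. split_sorted and its n_goes_over parameter are hypothetical names chosen for this example, not anything provided by the bisect module:

```python
import bisect


def split_sorted(sorted_list, n, n_goes_over=True):
    """Hypothetical helper: split sorted_list around n into (under, over)."""
    # bisect_left puts elements equal to n in over; bisect_right puts them in under.
    find = bisect.bisect_left if n_goes_over else bisect.bisect_right
    idx = find(sorted_list, n)
    return sorted_list[:idx], sorted_list[idx:]


values = [1, 2, 4, 5, 6, 6, 7, 8]

under_left, over_left = split_sorted(values, 6)            # 6s land in over
under_right, over_right = split_sorted(values, 6, False)   # 6s land in under
missing_under, missing_over = split_sorted(values, 3)      # 3 is not in the list

print(under_left, over_left)
print(under_right, over_right)
print(missing_under, missing_over)
```

When n is not in the list, both variants return the same insertion index, so the choice only matters for elements equal to n.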

Answered By: ShadowRanger