Finding index of an item closest to the value in a list that's not entirely sorted
Question:
As an example my list is:
[25.75443, 26.7803, 25.79099, 24.17642, 24.3526, 22.79056, 20.84866,
19.49222, 18.38086, 18.0358, 16.57819, 15.71255, 14.79059, 13.64154,
13.09409, 12.18347, 11.33447, 10.32184, 9.544922, 8.813385, 8.181152,
6.983734, 6.048035, 5.505096, 4.65799]
and I’m looking for the index of the value closest to 11.5
. I’ve tried other methods such as binary search and bisect_left
but they don’t work.
I cannot sort this array, because the index of the value will be used on a similar array to fetch the value at that index.
Answers:
How about: you zip the two lists, then sort the result?
Try the following:
min(range(len(a)), key=lambda i: abs(a[i]-11.5))
For example:
>>> a = [25.75443, 26.7803, 25.79099, 24.17642, 24.3526, 22.79056, 20.84866, 19.49222, 18.38086, 18.0358, 16.57819, 15.71255, 14.79059, 13.64154, 13.09409, 12.18347, 11.33447, 10.32184, 9.544922, 8.813385, 8.181152, 6.983734, 6.048035, 5.505096, 4.65799]
>>> min(range(len(a)), key=lambda i: abs(a[i]-11.5))
16
Or to get the index and the value:
>>> min(enumerate(a), key=lambda x: abs(x[1]-11.5))
(16, 11.33447)
If you can’t sort the array, then there is no quick way to find the closest item – you have to iterate over all entries.
There is a workaround but it’s quite a bit of work: Write a sort algorithm which sorts the array and (at the same time) updates a second array which tells you where this entry was before the array was sorted.
That way, you can use binary search to look up index of the closest entry and then use this index to look up the original index using the “index array”.
[EDIT] Using zip()
, this is pretty simple to achieve:
array_to_sort = zip( original_array, range(len(original_array)) )
array_to_sort.sort( key=i:i[0] )
Now you can binary search for the value (using item[0]
). item[1]
will give you the original index.
Going through all the items is only linear. If you would sort the array that would be worse.
I don’t see a problem on keeping an additional deltax
(the min difference so far) and idx
(the index of that element) and just loop once trough the list.
Keep in mind that if space isn’t important you can sort any list without moving the contents by creating a secondary list of the sorted indices.
Also bear in mind that if you are doing this look up just once, then you will just have to traverse every element in the list O(n). (If multiple times then you probably would want to sort for increase efficiency later)
import numpy as np
a = [25.75443, 26.7803, 25.79099, 24.17642, 24.3526, 22.79056, 20.84866, 19.49222, 18.38086, 18.0358, 16.57819, 15.71255, 14.79059, 13.64154, 13.09409, 12.18347, 11.33447, 10.32184, 9.544922, 8.813385, 8.181152, 6.983734, 6.048035, 5.505096, 4.65799]
index = np.argmin(np.abs(np.array(a)-11.5))
a[index] # here is your result
In case a is already an array, the corresponding transformation can be ommitted.
If you are searching a long list a lot of times, then min
scales very bad (O(n^2), if you append some of your searches to the search list, I think).
Bisect is your friend. Here’s my solution. It scales O(n*log(n)):
class Closest:
"""Assumes *no* redundant entries - all inputs must be unique"""
def __init__(self, numlist=None, firstdistance=0):
if numlist == None:
numlist=[]
self.numindexes = dict((val, n) for n, val in enumerate(numlist))
self.nums = sorted(self.numindexes)
self.firstdistance = firstdistance
def append(self, num):
if num in self.numindexes:
raise ValueError("Cannot append '%s' it is already used" % str(num))
self.numindexes[num] = len(self.nums)
bisect.insort(self.nums, num)
def rank(self, target):
rank = bisect.bisect(self.nums, target)
if rank == 0:
pass
elif len(self.nums) == rank:
rank -= 1
else:
dist1 = target - self.nums[rank - 1]
dist2 = self.nums[rank] - target
if dist1 < dist2:
rank -= 1
return rank
def closest(self, target):
try:
return self.numindexes[self.nums[self.rank(target)]]
except IndexError:
return 0
def distance(self, target):
rank = self.rank(target)
try:
dist = abs(self.nums[rank] - target)
except IndexError:
dist = self.firstdistance
return dist
Use it like this:
a = [25.75443, 26.7803, 25.79099, 24.17642, 24.3526, 22.79056, 20.84866,
19.49222, 18.38086, 18.0358, 16.57819, 15.71255, 14.79059, 13.64154,
13.09409, 12.18347, 1.33447, 10.32184, 9.544922, 8.813385, 8.181152,
6.983734, 6.048035, 5.505096, 4.65799]
targets = [1.0, 100.0, 15.0, 15.6, 8.0]
cl = Closest(a)
for x in targets:
rank = cl.rank(x)
print("Closest to %5.1f : rank=%2i num=%8.5f index=%2i " % (x, rank,
cl.nums[rank], cl.closest(x)))
Will output:
Closest to 1.0 : rank= 0 num= 1.33447 index=16
Closest to 100.0 : rank=25 num=26.78030 index= 1
Closest to 15.0 : rank=12 num=14.79059 index=12
Closest to 15.6 : rank=13 num=15.71255 index=11
Closest to 8.0 : rank= 5 num= 8.18115 index=20
And:
cl.append(99.9)
x = 100.0
rank = cl.rank(x)
print("Closest to %5.1f : rank=%2i num=%8.5f index=%2i " % (x, rank,
cl.nums[rank], cl.closest(x)))
Output:
Closest to 100.0 : rank=25 num=99.90000 index=25
As an example my list is:
[25.75443, 26.7803, 25.79099, 24.17642, 24.3526, 22.79056, 20.84866,
19.49222, 18.38086, 18.0358, 16.57819, 15.71255, 14.79059, 13.64154,
13.09409, 12.18347, 11.33447, 10.32184, 9.544922, 8.813385, 8.181152,
6.983734, 6.048035, 5.505096, 4.65799]
and I’m looking for the index of the value closest to 11.5
. I’ve tried other methods such as binary search and bisect_left
but they don’t work.
I cannot sort this array, because the index of the value will be used on a similar array to fetch the value at that index.
How about: you zip the two lists, then sort the result?
Try the following:
min(range(len(a)), key=lambda i: abs(a[i]-11.5))
For example:
>>> a = [25.75443, 26.7803, 25.79099, 24.17642, 24.3526, 22.79056, 20.84866, 19.49222, 18.38086, 18.0358, 16.57819, 15.71255, 14.79059, 13.64154, 13.09409, 12.18347, 11.33447, 10.32184, 9.544922, 8.813385, 8.181152, 6.983734, 6.048035, 5.505096, 4.65799]
>>> min(range(len(a)), key=lambda i: abs(a[i]-11.5))
16
Or to get the index and the value:
>>> min(enumerate(a), key=lambda x: abs(x[1]-11.5))
(16, 11.33447)
If you can’t sort the array, then there is no quick way to find the closest item – you have to iterate over all entries.
There is a workaround but it’s quite a bit of work: Write a sort algorithm which sorts the array and (at the same time) updates a second array which tells you where this entry was before the array was sorted.
That way, you can use binary search to look up index of the closest entry and then use this index to look up the original index using the “index array”.
[EDIT] Using zip()
, this is pretty simple to achieve:
array_to_sort = zip( original_array, range(len(original_array)) )
array_to_sort.sort( key=i:i[0] )
Now you can binary search for the value (using item[0]
). item[1]
will give you the original index.
Going through all the items is only linear. If you would sort the array that would be worse.
I don’t see a problem on keeping an additional deltax
(the min difference so far) and idx
(the index of that element) and just loop once trough the list.
Keep in mind that if space isn’t important you can sort any list without moving the contents by creating a secondary list of the sorted indices.
Also bear in mind that if you are doing this look up just once, then you will just have to traverse every element in the list O(n). (If multiple times then you probably would want to sort for increase efficiency later)
import numpy as np
a = [25.75443, 26.7803, 25.79099, 24.17642, 24.3526, 22.79056, 20.84866, 19.49222, 18.38086, 18.0358, 16.57819, 15.71255, 14.79059, 13.64154, 13.09409, 12.18347, 11.33447, 10.32184, 9.544922, 8.813385, 8.181152, 6.983734, 6.048035, 5.505096, 4.65799]
index = np.argmin(np.abs(np.array(a)-11.5))
a[index] # here is your result
In case a is already an array, the corresponding transformation can be ommitted.
If you are searching a long list a lot of times, then min
scales very bad (O(n^2), if you append some of your searches to the search list, I think).
Bisect is your friend. Here’s my solution. It scales O(n*log(n)):
class Closest:
"""Assumes *no* redundant entries - all inputs must be unique"""
def __init__(self, numlist=None, firstdistance=0):
if numlist == None:
numlist=[]
self.numindexes = dict((val, n) for n, val in enumerate(numlist))
self.nums = sorted(self.numindexes)
self.firstdistance = firstdistance
def append(self, num):
if num in self.numindexes:
raise ValueError("Cannot append '%s' it is already used" % str(num))
self.numindexes[num] = len(self.nums)
bisect.insort(self.nums, num)
def rank(self, target):
rank = bisect.bisect(self.nums, target)
if rank == 0:
pass
elif len(self.nums) == rank:
rank -= 1
else:
dist1 = target - self.nums[rank - 1]
dist2 = self.nums[rank] - target
if dist1 < dist2:
rank -= 1
return rank
def closest(self, target):
try:
return self.numindexes[self.nums[self.rank(target)]]
except IndexError:
return 0
def distance(self, target):
rank = self.rank(target)
try:
dist = abs(self.nums[rank] - target)
except IndexError:
dist = self.firstdistance
return dist
Use it like this:
a = [25.75443, 26.7803, 25.79099, 24.17642, 24.3526, 22.79056, 20.84866,
19.49222, 18.38086, 18.0358, 16.57819, 15.71255, 14.79059, 13.64154,
13.09409, 12.18347, 1.33447, 10.32184, 9.544922, 8.813385, 8.181152,
6.983734, 6.048035, 5.505096, 4.65799]
targets = [1.0, 100.0, 15.0, 15.6, 8.0]
cl = Closest(a)
for x in targets:
rank = cl.rank(x)
print("Closest to %5.1f : rank=%2i num=%8.5f index=%2i " % (x, rank,
cl.nums[rank], cl.closest(x)))
Will output:
Closest to 1.0 : rank= 0 num= 1.33447 index=16
Closest to 100.0 : rank=25 num=26.78030 index= 1
Closest to 15.0 : rank=12 num=14.79059 index=12
Closest to 15.6 : rank=13 num=15.71255 index=11
Closest to 8.0 : rank= 5 num= 8.18115 index=20
And:
cl.append(99.9)
x = 100.0
rank = cl.rank(x)
print("Closest to %5.1f : rank=%2i num=%8.5f index=%2i " % (x, rank,
cl.nums[rank], cl.closest(x)))
Output:
Closest to 100.0 : rank=25 num=99.90000 index=25