Finding k closest numbers to a given number
Question:
Say I have a list [1,2,3,4,5,6,7]
. I want to find the 3 closest numbers to, say, 6.5. Then the returned value would be [5,6,7]
.
Finding one closest number is not that tricky in python, which can be done using
min(myList, key=lambda x:abs(x-myNumber))
But I am trying not to put a loop around this to find k closest numbers. Is there a pythonic way to achieve the above task?
Answers:
You could compute distances, and sort:
[n for d, n in sorted((abs(x-myNumber), x) for x in myList)[:k]]
This does the following:
- Create a sequence of tuples
(d, x)
where d
is the distance to your target
- Select the first
k
elements of that list
- Extract just the number values from the result, discarding the distance
The short answer
The heapq.nsmallest() function will do this neatly and efficiently:
>>> from heapq import nsmallest
>>> s = [1,2,3,4,5,6,7]
>>> nsmallest(3, s, key=lambda x: abs(x - 6.5))
[6, 7, 5]
Essentially this says, "Give me the three input values that have the smallest absolute difference from the number 6.5".
Optimizing for repeated lookups
In the comments, @Phylliida, asked how to optimize for repeated lookups with differing start points. One approach would be to pre-sort the data and then use bisect to locate the center of a small search segment:
from bisect import bisect
def k_nearest(k, center, sorted_data):
'Return *k* members of *sorted_data* nearest to *center*'
i = bisect(sorted_data, center)
segment = sorted_data[max(i-k, 0) : i+k]
return nsmallest(k, segment, key=lambda x: abs(x - center))
For example:
>>> s.sort()
>>> k_nearest(3, 6.5, s)
[6, 7, 5]
>>> k_nearest(3, 0.5, s)
[1, 2, 3]
>>> k_nearest(3, 4.5, s)
[4, 5, 3]
>>> k_nearest(3, 5.0, s)
[5, 4, 6]
Both answers were good, and Greg was right, Raymond’s answer is more high level and easier to implement, but I built upon Greg’s answer because it was easier to manipulate to fit my need.
In case anyone is searching for a way to find the n closest values from a list of dicts.
My dict looks like this, where npi is just an identifier that I need along with the value:
mydict = {u'fnpi': u'1982650024',
u'snpi': {u'npi': u'1932190360', u'value': 2672},
u'snpis': [{u'npi': u'1831289255', u'value': 20},
{u'npi': u'1831139799', u'value': 20},
{u'npi': u'1386686137', u'value': 37},
{u'npi': u'1457355257', u'value': 45},
{u'npi': u'1427043645', u'value': 53},
{u'npi': u'1477548675', u'value': 53},
{u'npi': u'1851351514', u'value': 57},
{u'npi': u'1366446171', u'value': 60},
{u'npi': u'1568460640', u'value': 75},
{u'npi': u'1326046673', u'value': 109},
{u'npi': u'1548281124', u'value': 196},
{u'npi': u'1912989989', u'value': 232},
{u'npi': u'1336147685', u'value': 284},
{u'npi': u'1801894142', u'value': 497},
{u'npi': u'1538182779', u'value': 995},
{u'npi': u'1932190360', u'value': 2672},
{u'npi': u'1114020336', u'value': 3264}]}
value = mydict['snpi']['value'] #value i'm working with below
npi = mydict['snpi']['npi'] #npi (identifier) i'm working with below
snpis = mydict['snpis'] #dict i'm working with below
To get an [id, value]
list (not just a list of values) , I use this:
[[id,val] for diff, val, id in sorted((abs(x['value']-value), x['value'], x['npi']) for x in snpis)[:6]]
Which produces this:
[[u'1932190360', 2672],
[u'1114020336', 3264],
[u'1538182779', 995],
[u'1801894142', 497],
[u'1336147685', 284],
[u'1912989989', 232]]
EDIT
I actually found it pretty easy to manipulate Raymond’s answer too, if you’re dealing with a dict (or list of lists).
from heapq import nsmallest
[[i['npi'], i['value']] for i in nsmallest(6, snpis, key=lambda x: abs(x['value']-value))]
This will produce the same as the above output.
And this
nsmallest(6, snpis, key=lambda x: abs(x['value']-value))
will produce a dict instead.
For those who wants the index instead:
def find_nearest_index(array, value, k):
array = np.array(array)
return np.argsort(abs(array - value))[:k]
Example:
find_nearest_index([-3,0,1,2,4,5], 0.2, 4)
# array([1, 2, 3, 0], dtype=int64)
# distance = [3.20 0.20 0.80 1.80 3.80 4.80]
Say I have a list [1,2,3,4,5,6,7]
. I want to find the 3 closest numbers to, say, 6.5. Then the returned value would be [5,6,7]
.
Finding one closest number is not that tricky in python, which can be done using
min(myList, key=lambda x:abs(x-myNumber))
But I am trying not to put a loop around this to find k closest numbers. Is there a pythonic way to achieve the above task?
You could compute distances, and sort:
[n for d, n in sorted((abs(x-myNumber), x) for x in myList)[:k]]
This does the following:
- Create a sequence of tuples
(d, x)
whered
is the distance to your target - Select the first
k
elements of that list - Extract just the number values from the result, discarding the distance
The short answer
The heapq.nsmallest() function will do this neatly and efficiently:
>>> from heapq import nsmallest
>>> s = [1,2,3,4,5,6,7]
>>> nsmallest(3, s, key=lambda x: abs(x - 6.5))
[6, 7, 5]
Essentially this says, "Give me the three input values that have the smallest absolute difference from the number 6.5".
Optimizing for repeated lookups
In the comments, @Phylliida, asked how to optimize for repeated lookups with differing start points. One approach would be to pre-sort the data and then use bisect to locate the center of a small search segment:
from bisect import bisect
def k_nearest(k, center, sorted_data):
'Return *k* members of *sorted_data* nearest to *center*'
i = bisect(sorted_data, center)
segment = sorted_data[max(i-k, 0) : i+k]
return nsmallest(k, segment, key=lambda x: abs(x - center))
For example:
>>> s.sort()
>>> k_nearest(3, 6.5, s)
[6, 7, 5]
>>> k_nearest(3, 0.5, s)
[1, 2, 3]
>>> k_nearest(3, 4.5, s)
[4, 5, 3]
>>> k_nearest(3, 5.0, s)
[5, 4, 6]
Both answers were good, and Greg was right, Raymond’s answer is more high level and easier to implement, but I built upon Greg’s answer because it was easier to manipulate to fit my need.
In case anyone is searching for a way to find the n closest values from a list of dicts.
My dict looks like this, where npi is just an identifier that I need along with the value:
mydict = {u'fnpi': u'1982650024',
u'snpi': {u'npi': u'1932190360', u'value': 2672},
u'snpis': [{u'npi': u'1831289255', u'value': 20},
{u'npi': u'1831139799', u'value': 20},
{u'npi': u'1386686137', u'value': 37},
{u'npi': u'1457355257', u'value': 45},
{u'npi': u'1427043645', u'value': 53},
{u'npi': u'1477548675', u'value': 53},
{u'npi': u'1851351514', u'value': 57},
{u'npi': u'1366446171', u'value': 60},
{u'npi': u'1568460640', u'value': 75},
{u'npi': u'1326046673', u'value': 109},
{u'npi': u'1548281124', u'value': 196},
{u'npi': u'1912989989', u'value': 232},
{u'npi': u'1336147685', u'value': 284},
{u'npi': u'1801894142', u'value': 497},
{u'npi': u'1538182779', u'value': 995},
{u'npi': u'1932190360', u'value': 2672},
{u'npi': u'1114020336', u'value': 3264}]}
value = mydict['snpi']['value'] #value i'm working with below
npi = mydict['snpi']['npi'] #npi (identifier) i'm working with below
snpis = mydict['snpis'] #dict i'm working with below
To get an [id, value]
list (not just a list of values) , I use this:
[[id,val] for diff, val, id in sorted((abs(x['value']-value), x['value'], x['npi']) for x in snpis)[:6]]
Which produces this:
[[u'1932190360', 2672],
[u'1114020336', 3264],
[u'1538182779', 995],
[u'1801894142', 497],
[u'1336147685', 284],
[u'1912989989', 232]]
EDIT
I actually found it pretty easy to manipulate Raymond’s answer too, if you’re dealing with a dict (or list of lists).
from heapq import nsmallest
[[i['npi'], i['value']] for i in nsmallest(6, snpis, key=lambda x: abs(x['value']-value))]
This will produce the same as the above output.
And this
nsmallest(6, snpis, key=lambda x: abs(x['value']-value))
will produce a dict instead.
For those who wants the index instead:
def find_nearest_index(array, value, k):
array = np.array(array)
return np.argsort(abs(array - value))[:k]
Example:
find_nearest_index([-3,0,1,2,4,5], 0.2, 4)
# array([1, 2, 3, 0], dtype=int64)
# distance = [3.20 0.20 0.80 1.80 3.80 4.80]