Algorithm to find least sum of squares of differences
Question:
The algorithm I'm writing takes a list L as input and finds a number x that minimizes the sum of squared differences between x and the items of L, that is, the x that minimizes the sum of abs(L[i]-x)**2. So far my algorithm does what it's supposed to, except when the answer is not an integer, and I'm not sure how to handle floating-point values. For example, [2, 2, 3, 4] should ideally yield 2.75, but my algorithm can't currently return floats.
def minimize_square(L):
    sumsqdiff = 0
    sumsqdiffs = {}
    for j in range(min(L), max(L)):
        for i in range(len(L)-1):
            sumsqdiff += abs(L[i]-j)**2
        sumsqdiffs[j]=sumsqdiff
        sumsqdiff = 0
    return min(sumsqdiffs, key=sumsqdiffs.get)
Answers:
It is easy to prove [*] that the number that minimizes the sum of squared differences is the arithmetic mean of L. This gives the following simple solution:
In [26]: L = [2, 2, 3, 4]
In [27]: sum(L) / float(len(L))
Out[27]: 2.75
or, using NumPy:
In [28]: numpy.mean(L)
Out[28]: 2.75
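As a rough sketch of how this could replace the function in the question, the snippet below returns the mean directly and compares the resulting sum of squares with the two nearest integer candidates. It assumes Python 3.8+ (for statistics.fmean); the helper name sum_sq is made up for illustration and is not part of the original post.

```python
from statistics import fmean


def minimize_square(L):
    """Return the x that minimizes sum((v - x)**2 for v in L), i.e. the mean."""
    if not L:
        raise ValueError("L must not be empty")
    return fmean(L)


def sum_sq(L, x):
    """Hypothetical helper: sum of squared differences between x and L."""
    return sum((v - x) ** 2 for v in L)


L = [2, 2, 3, 4]
best = minimize_square(L)
print(best)                                          # 2.75
print(sum_sq(L, best), sum_sq(L, 2), sum_sq(L, 3))   # 2.75 5 3
```

No search over candidate values is needed, so this runs in a single pass over L.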
[*] Here is an outline of the proof:
We need to find x that minimizes f(x) = sum((x - L[i])**2), where the sum is taken over i=0..n-1.
Take the derivative of f(x) and set it to zero:
2*sum(x - L[i]) = 0
Since sum(x - L[i]) = n*x - sum(L[i]), simple algebra transforms the above into
x = sum(L[i]) / n
which is none other than the arithmetic mean of L. The second derivative, 2*n, is positive, so this stationary point is a minimum. QED.
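The proof can also be checked numerically. The sketch below is only an illustration: the names f and f_prime are assumptions chosen to mirror the notation above, and fractions.Fraction is used so the arithmetic is exact rather than subject to floating-point rounding.

```python
from fractions import Fraction


def f(L, x):
    """Sum of squared differences between x and the items of L."""
    return sum((x - v) ** 2 for v in L)


def f_prime(L, x):
    """Derivative of f with respect to x: 2*sum(x - L[i])."""
    return 2 * sum(x - v for v in L)


L = [2, 2, 3, 4]
mean = Fraction(sum(L), len(L))   # exactly 11/4
eps = Fraction(1, 100)

print(f_prime(L, mean))                   # 0
print(f(L, mean) < f(L, mean - eps))      # True
print(f(L, mean) < f(L, mean + eps))      # True
```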
I am not 100% sure this is the most efficient way to do it, but you could keep the algorithm you already have and modify only the return statement:
min_int = min(sumsqdiffs, key=sumsqdiffs.get)
return bisection(L,min_int-1,min_int+1)
where bisection implements the following method: Bisection Method
This works iff there is a single minimum for the function in the analyzed interval.
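The answer does not show bisection itself, so the following is only one possible sketch of it. Since bisection is a root-finding method, this version assumes it is applied to the derivative 2*sum(x - L[i]), whose zero is the minimizer. The signature bisection(L, lo, hi) matches the call above, while the tol parameter, the iteration cap, and the helper sum_sq_derivative are assumptions added for illustration.

```python
def sum_sq_derivative(L, x):
    """Derivative of sum((x - v)**2 for v in L) with respect to x."""
    return 2 * sum(x - v for v in L)


def bisection(L, lo, hi, tol=1e-9, max_iter=200):
    """Locate the zero of the derivative inside [lo, hi] by repeated halving."""
    g_lo = sum_sq_derivative(L, lo)
    g_hi = sum_sq_derivative(L, hi)
    if g_lo == 0:
        return float(lo)
    if g_hi == 0:
        return float(hi)
    # The derivative is increasing, so a bracketed minimum needs g_lo < 0 < g_hi.
    if g_lo > 0 or g_hi < 0:
        raise ValueError("the minimum is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = (lo + hi) / 2
        if sum_sq_derivative(L, mid) < 0:
            lo = mid   # derivative still negative: minimum lies to the right
        else:
            hi = mid   # derivative non-negative: minimum lies to the left
    return (lo + hi) / 2


print(round(bisection([2, 2, 3, 4], 1, 3), 6))   # 2.75
```

The bracket [min_int-1, min_int+1] only contains the minimum when the integer scan has picked an integer within 1 of the mean. As written in the question, the inner loop skips the last element and the outer range excludes max(L), so that is not guaranteed in general; the sketch raises ValueError in that case instead of returning a wrong value.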