fminunc alternative in numpy

Question:

Is there an alternative to the fminunc function (from Octave/MATLAB) in Python? I have a cost function for a binary classifier, and I want to run gradient descent to find the theta that minimizes it. The Octave/MATLAB implementation looks like this:

%  Set options for fminunc
options = optimset('GradObj', 'on', 'MaxIter', 400);

%  Run fminunc to obtain the optimal theta
%  This function will return theta and the cost 
[theta, cost] = ...
    fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);

I have converted my costFunction to Python using the numpy library, and I am looking for an equivalent of fminunc, or any other gradient descent implementation, in numpy.

Asked By: Anurag Sharma


Answers:

Looks like you have to change to scipy.

There you find all basic optimization algorithms readily implemented.

http://docs.scipy.org/doc/scipy/reference/optimize.html
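
As a rough sketch of how the Octave call in the question might map onto scipy.optimize.minimize: jac=True plays the role of 'GradObj', 'on' (the objective returns both the cost and the gradient), and options={'maxiter': 400} corresponds to 'MaxIter', 400. The function name cost_and_grad, the toy data, and the choice of method='BFGS' are assumptions made for illustration; they are not taken from the question.

```python
import numpy as np
from scipy.optimize import minimize

def cost_and_grad(theta, X, y):
    """Logistic-regression cost and gradient, returned together as (cost, grad)."""
    m = y.size
    h = 1.0 / (1.0 + np.exp(-X @ theta))          # sigmoid of the linear score
    cost = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    grad = X.T @ (h - y) / m                       # 1-D array, same shape as theta
    return cost, grad

# Hypothetical toy data (not linearly separable); the first column is the intercept term.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.0],
              [1.0, 3.0], [1.0, 3.5], [1.0, 4.5]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

initial_theta = np.zeros(X.shape[1])
res = minimize(cost_and_grad, initial_theta, args=(X, y),
               method='BFGS', jac=True, options={'maxiter': 400})

theta, cost = res.x, res.fun   # roughly the [theta, cost] pair that fminunc returns
print(theta, cost)
```

Here res.x and res.fun stand in for the theta and cost outputs of fminunc; res.success and res.message report whether the optimizer considers itself converged.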

Answered By: Jan

There is more information about the functions of interest here: http://docs.scipy.org/doc/scipy-0.10.0/reference/tutorial/optimize.html

Also, it looks like you are doing the Coursera Machine Learning course, but in Python. You might check out http://aimotion.blogspot.com/2011/11/machine-learning-with-python-logistic.html; this guy’s doing the same thing.

Answered By: Cartesian Theater

I was also trying to implement logistic regression as discussed in the Coursera ML course, but in Python, and I found scipy helpful. After trying the different algorithms available through the minimize function, I found the Newton conjugate gradient approach the most helpful (the code below uses the truncated Newton method, 'TNC'). After examining its return value, it seems to be equivalent to what fminunc returns in Octave. I have included my Python implementation below, which finds the optimal theta.

import numpy as np
import scipy.optimize as op

def Sigmoid(z):
    return 1 / (1 + np.exp(-z))

def Gradient(theta, x, y):
    m, n = x.shape
    theta = theta.reshape((n, 1))
    y = y.reshape((m, 1))
    sigmoid_x_theta = Sigmoid(x.dot(theta))
    grad = x.T.dot(sigmoid_x_theta - y) / m
    return grad.flatten()

def CostFunc(theta, x, y):
    m, n = x.shape
    theta = theta.reshape((n, 1))
    y = y.reshape((m, 1))
    term1 = np.log(Sigmoid(x.dot(theta)))
    term2 = np.log(1 - Sigmoid(x.dot(theta)))
    term1 = term1.reshape((m, 1))
    term2 = term2.reshape((m, 1))
    term = y * term1 + (1 - y) * term2
    J = -(np.sum(term) / m)
    return J

# initialize X and y
X = np.array([[1, 2, 3], [1, 3, 4]])
y = np.array([[1], [0]])

m, n = X.shape
initial_theta = np.zeros(n)
Result = op.minimize(fun=CostFunc,
                     x0=initial_theta,
                     args=(X, y),
                     method='TNC',
                     jac=Gradient)
optimal_theta = Result.x
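
Before handing a hand-written gradient to minimize through jac, it can be worth comparing it against a finite-difference estimate. The following is a minimal sketch using scipy.optimize.check_grad; the lower-case function names, the *_demo arrays, and the test point are assumptions made for illustration and are not part of the answer's code.

```python
import numpy as np
from scipy.optimize import check_grad

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # Mean logistic loss; y is a 1-D array of 0/1 labels.
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    # Analytic gradient of the cost above, returned as a 1-D array.
    return X.T @ (sigmoid(X @ theta) - y) / y.size

# Hypothetical data and test point, chosen only to exercise the check.
X_demo = np.array([[1.0, 2.0, 3.0], [1.0, 3.0, 4.0], [1.0, 1.0, 0.5]])
y_demo = np.array([1.0, 0.0, 1.0])
theta_demo = np.array([0.1, -0.2, 0.3])

# check_grad returns the 2-norm of (analytic gradient - finite-difference gradient).
err = check_grad(cost, gradient, theta_demo, X_demo, y_demo)
print(err)
```

A value of err close to zero suggests the cost and gradient agree; a large value usually points to a sign, scaling, or shape mistake in one of them.
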
Answered By: chammu

I implemented it as below and got results similar to Octave's:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# (the line above is an IPython/Jupyter magic; remove it when running as a plain script)
filepath = 'C:/Pythontry/MachineLearning/dataset/couresra/ex2data1.txt'
data = pd.read_csv(filepath, sep=',', header=None)
#print(data)
X = data.values[:, :2]   # (100, 2)
y = data.values[:, 2:3]  # (100, 1)
#print(np.shape(y))

# In 2
#%% ==================== Part 1: Plotting ====================
positive_value = data.loc[data[2] == 1]
#print(positive_value.values[:, 2:3])
negative_value = data.loc[data[2] == 0]
#print(len(positive_value))
#print(len(negative_value))
# s is the marker size; see https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.scatter.html#matplotlib.axes.Axes.scatter
ax1 = positive_value.plot(kind='scatter', x=0, y=1, s=50, color='b', marker="+", label="Admitted")
ax2 = negative_value.plot(kind='scatter', x=0, y=1, s=50, color='y', ax=ax1, label="Not Admitted")
ax1.set_xlabel("Exam 1 score")
ax2.set_ylabel("Exam 2 score")
plt.show()
#print(ax1 == ax2)
#print(np.shape(X))

# In 3
#============ Part 2: Compute Cost and Gradient ===========
m, n = np.shape(X)  # (100, 2)
print(m, n)
additional_column = np.ones((m, 1))
X = np.append(additional_column, X, axis=1)  # prepend the intercept column of ones
initial_theta = np.zeros(n + 1)
print(initial_theta)

# In 4
# Sigmoid and cost function
def sigmoid(z):
    g = 1 / (1 + np.exp(-z))
    return g

def costFunction(theta, X, y):
    # Note: m is the global sample count set in the previous cell.
    #print(theta)
    receive_theta = np.array(theta)[np.newaxis]  # np.newaxis turns theta into a 2-D row array of shape (1, n)
    #print(receive_theta)
    theta = np.transpose(receive_theta)  # column vector of shape (n, 1)
    #print(np.shape(theta))
    #grad = np.zeros(np.shape(theta))
    z = np.dot(X, theta)  # z = X . theta
    #print(z)
    h = sigmoid(z)  # h(x) = g(z), where g(z) = 1 / (1 + e^(-z)); shape (100, 1)
    #print(np.shape(h))
    #J = np.sum(((-y)*np.log(h)-(1-y)*np.log(1-h))/m)
    J = np.sum(np.dot((-y.T), np.log(h)) - np.dot((1 - y).T, np.log(1 - h))) / m
    #J = (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()
    #error = h-y
    #print(np.shape(error))
    #print(np.shape(X))
    grad = (np.dot(X.T, (h - y)) / m).flatten()  # 1-D gradient, the shape SciPy optimizers expect
    #print(grad)
    return J, grad
# In 5
cost, grad = costFunction(initial_theta, X, y)
print('Cost at initial theta (zeros):', cost)
print('Expected cost (approx): 0.693\n')
print('Gradient at initial theta (zeros): \n', grad)
print('Expected gradients (approx):\n -0.1000\n -12.0092\n -11.2628\n')

# In 6
# Compute and display cost and gradient with non-zero theta
test_theta = [-24, 0.2, 0.2]
#test_theta_value = np.array([-24, 0.2, 0.2])[np.newaxis]  # creates a 2-D row array of shape (1, 3)

#test_theta = np.transpose(test_theta_value)  # transpose
#test_theta = test_theta_value.transpose()
cost, grad = costFunction(test_theta, X, y)

print('\nCost at test theta: \n', cost)
print('Expected cost (approx): 0.218\n')
print('Gradient at test theta: \n', grad)
print('Expected gradients (approx):\n 0.043\n 2.566\n 2.647\n')

# In 7
# ============= Part 3: Optimizing using scipy.optimize =============
import scipy.optimize as opt
#initial_theta_initialize = np.array([0, 0, 0])[np.newaxis]
#initial_theta = np.transpose(initial_theta_initialize)
print('Executing minimize function...\n')
# Working models
#result = opt.minimize(costFunction, initial_theta, args=(X, y), method='TNC', jac=True, options={'maxiter': 400})
result = opt.fmin_tnc(func=costFunction, x0=initial_theta, args=(X, y))
# Not working model: costFunction returns (J, grad), but without jac=True fun must return only the cost
#costFunction(initial_theta, X, y)
#model = opt.minimize(fun=costFunction, x0=initial_theta, args=(X, y), method='TNC', jac=costFunction)
print('Thetas found by fmin_tnc function: ', result)
# Note: cost still holds the value computed at test_theta above, which is why 0.218 is printed;
# costFunction(result[0], X, y)[0] would give the cost at the fitted theta.
print('Cost at theta found : \n', cost)
print('Expected cost (approx): 0.203\n')
print('theta: \n', result[0])
print('Expected theta (approx):\n')
print(' -25.161\n 0.206\n 0.201\n')

output:
Executing minimize function…

Thetas found by fmin_tnc function: (array([-25.16131854, 0.20623159, 0.20147149]), 36, 0)
Cost at theta found :
0.218330193827
Expected cost (approx): 0.203

theta:
[-25.16131854 0.20623159 0.20147149]
Expected theta (approx):

-25.161
0.206
0.201

Answered By: thangaraj1980

Thanks! This code helped me understand how scipy.optimize works. I believe that in the "Not working model" you should separate the cost and gradient functions, as in the example "SciPy minimize with gradient",
and in accordance with the description of the jac parameter in the documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html#scipy.optimize.minimize
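
To make the point about jac concrete, here is a small sketch of the two calling conventions that minimize accepts when a single function returns both the cost and the gradient. The name cost_function, the toy data, and the lambda wrappers are assumptions made for illustration; they are not the code from the answer above.

```python
import numpy as np
from scipy.optimize import minimize

def cost_function(theta, X, y):
    """Returns (cost, gradient) together, with the gradient as a 1-D array."""
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    cost = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    grad = X.T @ (h - y) / y.size
    return cost, grad

# Hypothetical toy data (not linearly separable); the first column is the intercept term.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.0],
              [1.0, 3.0], [1.0, 3.5], [1.0, 4.5]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
theta0 = np.zeros(X.shape[1])

# Option A: jac=True tells minimize that fun returns (cost, gradient).
res_a = minimize(cost_function, theta0, args=(X, y), method='TNC', jac=True)

# Option B: separate callables, so fun returns only the cost and jac only the gradient.
res_b = minimize(lambda t, X, y: cost_function(t, X, y)[0], theta0, args=(X, y),
                 method='TNC', jac=lambda t, X, y: cost_function(t, X, y)[1])

print(res_a.x, res_b.x)
```

Passing the combined function as both fun and jac does not fit either convention, because a callable jac is expected to return only the gradient and fun only the scalar cost.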