Neural network built from scratch in Python to classify MNIST digits is stuck at 11.35 percent accuracy

Question:

My neural network is stuck at 11.35 percent accuracy and I am unable to trace the error.

I am following this code, https://github.com/MLForNerds/DL_Projects/blob/main/mnist_ann.ipynb, which I found in a YouTube video.

Here is my code for the neural network (Xavier weight initialization is defined in a module I wrote called nn):

"""1. 784 neurons in input layer
   2. 128 neurons in hidden layer 1 
   3. 64 neurons in hidden layer 2
   4. 10 neurons in output layer"""
def softmax(input):
    y = np.exp(input - input.max())
    activated = y / np.sum(y, axis=0)
    return activated

def softmax_grad(x):
    exps = np.exp(x - x.max())
    return exps / np.sum(exps, axis=0) * (1 - exps / np.sum(exps, axis=0))

def sigmoid(input):
    activated = 1/(1 + np.exp(-input))
    return activated

def sigmoid_grad(input):
    grad = input*(1-input)
    return grad


class DenseNN:
    def __init__(self,d0,d1,d2,d3):
        self.params = {'w1': nn.Xavier.initialize(d0, d1),
                       'w2': nn.Xavier.initialize(d1, d2),
                       'w3': nn.Xavier.initialize(d2, d3)}
        
    def forward(self,a0):
        params = self.params
        params['a0'] = a0
        params['z1'] = np.dot(params['w1'],params['a0'])
        params['a1'] = sigmoid(params['z1'])
        
        params['z2'] = np.dot(params['w2'],params['a1'])
        params['a2'] = sigmoid(params['z2'])
        
        params['z3'] = np.dot(params['w3'],params['a2'])
        params['a3'] = softmax(params['z3'])
        
        return params['a3']
        
    def backprop(self,y_true,y_pred):
        params = self.params
        w_change = {}
        error = softmax_grad(params['z3'])*((y_pred - y_true)/y_true.shape[0])
        w_change['w3'] = np.outer(error,params['a2'])
        
        error = np.dot(params['w3'].T,error)*sigmoid_grad(params['a2'])
        w_change['w2'] = np.outer(error,params['a1'])
        
        error = np.dot(params['w2'].T,error)*sigmoid_grad(params['a1'])
        w_change['w1'] = np.outer(error,params['a0'])
        return w_change
        
    def update_weights(self,learning_rate,w_change):
        self.params['w1'] -= learning_rate*w_change['w1']
        self.params['w2'] -= learning_rate*w_change['w2']
        self.params['w3'] -= learning_rate*w_change['w3']

    def train(self,epochs,lr):
        for epoch in range(epochs):
            for i in range(60000):
                a0 = np.array([x_train[i]]).T
                o = np.array([y_train[i]]).T
                y_pred = self.forward(a0)
                w_change = self.backprop(o,y_pred)
                self.update_weights(lr,w_change)
                # print(self.compute_accuracy()*100)
                # print(calc_mse(a3, o))
            print((self.compute_accuracy())*100)
    
    def compute_accuracy(self):
        '''
        Does a forward pass for each test example, then checks whether the
        index of the maximum value in the output equals the index of the 1 in
        the one-hot label y. The accuracy is the mean over all predictions.
        '''
        predictions = []

        for i in range(10000):
            idx = i
            a0 = x_test[idx]
            a0 = np.array([a0]).T
            #print("acc a1",np.shape(a1))
            o = y_test[idx]
            o = np.array([o]).T
            #print("acc o",np.shape(o))
            output = self.forward(a0)
            pred = np.argmax(output)
            predictions.append(pred == np.argmax(o))
        return np.mean(predictions)
      
      

Here is the code for loading the data:

import numpy as np
import pandas as pd

# load the dataset CSVs
train_data = pd.read_csv('../Datasets/MNIST/mnist_train.csv')
test_data = pd.read_csv('../Datasets/MNIST/mnist_test.csv')

#train data
x_train = train_data.drop('label',axis=1).to_numpy()
y_train = pd.get_dummies(train_data['label']).values

#test data
x_test = test_data.drop('label',axis=1).to_numpy()
y_test = pd.get_dummies(test_data['label']).values

# scale pixel values from [0, 255] into the range [0.01, 1.0]
fac = 0.99 / 255
x_train = np.asfarray(x_train) * fac + 0.01
x_test = np.asfarray(x_test) * fac + 0.01

# train_labels = np.asfarray(train_data[:, :1])
# test_labels = np.asfarray(test_data[:, :1])

#printing dimensions
print(np.shape(x_train)) #(60000,784)
print(np.shape(y_train)) #(60000,10)


print(np.shape(x_test)) #(10000,784)
print(np.shape(y_test)) #(10000,10)
print((x_train))

Kindly help

I am a newbie in machine learning, so any help would be appreciated. I am unable to figure out where I am going wrong. Most of the code is almost identical to https://github.com/MLForNerds/DL_Projects/blob/main/mnist_ann.ipynb, yet that version manages to get 60 percent accuracy.

EDIT

I found the mistake, thanks to Bartosz Mikulski.

The problem was with how the weights were initialized in my Xavier weight initialization algorithm. I changed the weight initialization code to this:

self.params = {
    'w1': np.random.randn(d1, d0) * np.sqrt(1. / d1),
    'w2': np.random.randn(d2, d1) * np.sqrt(1. / d2),
    'w3': np.random.randn(d3, d2) * np.sqrt(1. / d3),
    'b1': np.random.randn(d1, 1) * np.sqrt(1. / d1),
    'b2': np.random.randn(d2, 1) * np.sqrt(1. / d2),
    'b3': np.random.randn(d3, 1) * np.sqrt(1. / d3),
}

With the new weight initialization I got this output:
(screenshot: accuracy after changing the weight initialization)

After also adding the bias parameters I got this output:
(screenshot: accuracy after changing the weight initialization and adding bias)
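
For reference, the forward and backward passes also have to use the new bias parameters, and update_weights has to update them. Roughly, that looks like the sketch below (along the lines of the code above, not necessarily my exact final version). Note that the output-layer error here uses the usual softmax-plus-cross-entropy simplification (y_pred - y_true) instead of the softmax_grad call in the original code, and the bias gradient at each layer is simply that layer's error vector:

    def forward(self, a0):
        params = self.params
        params['a0'] = a0
        # each layer now adds its bias vector before the activation
        params['z1'] = np.dot(params['w1'], params['a0']) + params['b1']
        params['a1'] = sigmoid(params['z1'])

        params['z2'] = np.dot(params['w2'], params['a1']) + params['b2']
        params['a2'] = sigmoid(params['z2'])

        params['z3'] = np.dot(params['w3'], params['a2']) + params['b3']
        params['a3'] = softmax(params['z3'])
        return params['a3']

    def backprop(self, y_true, y_pred):
        params = self.params
        w_change, b_change = {}, {}

        # output-layer error for softmax + cross-entropy
        error = (y_pred - y_true) / y_true.shape[0]
        w_change['w3'] = np.outer(error, params['a2'])
        b_change['b3'] = error

        error = np.dot(params['w3'].T, error) * sigmoid_grad(params['a2'])
        w_change['w2'] = np.outer(error, params['a1'])
        b_change['b2'] = error

        error = np.dot(params['w2'].T, error) * sigmoid_grad(params['a1'])
        w_change['w1'] = np.outer(error, params['a0'])
        b_change['b1'] = error
        return w_change, b_change

    def update_weights(self, learning_rate, w_change, b_change):
        for key in ('w1', 'w2', 'w3'):
            self.params[key] -= learning_rate * w_change[key]
        for key in ('b1', 'b2', 'b3'):
            self.params[key] -= learning_rate * b_change[key]

train then has to unpack both gradients, w_change, b_change = self.backprop(o, y_pred), and pass both to update_weights.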

Asked By: anonymouse


Answers:

The one problem that I can see is that you are using only weights and no biases. Biases are very important because they allow your model to shift the position of the decision plane (boundary) in the solution space. With weights alone the boundary always passes through the origin, so you can only change its orientation.

I guess that, basically, this is the best fit you can get without biases. A dense layer is essentially a linear function, w*x + b, and you are missing the b. See the PyTorch documentation for an example: https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#linear.
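
For intuition, here is a minimal NumPy sketch of a single dense layer with and without a bias (the shapes and names are only for illustration, not taken from your code):

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((10, 784)) * np.sqrt(1.0 / 784)  # weight matrix
b = rng.standard_normal((10, 1)) * 0.01                  # bias vector

x = rng.standard_normal((784, 1))                         # one input column

z_weights_only = np.dot(W, x)      # the boundary can only be rotated/scaled
z_with_bias = np.dot(W, x) + b     # the bias also shifts (translates) it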

Also, can you show your Xavier initialization? In your case, even simple normally distributed values would be enough as initialization; there is no need to rush into more advanced topics.

I would also suggest you start with a smaller problem (for example, the Iris dataset) and no hidden layers (just a simple linear model that learns by gradient descent), as sketched below. Then you can expand it by adding hidden layers, and then try harder problems with the code you already have.
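
As a concrete starting point, something along these lines would do — a single weight and bias fitted with plain gradient descent on made-up data (just a sketch; the numbers are arbitrary):

import numpy as np

# toy data: y = 3x + 2 plus a little noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(100, 1))
y = 3 * x + 2 + 0.05 * rng.standard_normal((100, 1))

w, b = 0.0, 0.0
lr = 0.1
for epoch in range(1000):
    y_pred = w * x + b
    grad_w = 2 * np.mean((y_pred - y) * x)  # d(MSE)/dw
    grad_b = 2 * np.mean(y_pred - y)        # d(MSE)/db
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # should end up close to 3 and 2

Once that works, swapping in the Iris features and a sigmoid or softmax output gives you a simple classifier, and from there you can add hidden layers.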

Answered By: Bartosz Mikulski