Perceptron algorithm does not converge correctly

Question:

Given the following labelled data:

import numpy as np

feature_matrix = np.array(
    [
        [   1,    1],
        [   2,    3],
        [   3,    4],
        [-0.5, -0.5],
        [  -1,   -2],
        [  -2,   -3],
        [  -3,   -4],
        [  -4,   -3]
    ]
)

labels = np.array(
    [
         1,
         1,
         1,
        -1,
        -1,
        -1,
        -1,
        -1
    ]
)

The idea is to run the linear perceptron algorithm over the labelled data until convergence, in order to find the resulting parameter θ and offset parameter θ0. For reference, here is the perceptron function I wrote in Python:

def perceptron(feature_matrix, labels, convergence=True):

    # initialize theta and theta_0 to zero
    theta = np.zeros(len(feature_matrix[0])); theta_0 = 0

    while convergence:
        for i in range(len(feature_matrix)):

            exit = True

            if (labels[i] * np.dot(feature_matrix[i], theta) <= 0):

                theta   = theta   + labels[i] * feature_matrix[i]
                theta_0 = theta_0 + labels[i]

                exit = False

        if exit: break

    return (theta, theta_0)

I have used this code before, and can confirm that it works. The exit flag makes sure that the loop breaks when there are no mistakes; however, the resulting parameters apparently are not correct. The perceptron algorithm is initialised with the parameters set to zero, i.e. θ = [0, 0] and θ0 = 0, and the first datapoint to be explored is (1, 1). Below are the values the function returns after it converges:

parameters = perceptron(feature_matrix, labels)

print('Theta_0:', parameters[-1])
print('Theta  :', parameters[0].tolist())

# Theta_0: 1
# Theta  : [1.0, 1.0]

These values seem off, since the decision boundary sits right on top of the datapoint (-0.5, -0.5), as can be seen in a Desmos graph here. I also figured that if I change the order, say the first datapoint to be explored is (2, 3), then:

parameters = perceptron(feature_matrix, labels)

print('Theta_0:', parameters[-1])
print('Theta  :', parameters[0].tolist())

# Theta_0: 1
# Theta  : [2.0, 3.0]

Regardless of the order of iteration, it seems like the perceptron algorithm keeps the values from the first update as the resulting parameters, and does not register any mistakes afterwards as per the conditional clause (labels[i] * np.dot(feature_matrix[i], theta) <= 0).
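
As a quick sanity check over the same data (this is just a verification snippet, not part of the function above), the scores with the returned parameters illustrate both observations: the full classifier places (-0.5, -0.5) exactly on the boundary, while the loop's condition, which ignores theta_0, sees a strictly positive margin for every point and therefore never updates again:

import numpy as np

feature_matrix = np.array([[1, 1], [2, 3], [3, 4], [-0.5, -0.5],
                           [-1, -2], [-2, -3], [-3, -4], [-4, -3]])
labels = np.array([1, 1, 1, -1, -1, -1, -1, -1])

theta, theta_0 = np.array([1.0, 1.0]), 1  # values returned above

# score of the full classifier (with the offset)
print(feature_matrix.dot(theta) + theta_0)   # [ 3.  6.  8.  0. -2. -4. -6. -6] -- (-0.5, -0.5) scores 0.0
# margin actually tested inside the loop (offset ignored)
print(labels * feature_matrix.dot(theta))    # [ 2.  5.  7.  1.  3.  5.  7.  7.] -- all strictly > 0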

Why is this happening and what should be the correct approach?

Asked By: govog67090


Answers:

"I have used this code before, and can confirm that it works".

Your code was never going to work. Here is why: the entire theory of the perceptron algorithm is based on four things:

  1. forward pass of the data through the perceptron
  2. activation function
  3. error computation
  4. backpropagation of error

Your code only has the first one; the last three are missing, and those three components are arguably the most important parts of a perceptron network. You also need a learning rate, since backpropagating the entire error at once will never lead to convergence of the network.

The following code is based on this Medium.com article. It explains the theory of the perceptron network in detail and I suggest that you go through it.

The activation function had to be a tanh function, since your labels are 1 and -1.

import numpy as np
import matplotlib.pyplot as plt


def activation_function(x):
    # hyperbolic tangent, equivalent to np.tanh(x)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))


def perceptron(feature_matrix, labels, learning_rate, tolerance=1e-5):
    _, num_features = feature_matrix.shape
    # one extra weight so that theta[0] can act as the bias (theta_0)
    theta = np.zeros(num_features + 1)

    previous_error = 0
    errors = []

    while True:
        current_error = 0

        for index, sample in enumerate(feature_matrix):
            # prepend a constant 1 so the bias is learned as theta[0]
            sample = np.insert(sample, 0, 1)
            # forward pass followed by the activation function
            y_hat = activation_function(np.dot(sample, theta))

            if y_hat - labels[index] != 0:
                # update scaled by the learning rate
                theta += learning_rate * ((labels[index] - y_hat) * sample)
                current_error += y_hat - labels[index]

        errors.append(current_error)

        # stop once the per-epoch error stops changing (within the tolerance)
        if abs(previous_error - current_error) <= tolerance:
            break

        previous_error = current_error

    return theta, errors


feature_matrix = np.array(
    [[1, 1], [2, 3], [3, 4], [-0.5, -0.5], [-1, -2], [-2, -3], [-3, -4], [-4, -3]]
)

labels = np.array([1, 1, 1, -1, -1, -1, -1, -1])

parameters, errors = perceptron(feature_matrix, labels, 1e-2)

print("Theta_0:", parameters[0])
print("Theta  :", parameters[1:].tolist())

plt.plot(errors)
plt.xlabel("Epochs")
plt.ylabel("Error")
plt.title("Error vs Epoch")
plt.show()

This gave me the following output:

Theta_0: -0.6057138254036737
Theta  : [1.4919048872642444, 1.6402462207966715]

You can also see that the error is decreasing over time.

[plot: Error vs Epoch]

Answered By: mahieyin-rahmun

Let's start from the beginning. Your exit = True should be outside your loop over examples, as you want to exit only when all samples are correctly labeled. Currently you will exit as soon as the last one is correctly labeled.

The second issue is that while you are updating theta_0, you are not actually using it in your criterion. Luckily, for this dataset you do not need a bias, as an optimal classifier crosses the origin.

Finally, (1, 1) is a correct solution precisely because theta_0 is forgotten about (so if theta_0 = 0, then indeed theta = (1, 1) works fine), since it means "classify as the positive class if and only if the sum of the features is positive", and this holds for your points (a quick numerical check follows the matrix below):

feature_matrix = np.array(
    [
        [   1,    1],  # positive sample ; positive sum ; CORRECT
        [   2,    3],  # positive sample ; positive sum ; CORRECT
        [   3,    4],  # positive sample ; positive sum ; CORRECT
        [-0.5, -0.5],  # negative sample ; negative sum ; CORRECT
        [  -1,   -2],  # negative sample ; negative sum ; CORRECT
        [  -2,   -3],  # negative sample ; negative sum ; CORRECT
        [  -3,   -4],  # negative sample ; negative sum ; CORRECT
        [  -4,   -3]   # negative sample ; negative sum ; CORRECT
    ]
)
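
A minimal check of that claim (assuming the same arrays as in the question) could look like this: with theta = (1, 1) and theta_0 = 0, the sign of the score matches every label.

import numpy as np

feature_matrix = np.array([[1, 1], [2, 3], [3, 4], [-0.5, -0.5],
                           [-1, -2], [-2, -3], [-3, -4], [-4, -3]])
labels = np.array([1, 1, 1, -1, -1, -1, -1, -1])

theta, theta_0 = np.array([1.0, 1.0]), 0.0
predictions = np.sign(feature_matrix.dot(theta) + theta_0)
print(np.array_equal(predictions, labels))  # True: every point is classified correctly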

After fixing theta_0 you will also end up with a different solution, theta = (1.5, 1.5), theta_0 = 0, which is also correct.
In general there are infinitely many solutions that the perceptron might end up with.

Note that the classical perceptron algorithm converges (not just numerically, but actually stops and succeeds with 100% accuracy) with no learning rates etc., as long as your data is linearly separable, and this dataset clearly is.

So the final code will look something like this:

def perceptron(feature_matrix, labels, convergence=True):

    # initialize theta and theta_0 to zero
    theta = np.zeros(len(feature_matrix[0]))
    theta_0 = 0

    while convergence:
        # FIRST FIX, initialisation outside the loop
        exit = True
        for i in range(len(feature_matrix)):
            # SECOND FIX, theta_0 used in error criterion
            if (labels[i] * (np.dot(feature_matrix[i], theta) + theta_0) <= 0):
                theta   = theta   + labels[i] * feature_matrix[i]
                theta_0 = theta_0 + labels[i]

                exit = False

        if exit: break

    return (theta, theta_0)
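
Running the fixed version on the data from the question should then reproduce the parameters mentioned above (the arrays are re-declared here only to make the snippet self-contained):

import numpy as np

feature_matrix = np.array([[1, 1], [2, 3], [3, 4], [-0.5, -0.5],
                           [-1, -2], [-2, -3], [-3, -4], [-4, -3]])
labels = np.array([1, 1, 1, -1, -1, -1, -1, -1])

theta, theta_0 = perceptron(feature_matrix, labels)

print('Theta_0:', theta_0)         # 0
print('Theta  :', theta.tolist())  # [1.5, 1.5]
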
Answered By: lejlot