Minimizing a function using PyTorch Optimizer – Return values are all the same

Question:

I'm trying to minimize a function in order to better understand the optimization process. As an example I used the Eggholder function (https://www.sfu.ca/~ssurjano/egg.html), which is 2D. My goal is to get the values of my parameters (x and y) after every optimizer iteration so that I can visualize them afterwards.

Using PyTorch I wrote the following code:

def eggholder_function(x):
    return -(x[1] + 47) * torch.sin(torch.sqrt(torch.abs(x[1] + x[0]/2 + 47))) - x[0]*torch.sin(torch.sqrt(torch.abs(x[0]-(x[1]+47))))

def minimize(function, initial_parameters):
    list_params = []
    params = initial_parameters
    params.requires_grad_()
    optimizer = torch.optim.Adam([params], lr=0.1)

    for i in range(5):
        optimizer.zero_grad()
        loss = function(params)
        loss.backward()
        optimizer.step()
        list_params.append(params)
        


    return params, list_params

starting_point = torch.tensor([-30.,-10.])
minimized_params, list_of_params = minimize(eggholder_function, starting_point)

The output is as follows:

minimized_params: tensor([-29.4984, -10.5021], requires_grad=True)

and

list_of_params:

[tensor([-29.4984, -10.5021], requires_grad=True),
 tensor([-29.4984, -10.5021], requires_grad=True),
 tensor([-29.4984, -10.5021], requires_grad=True),
 tensor([-29.4984, -10.5021], requires_grad=True),
 tensor([-29.4984, -10.5021], requires_grad=True)]

While I understand that minimized_params is in fact the result of the optimization, why does list_of_params show the same values for every iteration?

Thank you and have a great day!

Asked By: telias


Answers:

They all refer to the same object, so mutating one mutates all of them. You can check this by comparing their ids:

id(list_of_params[0]), id(list_of_params[1])
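
A minimal, optimizer-free sketch of the same effect (the names here are illustrative and not from the question):

import torch

t = torch.tensor([1.0, 2.0])
history = []
for step in range(3):
    t.add_(1.0)        # in-place update, analogous to optimizer.step()
    history.append(t)  # stores a reference to t, not a snapshot of its values

print([id(h) for h in history])  # three identical ids
print(history)                   # every entry shows the final value [4., 5.]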

You can detach and clone the params to avoid that:

import torch
def eggholder_function(x):
    return -(x[1] + 47) * torch.sin(torch.sqrt(torch.abs(x[1] + x[0]/2 + 47))) - x[0]*torch.sin(torch.sqrt(torch.abs(x[0]-(x[1]+47))))

def minimize(function, initial_parameters):
    list_params = []
    params = initial_parameters
    params.requires_grad_()
    optimizer = torch.optim.Adam([params], lr=0.1)

    for i in range(5):
        optimizer.zero_grad()
        loss = function(params)
        loss.backward()
        optimizer.step()
        list_params.append(params.detach().clone())  # snapshot of the current values, detached from the graph
        
    return params, list_params

starting_point = torch.tensor([-30.,-10.])
minimized_params, list_of_params = minimize(eggholder_function, starting_point)

list_of_params now contains a different value for each iteration:

[tensor([-29.9000, -10.1000]),
 tensor([-29.7999, -10.2001]),
 tensor([-29.6996, -10.3005]),
 tensor([-29.5992, -10.4011]),
 tensor([-29.4984, -10.5021])]
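
If the goal is to visualize the trajectory afterwards, the recorded snapshots can be stacked into a single tensor. A minimal sketch based on the list_of_params returned above (the matplotlib part is an assumption and not part of the original code):

import matplotlib.pyplot as plt  # assumed to be available for plotting

trajectory = torch.stack(list_of_params)  # shape (5, 2): one row per iteration
xs = trajectory[:, 0].numpy()
ys = trajectory[:, 1].numpy()
plt.plot(xs, ys, marker="o")  # path of (x, y) over the iterations
plt.xlabel("x")
plt.ylabel("y")
plt.show()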
Answered By: joe32140

The tensor is modified in place while you optimize, so every reference to it stored in the list reflects that change. To fix the problem you can use deepcopy, since a deep copy does not change when the original is modified:

import torch
from copy import deepcopy

def eggholder_function(x):
    return -(x[1] + 47) * torch.sin(torch.sqrt(torch.abs(x[1] + x[0]/2 + 47))) - x[0]*torch.sin(torch.sqrt(torch.abs(x[0]-(x[1]+47))))

def minimize(function, initial_parameters):
    list_params = []
    params = initial_parameters
    params.requires_grad_()
    optimizer = torch.optim.Adam([params], lr=0.1)

    for i in range(5):
        optimizer.zero_grad()
        loss = function(params)
        loss.backward()
        optimizer.step()
        list_params.append(deepcopy(params))  # deep copy: an independent snapshot of the current values
        


    return params, list_params

starting_point = torch.tensor([-30.,-10.])
minimized_params, list_of_params = minimize(eggholder_function, starting_point)
print(minimized_params, list_of_params)

The recorded values now differ from one another, and only the last one is equal to the final result:

(tensor([-29.4984, -10.5021], requires_grad=True),
[tensor([-29.9000, -10.1000]),
 tensor([-29.7999, -10.2001]),
 tensor([-29.6996, -10.3005]),
 tensor([-29.5992, -10.4011]),
 tensor([-29.4984, -10.5021])])

Many problems can arise from shallow copies. When writing the first version of my code I use deepcopy as much as possible, and only remove it later when (if) memory optimization is needed.
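
For reference, the two snapshot approaches differ slightly for a leaf tensor like params: detach().clone() returns a plain tensor with requires_grad=False and no autograd history, while deepcopy keeps requires_grad=True. A small sketch of the difference (illustrative names, assuming a recent PyTorch version):

from copy import deepcopy
import torch

p = torch.tensor([1.0, 2.0], requires_grad=True)

snap_a = p.detach().clone()  # plain snapshot, detached from autograd
snap_b = deepcopy(p)         # snapshot that keeps requires_grad=True

print(snap_a.requires_grad, snap_b.requires_grad)  # False True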

Answered By: Caridorc