Python zip iterator and Update with Adagrad

Question:

I am attempting to understand the excellent code given as a guide by Andrej Karpathy: https://gist.github.com/karpathy/d4dee566867f8291f086

I am new to Python, still learning!

I am doing the best I can to understand the following code from the link:

# perform parameter update with Adagrad
for param, dparam, mem in zip([Wxh, Whh, Why, bh, by], 
                              [dWxh, dWhh, dWhy, dbh, dby], 
                              [mWxh, mWhh, mWhy, mbh, mby]):
    mem += dparam * dparam
    param += -learning_rate * dparam / np.sqrt(mem + 1e-8) # adagrad update

I have read up on the zip function and done some short tests to try to understand how this works.

What I know so far: there are 5 iterations, and param == Wxh on the first iteration but not there on…

Ideally I am trying to convert this code to C#, and to do that I need to understand it.

Referring to the Python iterator and zip documentation, it appears as if we are multiplying each item of each array:

 param = Wxh * dWxh * mWxh

But then the variables param, dparam and mem are being modified outside the zip function.

How do these variables function in this for loop scenario?

Asked By: Rusty Nail


Answers:

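Python treats variables merely as labels or name tags bound to objects. Since you have zipped the objects themselves inside lists, it doesn't matter which name you use to reach them: on each iteration param, dparam and mem refer to the very same arrays as, for example, Wxh, dWxh and mWxh, and += on a NumPy array modifies that array in place. Kindly note, this does not work for immutable types like int or str, where += creates a new object and only rebinds the loop variable. Refer to this answer for more explanation: Immutable vs Mutable types.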
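A minimal sketch of the mutable/immutable difference, using a small NumPy array and an int as illustrative stand-ins (the names W and n are hypothetical, not from the gist):

```python
import numpy as np

W = np.array([1.0, 2.0])   # mutable: a NumPy array
n = 1                      # immutable: an int

for param in [W]:
    param += 10            # in place: modifies the array that W also refers to

for num in [n]:
    num += 10              # ints are immutable: this only rebinds num

print(W)  # [11. 12.]
print(n)  # 1
```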

Answered By: shad0w_wa1k3r

Writing a simple for loop with zip will help you learn a lot.

For example:

for a, b, c in zip([1,2,3],
                    [4,5,6],
                    [7,8,9]):
    print(a)
    print(b)
    print(c)
    print("/")

This loop will print (one value per line): 1 4 7 / 2 5 8 / 3 6 9

So the zip function just puts those three lists together, and the loop then uses three variables, param, dparam and mem, to refer to items from the three different lists.

In each iteration, those three variables refer to the corresponding item in their lists, just like i does in for i in [1, 2, 3]:.

In this way, you only need to write one for loop, instead of repeating the update code for each of the parameters Wxh, Whh, Why, bh and by.

In the first iteration, only Wxh is updated, using dWxh and mWxh, following the Adagrad rule. In the second iteration, Whh is updated using dWhh and mWhh, and so on.
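To check that the single zip loop really does the same work as writing the update out once per parameter, here is a small sketch with random 3-element arrays standing in for two of the parameters (the *_ref copies and the array sizes are illustrative assumptions):

```python
import numpy as np

learning_rate = 0.1
rng = np.random.default_rng(0)

# Hypothetical stand-ins for (Wxh, dWxh, mWxh) and (Whh, dWhh, mWhh)
Wxh, dWxh, mWxh = rng.standard_normal(3), rng.standard_normal(3), np.zeros(3)
Whh, dWhh, mWhh = rng.standard_normal(3), rng.standard_normal(3), np.zeros(3)

# Independent copies that will be updated "by hand" for comparison
Wxh_ref, mWxh_ref = Wxh.copy(), mWxh.copy()
Whh_ref, mWhh_ref = Whh.copy(), mWhh.copy()

# One loop for all parameters, as in the gist
for param, dparam, mem in zip([Wxh, Whh], [dWxh, dWhh], [mWxh, mWhh]):
    mem += dparam * dparam
    param += -learning_rate * dparam / np.sqrt(mem + 1e-8)

# The same updates written out once per parameter
mWxh_ref += dWxh * dWxh
Wxh_ref += -learning_rate * dWxh / np.sqrt(mWxh_ref + 1e-8)
mWhh_ref += dWhh * dWhh
Whh_ref += -learning_rate * dWhh / np.sqrt(mWhh_ref + 1e-8)

print(np.allclose(Wxh, Wxh_ref), np.allclose(Whh, Whh_ref))  # True True
```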

Answered By: Marshall7

What does zip do?

Quoting from the official documentation:

Zip returns a list of tuples, where the i-th tuple contains the i-th
element from each of the argument sequences or iterables. The returned
list is truncated in length to the length of the shortest argument
sequence.

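It means (in Python 3, zip returns an iterator rather than a list, so wrap it in list() to see the tuples):

 >>> list(zip(["A", "B"], ["C", "D"], ["E", "F"]))
 [('A', 'C', 'E'), ('B', 'D', 'F')]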
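Two properties of zip() in Python 3 are worth seeing in action: the result stops at the shortest input, and the iterator can only be consumed once. A small sketch with illustrative lists:

```python
# In Python 3, zip() returns a lazy iterator rather than a list
pairs = zip([1, 2, 3], ["a", "b"])

first = list(pairs)   # consumes it; stops at the shorter input
second = list(pairs)  # the iterator is already exhausted

print(first)   # [(1, 'a'), (2, 'b')]
print(second)  # []
```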

So when you loop through it, you actually have a list of tuples, with content like:

 # These are strings here but in your case these are objects
 [('Wxh', 'dWxh', 'mWxh'), ('Whh', 'dWhh', 'mWhh'), ('Why', 'dWhy', 'mWhy'),
  ('bh', 'dbh', 'mbh'),('by', 'dby', 'mby')]

What I know so far, 5 Iterations, param == Wxh on the first iteration but not there on…

You are right. Now let's analyze your loop.

  for param, dparam, mem in m:   # m is the list of tuples shown above
      print((param, dparam, mem))

  # Which prints
('Wxh', 'dWxh', 'mWxh')
('Whh', 'dWhh', 'mWhh')
('Why', 'dWhy', 'mWhy')
('bh', 'dbh', 'mbh')
('by', 'dby', 'mby')

This means that on every iteration, param gets the tuple's element at index 0, dparam gets the element at index 1 and mem gets the element at index 2.

Now when I type param after the for loop has finished, I get

   >>> param
   'by'

It means param still holds a reference to the by object.

From official documentation:

The for-loop makes assignments to the variables(s) in the target
list. […] Names in the target list are not deleted when the loop is
finished, but if the sequence is empty, they will not have been
assigned to at all by the loop.
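A short sketch of both halves of that rule, using plain strings as stand-ins for the parameter objects (never_set and was_assigned are illustrative names):

```python
for param in ["Wxh", "Whh", "Why", "bh", "by"]:
    pass

print(param)  # by  (still bound to the last item)

# With an empty sequence the loop body never runs, so the name is never bound
for never_set in []:
    pass

try:
    never_set
    was_assigned = True
except NameError:
    was_assigned = False

print(was_assigned)  # False
```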

Answered By: Charul

Any sequence (or iterable) can be unpacked into variables using a simple assignment operation. The only requirement is that the number of variables and structure match the sequence. For example:

t = (2, 4)
x, y = t
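The same unpacking happens to each tuple that zip() produces, and a mismatch in the number of names raises an error. A small sketch with made-up numbers (first_row and unpack_failed are illustrative names):

```python
# Plain tuple unpacking
t = (2, 4)
x, y = t

# Each item produced by zip() is a tuple, unpacked the same way
first_row = next(zip([10, 20], [30, 40], [50, 60]))
param, dparam, mem = first_row  # param == 10, dparam == 30, mem == 50

# The number of names must match the length of the tuple
try:
    p, q = (1, 2, 3)
    unpack_failed = False
except ValueError:
    unpack_failed = True

print(x, y)                # 2 4
print(param, dparam, mem)  # 10 30 50
print(unpack_failed)       # True
```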

In this case, according to the standard documentation, zip() will "Make an iterator that aggregates elements from each of the iterables. Returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables." So, for your case:

for param, dparam, mem in zip([Wxh, Whh, Why, bh, by], 
                              [dWxh, dWhh, dWhy, dbh, dby], 
                              [mWxh, mWhh, mWhy, mbh, mby]):
    mem += dparam * dparam
    param += -learning_rate * dparam / np.sqrt(mem + 1e-8)

Let's say:
iterable1 = [Wxh, Whh, Why, bh, by]
iterable2 = [dWxh, dWhh, dWhy, dbh, dby]
iterable3 = [mWxh, mWhh, mWhy, mbh, mby]

Here zip(iterable1, iterable2, iterable3) yields, one per iteration, the tuples (Wxh, dWxh, mWxh), (Whh, dWhh, mWhh), (Why, dWhy, mWhy), (bh, dbh, mbh), (by, dby, mby)

On the 1st iteration:
param, dparam, mem = (Wxh, dWxh, mWxh)
so,
param = Wxh
dparam = dWxh
mem = mWxh
mem += dparam * dparam, i.e. mWxh is updated in place to mWxh + (dWxh * dWxh)
param += -learning_rate * dparam / np.sqrt(mem + 1e-8), i.e. Wxh is updated in place to Wxh + (-learning_rate * dWxh / np.sqrt(mWxh + (dWxh * dWxh) + 1e-8)), where mWxh is its value before this iteration

On the 2nd iteration:
param, dparam, mem = (Whh, dWhh, mWhh)
so,
param = Whh
dparam = dWhh
mem = mWhh
and so on.
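A quick numerical check of the first-iteration update, with made-up values for Wxh, dWxh and learning_rate, and the Adagrad memory mWxh starting at zero (all of these values are illustrative assumptions):

```python
import numpy as np

learning_rate = 0.1
Wxh = np.array([1.0, 2.0])    # made-up parameter values
dWxh = np.array([0.5, -1.0])  # made-up gradient
mWxh = np.zeros(2)            # Adagrad memory, starting at zero here

param, dparam, mem = Wxh, dWxh, mWxh  # what the first iteration binds
mem += dparam * dparam                # mWxh is now [0.25, 1.0]
param += -learning_rate * dparam / np.sqrt(mem + 1e-8)

# Wxh is now approximately [0.9, 2.1]: each entry moved by about learning_rate
print(mWxh)
print(Wxh)
```

Because the memory starts at zero, the very first Adagrad step has size close to learning_rate in each coordinate regardless of the gradient's magnitude; later steps shrink as mem accumulates.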
Answered By: JkShaw

Thank you all for excellent answers!

My Python skill is poor, so I am sorry for that!

import numpy as np

print('----------------------------------------')
print('Before modification:')
a = np.random.randn(1, 3) * 1.0
print('a: ', a)
b = np.random.randn(1, 3) * 1.0
print('b: ', b)
c = np.random.randn(1, 3) * 1.0
print('c: ', c)

print('----------------------------------------')

for a1, b1, c1 in zip([a, b, c], [a, b, c], [a, b, c]):
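    # a1, b1 and c1 all refer to the same array in each iteration, so add
    # 0.1 only once (incrementing all three would add 0.3 in total)
    a1 += 10 * 0.01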
    print('a1 is Equal to a: ', np.array_equal(a1, a))
    print('a1 is Equal to b: ', np.array_equal(a1, b))
    print('a1 is Equal to c: ', np.array_equal(a1, c))
    print('----------------------------------------')

print('After modification:')
print('a: ', a)
print('b: ', b)
print('c: ', c)
print('----------------------------------------')

Outputs:

----------------------------------------
Before modification:
a:  [[-0.79535459 -0.08678677  1.46957521]]
b:  [[-1.05908792 -0.90121069  1.07055281]]
c:  [[ 1.18976226  0.24700716 -0.08481322]]
----------------------------------------
a1 is Equal to a:  True
a1 is Equal to b:  False
a1 is Equal to c:  False
----------------------------------------
a1 is Equal to a:  False
a1 is Equal to b:  True
a1 is Equal to c:  False
----------------------------------------
a1 is Equal to a:  False
a1 is Equal to b:  False
a1 is Equal to c:  True
----------------------------------------
After modification:
a:  [[-0.69535459  0.01321323  1.56957521]]
b:  [[-0.95908792 -0.80121069  1.17055281]]
c:  [[ 1.28976226  0.34700716  0.01518678]]

jyotish is exactly right and answered what I was missing. Thank you!

For C# I think I will look at a Parallel.For implementation here.

EDIT:

For others who are also learning, I found it helpful to see this code work:

import numpy as np

print('----------------------------------------')
print('Before modification:')
a = np.random.randn(1, 3) * 1.0
print('a: ', a)
b = np.random.randn(1, 3) * 1.0
print('b: ', b)
c = np.random.randn(1, 3) * 1.0
print('c: ', c)

print('----------------------------------------')

for a1, b1, c1 in zip([a, b, c], [a, b, c], [a, b, c]):
    a1[0][0] = 10 * 0.01
    print('a1 is Equal to a: ', np.array_equal(a1, a))
    print('a1 is Equal to b: ', np.array_equal(a1, b))
    print('a1 is Equal to c: ', np.array_equal(a1, c))
    print('----------------------------------------')

print('After modification:')
print('a: ', a)
print('b: ', b)
print('c: ', c)
print('----------------------------------------')

Outputs:

----------------------------------------
Before modification:
a:  [[-0.78734047 -0.04803815  0.20810081]]
b:  [[ 1.88121331  0.91649695  0.02482977]]
c:  [[-0.24219954 -0.10183608  0.85180522]]
----------------------------------------
a1 is Equal to a:  True
a1 is Equal to b:  False
a1 is Equal to c:  False
----------------------------------------
a1 is Equal to a:  False
a1 is Equal to b:  True
a1 is Equal to c:  False
----------------------------------------
a1 is Equal to a:  False
a1 is Equal to b:  False
a1 is Equal to c:  True
----------------------------------------
After modification:
a:  [[ 0.1        -0.04803815  0.20810081]]
b:  [[ 0.1         0.91649695  0.02482977]]
c:  [[ 0.1        -0.10183608  0.85180522]]
----------------------------------------

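As you can see, only the first column of each <class 'numpy.ndarray'> array I am using was modified, and the change is visible through the original names a, b and c. It's a reasonably deep operation: the item assignment changes the underlying array itself, not just the name a1.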
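Both experiments above modify the arrays in place (+= and item assignment). Rebinding the loop variable with a1 = a1 + ... behaves differently, which is worth keeping in mind when porting to another language. A minimal sketch with small illustrative arrays:

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([1.0, 2.0])

for a1 in [a]:
    a1 += 0.1        # in place: the array a refers to changes

for b1 in [b]:
    b1 = b1 + 0.1    # builds a new array and rebinds b1; b is untouched

print(a)  # [1.1 2.1]
print(b)  # [1. 2.]
```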

Answered By: Rusty Nail

Here is the same code in C#:

    public void UpdateParametersWithAdagrad(WordGenerationRNNLossFunResultModel lossFunResultModel, Matrix mWxh, Matrix mWhh, Matrix mWhy, Matrix mbh, Matrix mby, double learning_rate)
    {
        //mem += dparam * dparam;
        //param += -learning_rate * dparam / np.sqrt(mem + 1e-8); // adagrad update

        // Parameters, gradients and Adagrad memories, lined up by index i
        var param = new List<Matrix> { Wxh, Whh, Why, bh, by };
        var dparam = new List<Matrix> { lossFunResultModel.DWxh, lossFunResultModel.DWhh, lossFunResultModel.DWhy, lossFunResultModel.Dbh, lossFunResultModel.Dby };
        var mem = new List<Matrix> { mWxh, mWhh, mWhy, mbh, mby };

        for (int i = 0; i < dparam.Count; i++)
        {
            // Assumes Matrix's *, / and Sqrt() are element-wise, like NumPy's.
            // If Matrix's + operator returns a new instance, x[i] += y only replaces
            // the list element: Wxh, ..., mWxh, ... are then not updated unless
            // Matrix mutates in place or the results are copied back.
            mem[i] += dparam[i] * dparam[i];
            param[i] += -learning_rate * dparam[i] / (mem[i] + 1e-8).Sqrt(); // adagrad update
        }
    }
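For comparison, an index-based Python loop shaped like the C# version above, with small made-up arrays standing in for two of the parameters. In Python this still updates Wxh and bh because NumPy's += works in place; whether the C# version does the same depends on how its Matrix class (not shown in this thread) implements its operators.

```python
import numpy as np

learning_rate = 0.1
Wxh, bh = np.ones(2), np.zeros(2)               # hypothetical parameters
dWxh, dbh = np.full(2, 0.5), np.full(2, -0.5)   # hypothetical gradients
mWxh, mbh = np.zeros(2), np.zeros(2)            # Adagrad memories

param = [Wxh, bh]
dparam = [dWxh, dbh]
mem = [mWxh, mbh]

for i in range(len(dparam)):
    # ndarray += is in place, so Wxh/bh and mWxh/mbh see the change
    mem[i] += dparam[i] * dparam[i]
    param[i] += -learning_rate * dparam[i] / np.sqrt(mem[i] + 1e-8)

# Writing param[i] = param[i] + ... instead would only replace the list
# element and leave Wxh and bh unchanged.
print(Wxh, bh)  # approximately [0.9 0.9] and [0.1 0.1]
```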
Answered By: Alexander Poddubko