How can I make a generator which iterates over 2D numpy array?

Question:

I have a huge 2D numpy array which I want to retrieve in batches.
The array's shape is (60000, 3072). I want to make a generator that gives me chunks of this array: first (1000, 3072), then the next (1000, 3072), and so on. How can I make a generator that iterates over this array and returns batches of a given size?

Asked By: Talha Yousuf

Answers:

Consider the array a:

import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])

Option 1
Use a generator

def get_every_n(a, n=2):
    for i in range(a.shape[0] // n):
        yield a[n*i:n*(i+1)]

for sa in get_every_n(a):
    print(sa)

[[1 2 3]
 [4 5 6]]
[[ 7  8  9]
 [10 11 12]]

Option 2
Use reshape and //: compute the number of two-row groups and let -1 infer the rows per group

a.reshape(a.shape[0] // 2, -1, a.shape[1])

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

Option 3
If you would rather state the group size (two rows) directly, let -1 infer the number of groups

a.reshape(-1, 2, a.shape[1])

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])
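Both reshape options only work when the number of rows is an exact multiple of the group size; otherwise NumPy raises a ValueError. The following is a minimal sketch of that limitation, assuming a hypothetical five-row array `b` and group size `chunk` (neither appears in the answer above):

```python
import numpy as np

b = np.arange(15).reshape(5, 3)   # hypothetical array: 5 rows, not divisible by 2
chunk = 2                         # hypothetical group size

try:
    b.reshape(-1, chunk, b.shape[1])
    reshape_ok = True
except ValueError:
    # 15 elements cannot be arranged into blocks of 2 x 3
    reshape_ok = False

# Trimming the leftover rows first makes the reshape valid
usable = (b.shape[0] // chunk) * chunk
chunks = b[:usable].reshape(-1, chunk, b.shape[1])
print(reshape_ok, chunks.shape)
```

Trimming drops the last row here, so reshape is only a fit when losing the remainder is acceptable.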

Since you explicitly asked for a generator, Option 1 is the one that fits.

Answered By: ChiefAmay

Here’s the data that you have:

import numpy as np
full_len = 5    # In your case, 60_000
cols = 3        # In your case, 3072

nd1 = np.arange(full_len*cols).reshape(full_len,cols)

Here’s what you can do, to “generate” the slices:

Option 1, Using numpy.array_split():

from math import ceil

step_size = 2   # In your case, 1_000
split_list = np.array_split(nd1, ceil(full_len / step_size), axis=0)
print(split_list)

split_list is now a list of slices into nd1. You can loop over this list or index it directly as split_list[0], split_list[1], etc. Each slice is a view into nd1 and can be used exactly like any other numpy array.

Output for Option 1:

Here’s the output, showing that the last slice is shorter than the others:

[array([[0, 1, 2],
       [3, 4, 5]]), array([[ 6,  7,  8],
       [ 9, 10, 11]]), array([[12, 13, 14]])]
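One caveat worth checking: when given a section count, numpy.array_split() balances the section sizes, so the batches are not always step_size rows long. The sketch below compares that with passing explicit split points; the array `nd2` and the batch size `step` are hypothetical values chosen to make the difference visible.

```python
from math import ceil

import numpy as np

nd2 = np.arange(7 * 3).reshape(7, 3)   # hypothetical array with 7 rows
step = 3                                # hypothetical batch size

# Passing a section count: NumPy balances the section sizes
balanced = np.array_split(nd2, ceil(nd2.shape[0] / step), axis=0)
print([part.shape[0] for part in balanced])   # [3, 2, 2]

# Passing explicit split points: every batch except the last has `step` rows
points = list(range(step, nd2.shape[0], step))
fixed = np.array_split(nd2, points, axis=0)
print([part.shape[0] for part in fixed])      # [3, 3, 1]
```

If fixed-size batches matter, the split-point form (or plain slicing) is probably the safer choice.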

Option 2, by explicit slicing:

step_size = 2   # In your case, 1_000
myrange = range(0, full_len, step_size)

for r in myrange:
    my_slice_array = nd1[r:r + step_size]
    print(my_slice_array.shape)

Output for Option 2:

(2, 3)
(2, 3)
(1, 3)

Note that unlike slicing lists, slicing a numpy array does not make a copy of the source array’s data. It only creates a view within the slice bounds, on the existing data of the source numpy array. This applies to both Option 1, and Option 2, since both involve the creation of slices.
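The view behaviour can be checked with numpy.shares_memory(). This is a small sketch using the same toy shape as above; the names `batch` and `safe` are illustrative only.

```python
import numpy as np

nd1 = np.arange(15).reshape(5, 3)    # same toy shape as in the answer
batch = nd1[0:2]                     # basic slice of the first two rows

print(np.shares_memory(nd1, batch))  # True: the slice is a view

batch[0, 0] = -1                     # writing through the view...
print(nd1[0, 0])                     # ...changes the source array too: -1

safe = nd1[2:4].copy()               # an explicit copy is independent
safe[0, 0] = -1
print(nd1[2, 0])                     # source row is unchanged: 6
```

So if a batch is going to be modified in place, taking a .copy() first keeps the source array intact.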

Answered By: fountainhead

If you want a generator-based approach, the solution below works:

import numpy
bigArray = numpy.random.rand(60000, 3072)  # dummy array standing in for the real data

def selectArray(m, n):
    # Yield consecutive blocks of m rows, keeping the first n columns of each
    for start in range(0, bigArray.shape[0], m):
        yield bigArray[start:start + m, :n]

genObject = selectArray(1000, 3072)

You can use either a for loop or next() to iterate over genObject.

Note: if you use next(), make sure you handle the StopIteration exception.
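As a rough illustration of the next() point, the sketch below drains a generator by hand. It uses a hypothetical small array `small` and a hypothetical helper `batches` so it runs quickly; neither is part of the code above.

```python
import numpy as np

small = np.arange(12).reshape(4, 3)   # hypothetical small stand-in array

def batches(arr, size):
    # Hypothetical helper: yields consecutive blocks of `size` rows
    for start in range(0, arr.shape[0], size):
        yield arr[start:start + size]

gen = batches(small, 2)
seen = []
while True:
    try:
        block = next(gen)
    except StopIteration:
        break                         # the generator is exhausted
    seen.append(block.shape)

# next() with a default value avoids the exception altogether
leftover = next(gen, None)
print(seen, leftover)
```

A plain for loop handles StopIteration internally, so the try/except is only needed when calling next() directly.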

Hope it helps.

Answered By: Raja G

I wanted to use a generator as suggested by ChiefAmay, but his Option 1 only returns whole chunks and skips the leftover chunk at the end. Here is an improved solution that returns every part of the array:

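def get_every_n(a, n=2):
    full_chunks_len = a.shape[0] // n
    for i in range(full_chunks_len):
        yield a[n*i:n*(i+1)]
    # Yield the leftover rows only if there are any, so no empty chunk is produced
    if full_chunks_len * n < a.shape[0]:
        yield a[full_chunks_len*n:]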

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12],
              [13, 14, 15]])

for chunk in get_every_n(a):
    print(chunk)

Output:

[[1 2 3]
 [4 5 6]]
[[ 7  8  9]
 [10 11 12]]
[[13 14 15]]

Answered By: wsl