Out of memory converting image files to numpy array

Question:

I’m trying to run a loop that iterates through an image folder and returns two numpy arrays: x, which stores the images, and y, which stores the labels.

A folder can easily have over 40,000 RGB images, each with dimensions (224, 224).
I have around 12 GB of memory, but after some iterations the memory usage spikes and everything stops.

What can I do to fix this issue?

import glob

import cv2
import numpy as np

def create_set(path, quality):
    x_file = glob.glob(path + '*')
    x = []

    # Every image in the folder is loaded and kept in the list
    for i, img in enumerate(x_file):
        image = cv2.imread(img, cv2.IMREAD_COLOR)
        x.append(np.asarray(image))
        if i % 50 == 0:
            print('{} - {} images processed'.format(path, i))

    x = np.asarray(x)
    x = x/255

    # One-hot labels: column 0 for quality 0, column 1 otherwise
    y = np.zeros((x.shape[0], 2))
    if quality == 0:
        y[:,0] = 1
    else:
        y[:,1] = 1

    return x, y
Asked By: jruivo


Answers:

You just can’t load that many images into memory. You’re trying to load every file in a given path to memory, by appending them to x.
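
To put rough numbers on this, here is a small sketch of the arithmetic. It assumes the figures from the question (40,000 images of 224 x 224) and 3 channels per image, which is what cv2.IMREAD_COLOR produces; estimate_bytes is just an illustrative helper name, not part of any library.

```python
import numpy as np


def estimate_bytes(n_images, height, width, channels, dtype):
    """Bytes needed to hold n_images of the given shape in one dense array."""
    return n_images * height * width * channels * np.dtype(dtype).itemsize


n = 40_000  # folder size mentioned in the question
as_uint8 = estimate_bytes(n, 224, 224, 3, np.uint8)
as_float64 = estimate_bytes(n, 224, 224, 3, np.float64)

print(f"uint8:   {as_uint8 / 1e9:.1f} GB")    # 6.0 GB
print(f"float64: {as_float64 / 1e9:.1f} GB")  # 48.2 GB
```

So even as uint8 the data takes about half of a 12 GB machine, and np.asarray(x) on the list needs a second copy while the list is still alive.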

Try processing them in batches or, if you’re doing this for a TensorFlow application, try writing them to .tfrecords files first.

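If you want to save some memory, leave the images as np.uint8 rather than casting them to float, which happens automatically when you normalise them in the line x = x/255. That division produces a float64 array, which takes eight times the memory of the same data stored as uint8.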
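
A quick way to see the dtype change, using a small dummy batch rather than real images (the array shape is only an assumption for illustration). If you do need floats, casting to float32 before dividing is one option that halves the cost compared with the default float64:

```python
import numpy as np

# Small stand-in for a batch of images as cv2.imread would return them
batch = np.zeros((4, 224, 224, 3), dtype=np.uint8)

scaled64 = batch / 255                     # true division promotes uint8 to float64
scaled32 = batch.astype(np.float32) / 255  # stays float32

print(batch.dtype, batch.nbytes)
print(scaled64.dtype, scaled64.nbytes)
print(scaled32.dtype, scaled32.nbytes)
```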

You also don’t need np.asarray in your x.append(np.asarray(image)) line: image is already an array. np.asarray is for converting lists, tuples, etc. to arrays.
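
A tiny check of that point, with a zero-filled array standing in for what cv2.imread returns:

```python
import numpy as np

image = np.zeros((224, 224, 3), dtype=np.uint8)  # stand-in for a loaded image

# For an input that is already an ndarray, np.asarray hands back the same object
same = np.asarray(image)
print(same is image)

# Its real job is converting array-likes such as nested lists
as_array = np.asarray([[1, 2], [3, 4]])
print(type(as_array).__name__, as_array.shape)
```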

edit:

a very rough batching example:

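import glob
import math

import cv2

def batching_function(imlist, batchsize):
    # Load only the first `batchsize` files and hand back the remaining paths
    ims = []
    batch = imlist[:batchsize]

    for path in batch:
        image = cv2.imread(path, cv2.IMREAD_COLOR)
        ims.append(image)
        # any other per-image processing goes here

    new_imlist = imlist[batchsize:]
    return ims, new_imlist

def main(path, batchsize):
    imlist = glob.glob(path + '*')
    # Round up so the last, partial batch is not dropped
    n_batches = math.ceil(len(imlist) / batchsize)
    for i in range(n_batches):
        ims, imlist = batching_function(imlist, batchsize)
        process_images(ims)  # placeholder: whatever consumes one batch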
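
The same batching idea can also be written as a generator that yields one (x, y) pair at a time, which may fit the question's create_set more directly. This is only a sketch: iter_batches and load_image are hypothetical names, and the loader is passed in so you can plug in cv2.imread (or anything else) yourself.

```python
import numpy as np


def iter_batches(paths, batch_size, load_image, quality):
    """Yield (x, y) one batch at a time instead of building one huge array."""
    for start in range(0, len(paths), batch_size):
        chunk = paths[start:start + batch_size]
        # Only this batch is held in memory; images keep the dtype the loader returns
        x = np.stack([load_image(p) for p in chunk])
        # One-hot labels, same convention as the question: column 0 for quality 0
        y = np.zeros((len(chunk), 2), dtype=np.uint8)
        y[:, 0 if quality == 0 else 1] = 1
        yield x, y


# Possible usage with the question's setup (not run here):
#   loader = lambda p: cv2.imread(p, cv2.IMREAD_COLOR)
#   for x, y in iter_batches(glob.glob(path + '*'), 256, loader, quality=0):
#       ...  # scale x and train on this batch
```

Note that cv2.imread returns None for files it cannot read, so a real loader would need to skip or handle those before np.stack is called.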
Answered By: Tim Bradley