Best practice for loading a huge image dataset for ML

Question:

I’m playing around with an image dataset on Kaggle (https://www.kaggle.com/competitions/paddy-disease-classification/data). The dataset contains about 10,000 images at 480×640 resolution.
When I try to load the dataset with the following code,

for label, file in dataset_file_img(dataset_path):
    image = load_img_into_tensor(file)
    data.append(image / 255)   # normalize pixel values to [0, 1]
    data_label.append(label)

it consumes about 20 GB of RAM.

What is the best practice for loading a dataset like this?
Any help would be appreciated!

Asked By: SaigyoujiYuyuko


Answers:

Try the following from Keras:

  1. ImageDataGenerator

  2. the image_dataset_from_directory function (a sketch follows below)
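As a minimal sketch of the second option (the "train_images" path and the batch size are assumptions based on the competition's directory layout; verify them for your setup):

import tensorflow as tf

# Stream batches from disk instead of holding all 10,000 images in RAM.
# Assumes images are stored as train_images/<class_name>/<file>.jpg;
# adjust the path for your environment.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "train_images",
    labels="inferred",
    label_mode="int",
    image_size=(480, 640),
    batch_size=32,
)

# Scale pixels to [0, 1] one batch at a time, mirroring the image/255 step.
train_ds = train_ds.map(lambda x, y: (x / 255.0, y))

Because batches are read and decoded on demand, only one batch at a time lives in memory instead of the entire 20 GB tensor.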

Answered By: der Fotik

If you don’t have enough GPU computing power, ImageDataGenerator will probably become a bottleneck, since it prepares batches in Python on the CPU. As Shubham suggested, try tf.data, which is the best option as far as I know.
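A minimal tf.data sketch (the train_images/<class>/... layout, shuffle buffer, and batch size here are illustrative assumptions, not part of the original answer):

import glob

import tensorflow as tf

# Build (path, label) lists from the directory structure; folder name = class.
file_paths = sorted(glob.glob("train_images/*/*.jpg"))
class_names = sorted({p.split("/")[-2] for p in file_paths})
labels = [class_names.index(p.split("/")[-2]) for p in file_paths]

def load_image(path, label):
    # Decode and preprocess lazily, so only the current batch is in memory.
    image = tf.io.read_file(path)
    image = tf.io.decode_jpeg(image, channels=3)
    image = tf.cast(image, tf.float32) / 255.0  # scale to [0, 1]
    return image, label

ds = (
    tf.data.Dataset.from_tensor_slices((file_paths, labels))
    .map(load_image, num_parallel_calls=tf.data.AUTOTUNE)  # parallel decode
    .shuffle(1000)
    .batch(32)   # all images are 480x640, so no resize is needed to batch
    .prefetch(tf.data.AUTOTUNE)
)

The prefetch step lets the CPU prepare the next batch while the GPU trains on the current one, which is what removes the input bottleneck.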

Answered By: münsteraner