I want to split my numpy array files into train/test

Question:

I have 12000 files, each in .npy format. I am using this format because my images are grayscale. Each file has shape (64, 64). I want to know if there is a way to split them into train and test sets to use for an autoencoder.

[image: a (64, 64) numpy image]

My autoencoder will be trained on (64, 64) images. If someone has experience with autoencoders:
Is it better to train with (3, 64, 64) or (64, 64)?
Is the png or jpg format better than npy?

Asked By: Savoyevatel


Answers:

You can use sklearn’s train_test_split.

import numpy as np
from sklearn.model_selection import train_test_split

list_of_images = ...  # a list containing the paths of all your data files,
                      # or a numpy array of shape (12000, 64, 64)

train_list, test_list = train_test_split(list_of_images, test_size=0.1, random_state=0, shuffle=True)

The above snippet splits your data into 90% train and 10% test.

  • If you apply it to a list of paths, it returns two lists of paths.
  • If you load all your images in advance into one large array of shape (12000, 64, 64), it returns two smaller arrays of shape (10800, 64, 64) and (1200, 64, 64) respectively (see the sketch after this list).
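
For concreteness, here is a minimal sketch of both variants. The images/ directory is a hypothetical stand-in for wherever your .npy files actually live.

import glob

import numpy as np
from sklearn.model_selection import train_test_split

# Collect the paths of all .npy files (images/ is a hypothetical directory).
paths = sorted(glob.glob("images/*.npy"))

# Variant 1: split the paths themselves and load the files later.
train_paths, test_paths = train_test_split(paths, test_size=0.1, random_state=0, shuffle=True)

# Variant 2: load everything up front into one (12000, 64, 64) array
# and split the array directly.
data = np.stack([np.load(p) for p in paths])
train_arr, test_arr = train_test_split(data, test_size=0.1, random_state=0, shuffle=True)
print(train_arr.shape, test_arr.shape)  # (10800, 64, 64) (1200, 64, 64)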

As your images are grayscale, there is no need to use (3, 64, 64); autoencoders will work fine with (64, 64), or (1, 64, 64) to be precise.
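
If your framework expects an explicit channel dimension, you can add it with a numpy axis insertion. A small sketch, assuming a hypothetical batch array of shape (N, 64, 64):

import numpy as np

# Hypothetical batch of grayscale images, shape (N, 64, 64).
batch = np.zeros((10800, 64, 64), dtype=np.float32)

# Channels-first layout, e.g. for PyTorch: (N, 1, 64, 64).
channels_first = batch[:, np.newaxis, :, :]

# Channels-last layout, e.g. for Keras/TensorFlow: (N, 64, 64, 1).
channels_last = batch[..., np.newaxis]

print(channels_first.shape, channels_last.shape)  # (10800, 1, 64, 64) (10800, 64, 64, 1)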

Answered By: Mercury