Python PIL: open many files and load them into memory

Question:

I have a dataset containing 3000 images in train and 6000 images in test. They're 320×320 RGB PNG files. I thought that I could load the entire dataset into memory (since it's just 100 MB), but when I try to do that I get a "[Errno 24] Too many open files: …" error. The loading code looks like this:

from PIL import Image

train_images = []
for index, row in dataset_p_train.iterrows():
    path = data_path / row.img_path
    train_images.append(Image.open(path))

I know that I'm opening 9000 files and not closing them, which isn't good practice, but unfortunately my classifier relies heavily on PIL's img.getcolors() method, so I really want to store the dataset in memory as a list of PIL images rather than as a numpy array of 3000×320×320×3 uint8, to avoid converting back into a PIL image each time I need an image's colors.

So, what should I do? Somehow increase the limit of open files? Or is there a way to make PIL images reside entirely in memory without staying "open" on disk?

Answers:

Image.open is lazy. It will not load the data until you try to do something with it.

You can call the image's load method to explicitly read the file contents into memory. This will also close the file, unless the image has multiple frames (for example, an animated GIF).
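Applied to the loading loop from the question, this is a minimal sketch (the helper name load_images and the plain list of paths are illustrative; the question iterates a pandas DataFrame instead):

```python
from PIL import Image


def load_images(paths):
    """Open each file and force its pixel data into memory.

    Image.open is lazy; calling load() reads the pixel data and,
    for single-frame formats such as PNG, closes the underlying
    file handle, so the open-file limit is never hit.
    """
    images = []
    for path in paths:
        img = Image.open(path)
        img.load()  # reads pixel data and closes the file
        images.append(img)
    return images
```

Each element of the returned list is a fully loaded PIL Image, so methods like img.getcolors() still work after all file handles are closed.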

See File Handling in Pillow for more details.

Answered By: user2357112