How to read all numpy files (.npy) from a directory at once without a for loop?

Question:

I have 1970 .npy files in the vid_frames directory; each .npy file contains 20 frames of the MSVD dataset. I need to load all of these files at once as a tensor dataset.
When I use np_read = np.load(all_npy_path), I get this error:

TypeError: expected str, bytes or os.PathLike object, not Tensor

where all_npy_path contains all the .npy paths as a tensor:

all_npy_path =
['vid_frames/m1NR0uNNs5Y_104_110.avi.npy',
 'vid_frames/9Q0JfdP36kI_23_28.avi.npy',
 'vid_frames/WTf5EgVY5uU_18_23.avi.npy',
 'vid_frames/WZTGqvbqOFE_28_34.avi.npy', ..... ]
Asked By: adeljalalyousif


Answers:

You must use a for loop for this; the overhead of the loop is negligible compared to the time taken to read the data from disk.
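
For reference, the plain single-threaded version the answer refers to is just a loop (or list comprehension) over the paths from the question:

import numpy as np

arrays_list = [np.load(path) for path in all_npy_path]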

You can use threading to speed up the process and achieve the maximum IO throughput, but in the future you might want to switch to sqlite3 for faster IO without threading (see the sketch after the code block below).

from multiprocessing.pool import ThreadPool
import numpy as np

all_npy_path = [
 'vid_frames/m1NR0uNNs5Y_104_110.avi.npy',
 'vid_frames/9Q0JfdP36kI_23_28.avi.npy',
 'vid_frames/WTf5EgVY5uU_18_23.avi.npy',
 'vid_frames/WZTGqvbqOFE_28_34.avi.npy',]

def load_npy(path):
    # np.load on a .npy file returns the array directly; it is not a
    # context manager (only .npz archives support "with").
    return np.load(path)

with ThreadPool() as pool:
    arrays_list = pool.map(load_npy, all_npy_path)

Note: pool.map is still a for loop under the hood; it is just multithreaded so the reads can overlap. The resulting arrays_list can then be stacked (e.g. with np.stack) into a single array for a tensor dataset.
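
A minimal sketch of the sqlite3 idea mentioned above, assuming each array is stored as a raw BLOB keyed by its path; the database file name, table name, helper functions, and the shape/dtype defaults are illustrative assumptions, not part of the original answer:

import sqlite3
import numpy as np

con = sqlite3.connect("frames.db")  # illustrative database file
con.execute("CREATE TABLE IF NOT EXISTS frames (path TEXT PRIMARY KEY, data BLOB)")

def store_npy(path):
    # Store the raw bytes of each array under its original path.
    arr = np.load(path)
    con.execute("INSERT OR REPLACE INTO frames VALUES (?, ?)", (path, arr.tobytes()))
    con.commit()

def load_from_db(path, shape=(20, 224, 224, 3), dtype=np.float32):
    # shape/dtype are placeholders; use whatever your .npy arrays actually hold.
    (blob,) = con.execute("SELECT data FROM frames WHERE path = ?", (path,)).fetchone()
    return np.frombuffer(blob, dtype=dtype).reshape(shape)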

Answered By: Ahmed AEK

The following code solved the problem:

import numpy as np
import tensorflow as tf

def decode_and_resize(img_path):
    # tf.py_function lets us decode the string tensor to a plain
    # Python path and call np.load eagerly inside the graph.
    tensor = tf.py_function(
        func=lambda path: np.load(path.numpy().decode("utf-8")),
        inp=[img_path],
        Tout=tf.float32
    )
    # IMAGE_SIZE_np (defined elsewhere in the asker's code) is the known
    # shape of each loaded array, which py_function cannot infer.
    tensor.set_shape(IMAGE_SIZE_np)

    return tensor
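
For context, a hedged usage sketch: the function above is presumably mapped over a tf.data dataset built from the paths in the question (all_npy_path):

dataset = tf.data.Dataset.from_tensor_slices(all_npy_path)
dataset = dataset.map(decode_and_resize, num_parallel_calls=tf.data.AUTOTUNE)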
Answered By: adeljalalyousif