Is it possible to map a NumPy function to tf.data.dataset?

Question:

I have the following simple code:

import tensorflow as tf
import numpy as np

filename = # a list of wav filenames   
x = tf.placeholder(tf.string)

def mfcc(x):
    feature = # some function written in NumPy to convert a wav file to MFCC features
    return feature

mfcc_fn = lambda x: mfcc(x)

# create a training dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x))
train_dataset = train_dataset.repeat()
train_dataset = train_dataset.map(mfcc_fn)
train_dataset = train_dataset.batch(100)
train_dataset = train_dataset.prefetch(buffer_size=1)

# create an iterator and iterate over training dataset
iterator = tf.data.Iterator.from_structure(train_dataset.output_types, train_dataset.output_shapes)
train_iterator = iterator.make_initializer(train_dataset)

with tf.Session() as sess:
    sess.run(train_iterator, feed_dict={x: filename})

Basically, the code creates a tf.data.Dataset object which loads a wav file and converts it to MFCC features. The data conversion happens at train_dataset.map(mfcc_fn), where I apply an MFCC function written in NumPy to all input data.

Apparently, the code doesn’t work because NumPy doesn’t support operations on a tf.placeholder object. Is it possible to map a function over the input to tf.data.Dataset if I have to write the function in NumPy? The reason I don’t use TensorFlow’s built-in MFCC feature transformation is that the FFT function in TensorFlow gives significantly different output than its NumPy counterpart (as illustrated here), and the model I am building is sensitive to MFCC features generated using NumPy.

Asked By: Steven Chan


Answers:

You can achieve that with tf.py_func or tf.py_function (the newer version). It does exactly what you want: it wraps your NumPy function, which operates on arrays, in a TensorFlow operation that you can include as part of your dataset graph.
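As a rough illustration of this approach, here is a minimal sketch that assumes TensorFlow 2.x with eager execution. The function numpy_feature is a hypothetical stand-in for the question's NumPy MFCC code (it only takes the magnitude of a real FFT), and the in-memory signals array replaces the wav files.

```python
import numpy as np
import tensorflow as tf

def numpy_feature(signal):
    # Hypothetical stand-in for the NumPy MFCC function: magnitude of the real FFT.
    return np.abs(np.fft.rfft(signal)).astype(np.float32)

def tf_feature(signal):
    # tf.py_function hands the wrapped function an eager tensor;
    # .numpy() turns it into a plain NumPy array.
    feature = tf.py_function(
        func=lambda s: numpy_feature(s.numpy()),
        inp=[signal],
        Tout=tf.float32,
    )
    # The wrapped op loses static shape information, so restore the rank by hand.
    feature.set_shape([None])
    return feature

# Assumed toy input: 8 signals of 16 samples each (not from the original post).
signals = np.arange(8 * 16, dtype=np.float32).reshape(8, 16)

dataset = tf.data.Dataset.from_tensor_slices(signals).map(tf_feature).batch(4)
first_batch = next(iter(dataset))
```

If the dataset elements are filenames rather than arrays, s.numpy() yields a bytes object inside the wrapped function, so it would likely need decoding before being passed to a file reader.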

Answered By: Anis

You can use a Python generator to handle the NumPy array and then pass that to tf.data.Dataset.from_generator.

For example:

import cv2
import tensorflow as tf

def sample_generator(image_paths):
    for image_path in image_paths:
        # Values passed through `args` arrive as bytes, so decode the path first
        img = cv2.imread(image_path.decode("utf-8"))
        # Do all the custom numpy things

        yield img

data_loader = tf.data.Dataset.from_generator(sample_generator,
                                             args=[image_paths],
                                             output_types=tf.int32,
                                             output_shapes=(None, None, 3))

This will create a TensorFlow data loader from the Python generator. You can read more about this here.
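The same idea can be adapted to the feature-extraction case from the question. The sketch below is only illustrative and assumes TensorFlow 2.x: numpy_feature is a hypothetical placeholder for the NumPy MFCC function, and the signals list stands in for audio loaded from wav files. The generator is closed over with a lambda instead of using args, so the inputs stay ordinary Python objects.

```python
import numpy as np
import tensorflow as tf

def numpy_feature(signal):
    # Hypothetical placeholder for the NumPy MFCC function.
    return np.abs(np.fft.rfft(signal)).astype(np.float32)

def feature_generator(signals):
    # All NumPy work happens here, outside the TensorFlow graph.
    for signal in signals:
        yield numpy_feature(signal)

# Assumed toy input: six short sine waves (not from the original post).
signals = [np.sin(np.arange(16, dtype=np.float32) * k) for k in range(1, 7)]

dataset = tf.data.Dataset.from_generator(
    lambda: feature_generator(signals),
    output_types=tf.float32,
    output_shapes=(None,),
).batch(3)

batches = [batch.numpy() for batch in dataset]
```

Because the generator runs in Python, this route does not parallelize the way a mapped TensorFlow op can, which may matter for large audio datasets.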

Answered By: Prathamesh Dinkar