Keras2 ImageDataGenerator or TensorFlow tf.data?
Question:
With Keras 2 being integrated into TensorFlow and TensorFlow 2.0 on the horizon, should you use the Keras ImageDataGenerator (e.g. with flow_from_directory), or tf.data from TensorFlow, which can now also be used with Keras's fit_generator?
Will both methods have their place, serving different purposes, or will tf.data be the way to go and the Keras generators be deprecated in the future?
Thanks, I would like to take the path which keeps me up to date a bit longer in this fast-moving field.
Answers:
Since its release, the TensorFlow Dataset API has been the recommended way to construct input pipelines for any model built on the TensorFlow backend, both in Keras and in low-level TensorFlow.
In later versions of TF 1.x it can be passed directly to the tf.keras.Model.fit method:
model.fit(dataset, epochs=10)
It’s good both for rapid prototyping:
dataset = tf.data.Dataset.from_tensor_slices((train, test))
dataset = dataset.shuffle(buffer_size=1024).repeat().batch(batch_size=32)
and for building complex, high-performance ETL pipelines. For more on optimizing input pipelines, see https://www.tensorflow.org/guide/performance/datasets
As per the official docs, in TF 2.0 it will also be the default way to feed data to the model: https://www.tensorflow.org/alpha/guide/migration_guide
Since the upcoming TensorFlow version executes eagerly by default, the dataset object becomes iterable and is even easier to use.
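For readers new to the API, here is a plain-Python sketch of what the chained shuffle/repeat/batch calls do conceptually. This is an illustration of the semantics only, not the tf.data implementation:

```python
import random

def shuffle(items, seed=0):
    # Return a shuffled copy (tf.data shuffles within a buffer; this is the simple case).
    items = list(items)
    random.Random(seed).shuffle(items)
    return items

def repeat(items, count):
    # Yield the whole sequence `count` times.
    for _ in range(count):
        yield from items

def batch(items, size):
    # Group consecutive elements into lists of `size`, dropping a partial remainder.
    buf = []
    for x in items:
        buf.append(x)
        if len(buf) == size:
            yield buf
            buf = []

data = list(range(10))
batches = list(batch(repeat(shuffle(data), count=2), size=4))
print(len(batches))  # 5 full batches of 4 from the 20 repeated elements
```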
For me, I prefer to build a generator with yield:
import numpy as np
import cv2 as cv
from glob import glob

def generator(path, batch_size=4):  # default argument must come after the positional one
    imgs = glob(path + '*.jpg')
    while True:
        batch = []
        for i in range(batch_size):
            idx = np.random.randint(0, len(imgs))
            img = cv.resize(cv.imread(imgs[idx]), (256, 256)) / 255
            batch.append(img)
        yield np.array(batch)
Then create the generator and pass it to model.fit_generator; it will work.
You can sample data randomly like this or use some other selection scheme. Though the code is rough, it is easy to extend so that it generates more complex batches.
Note that this approach is for TF 1.x with Keras 2, not for TensorFlow 2.0.
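A minimal sketch of how such a generator is consumed, with random arrays standing in for the cv2-loaded images so it runs without any files on disk:

```python
import numpy as np

def generator(path, batch_size=4):
    # Stand-in for glob + cv.imread: random 256x256 RGB "images" in [0, 1).
    while True:
        yield np.random.rand(batch_size, 256, 256, 3)

gen = generator('data/', batch_size=4)
batch = next(gen)
print(batch.shape)  # (4, 256, 256, 3)

# In TF 1.x Keras the generator would then be passed as (parameters illustrative):
# model.fit_generator(gen, steps_per_epoch=100, epochs=10)
```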
Alongside custom-defined Python generators, you can wrap the Keras ImageDataGenerator inside tf.data.
The following snippets are taken from the TensorFlow 2.0 documentation.
img_gen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255, rotation_range=20)
ds = tf.data.Dataset.from_generator(
    img_gen.flow_from_directory, args=[flowers],  # `flowers` is the path to the image directory
    output_types=(tf.float32, tf.float32),
    output_shapes=([32, 256, 256, 3], [32, 5])
)
Therefore, one can still use the typical Keras ImageDataGenerator; you just need to wrap it in a tf.data.Dataset as shown above.
Update 2022
On visiting the ImageDataGenerator documentation, there is now a deprecation message that says the following:
Deprecated: tf.keras.preprocessing.image.ImageDataGenerator is not recommended for new code. Prefer loading images with tf.keras.utils.image_dataset_from_directory and transforming the output tf.data.Dataset with preprocessing layers. For more information, see the tutorials for loading images and augmenting images, as well as the preprocessing layer guide.
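As a rough sketch of the recommended replacement, the rescale and rotation_range arguments of ImageDataGenerator map onto Keras preprocessing layers. The directory path and layer factors below are illustrative assumptions, not taken from the original answer:

```python
import tensorflow as tf

# image_dataset_from_directory expects one subdirectory per class, e.g.:
# ds = tf.keras.utils.image_dataset_from_directory(
#     "flowers", image_size=(256, 256), batch_size=32)

# The augmentation previously configured on ImageDataGenerator becomes layers:
augment = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),      # replaces rescale=1./255
    tf.keras.layers.RandomRotation(20 / 360),  # replaces rotation_range=20 (degrees)
])

# Applied to a dummy batch to show that shapes are preserved:
images = tf.random.uniform((4, 256, 256, 3), maxval=255.0)
out = augment(images, training=True)
print(out.shape)  # (4, 256, 256, 3)
```

These layers can then be mapped over the dataset (ds.map(lambda x, y: (augment(x, training=True), y))) or placed at the start of the model itself.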