preprocess_input() method in keras

Question:

I am trying out sample keras code from the below keras documentation page,
https://keras.io/applications/

What preprocess_input(x) function of keras module does in the below code? Why do we have to do expand_dims(x, axis=0) before that is passed to the preprocess_input() method?

from keras.applications.resnet50 import ResNet50
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input
import numpy as np

model = ResNet50(weights='imagenet')

img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

Is there any documentation with a good explanation of these functions?

Thanks!

Answers:

Keras works with batches of images. So, the first dimension is used for the number of samples (or images) you have.

When you load a single image, you get the shape of one image, which is (size1,size2,channels).

In order to create a batch of images, you need an additional dimension: (samples, size1,size2,channels)

The preprocess_input function is meant to adequate your image to the format the model requires.

Some models use images with values ranging from 0 to 1. Others from -1 to +1. Others use the “caffe” style, that is not normalized, but is centered.

From the source code, Resnet is using the caffe style.

You don’t need to worry about the internal details of preprocess_input. But ideally, you should load images with the keras functions for that (so you guarantee that the images you load are compatible with preprocess_input).

Answered By: Daniel Möller

This loads an image and resizes the image to (224, 224):

 img = image.load_img(img_path, target_size=(224, 224))

The img_to_array() function adds channels: x.shape = (224, 224, 3) for RGB and (224, 224, 1) for gray image

 x = image.img_to_array(img) 

expand_dims() is used to add the number of images: x.shape = (1, 224, 224, 3):

x = np.expand_dims(x, axis=0)

preprocess_input subtracts the mean RGB channels of the imagenet dataset. This is because the model you are using has been trained on a different dataset: x.shape is still (1, 224, 224, 3)

x = preprocess_input(x)

If you add x to an array images, at the end of the loop, you need to add images = np.vstack(images) so that you get (n, 224, 224, 3) as the dim of images where n is the number of images processed

Answered By: DK250

I found that preprocessing your data while yours is a too different dataset vs the pre_trained model/dataset, then it may harm your accuracy somehow. If you do transfer learning and freezing some layers from a pre_trained model/their weights, simply /255.0 your original dataset does the job just fine, at least for large 1/2 millions samples food dataset. Ideally you should know your std/mean of you dataset and use it instead of using std/mdean of the pre-trained model preprocess.

Answered By: ThÆ° Sinh

As you can see there tensorflow/python/keras/_impl/keras/applications/imagenet_utils.py main purpose of preprocessing for torch is normalizing the color channels accordingly which dataset used the train the networks before. Like we do by simply (Data – Mean) / Std

Source code:

def _preprocess_numpy_input(x, data_format, mode):
  """Preprocesses a Numpy array encoding a batch of images.
  Arguments:
      x: Input array, 3D or 4D.
      data_format: Data format of the image array.
      mode: One of "caffe", "tf" or "torch".
          - caffe: will convert the images from RGB to BGR,
              then will zero-center each color channel with
              respect to the ImageNet dataset,
              without scaling.
          - tf: will scale pixels between -1 and 1,
              sample-wise.
          - torch: will scale pixels between 0 and 1 and then
              will normalize each channel with respect to the
              ImageNet dataset.
  Returns:
      Preprocessed Numpy array.
  """
  if mode == 'tf':
    x /= 127.5
    x -= 1.
    return x

  if mode == 'torch':
    x /= 255.
    mean = [0.485, 0.456, 0.406]
    std = [0.229, 0.224, 0.225]
  else:
    if data_format == 'channels_first':
      # 'RGB'->'BGR'
      if x.ndim == 3:
        x = x[::-1, ...]
      else:
        x = x[:, ::-1, ...]
    else:
      # 'RGB'->'BGR'
      x = x[..., ::-1]
    mean = [103.939, 116.779, 123.68]
    std = None

  # Zero-center by mean pixel
  if data_format == 'channels_first':
    if x.ndim == 3:
      x[0, :, :] -= mean[0]
      x[1, :, :] -= mean[1]
      x[2, :, :] -= mean[2]
      if std is not None:
        x[0, :, :] /= std[0]
        x[1, :, :] /= std[1]
        x[2, :, :] /= std[2]
    else:
      x[:, 0, :, :] -= mean[0]
      x[:, 1, :, :] -= mean[1]
      x[:, 2, :, :] -= mean[2]
      if std is not None:
        x[:, 0, :, :] /= std[0]
        x[:, 1, :, :] /= std[1]
        x[:, 2, :, :] /= std[2]
  else:
    x[..., 0] -= mean[0]
    x[..., 1] -= mean[1]
    x[..., 2] -= mean[2]
    if std is not None:
      x[..., 0] /= std[0]
      x[..., 1] /= std[1]
      x[..., 2] /= std[2]
  return x
Answered By: bitbang
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.