Why must I reshape an image to [n, height, width, channel] for a CNN?

Question:

I am trying to apply a convolutional layer to a picture of shape [256, 256, 3],
but I get an error when I use the tensor of the image directly:

conv1 = conv2d(input,W_conv1) +b_conv1  #<=== error 

error message:

ValueError: Shape must be rank 4 but is rank 3 for 'Conv2D' (op: 'Conv2D') 
with input shapes: [256,256,3], [3,3,3,1].    

But when I reshape the input, the conv2d function works normally:

x_image = tf.reshape(input,[-1,256,256,3])
conv1 = conv2d(x_image,W_conv1) +b_conv1

If I must reshape the tensor, what are the best values to reshape to in my case, and why?

import tensorflow as tf
import numpy as np
from PIL import Image

def img_to_tensor(img):
    return tf.convert_to_tensor(img, np.float32)

def weight_generater(shape):
    return tf.Variable(tf.truncated_normal(shape,stddev=0.1))

def bias_generater(shape):
    return tf.Variable(tf.constant(.1,shape=shape))

def conv2d(x,W):
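    # stride 1 in every dimension with 'SAME' padding, so the output keeps the input's height and width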
    return tf.nn.conv2d(x,W,[1,1,1,1],'SAME')

def pool_max_2x2(x):
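    # note: strides=[1,1,1,1] keeps the spatial size; strides=[1,2,2,1] would give true 2x2 downsampling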
    return tf.nn.max_pool(x,ksize=[1,2,2,1],strides=[1,1,1,1],padding='SAME')

#read image
img = Image.open("img.tif")

sess = tf.InteractiveSession()

# convert the image to a tensor
input = img_to_tensor(img).eval()
#print(input)

# get img dimension
img_dimension = tf.shape(input).eval()
print(img_dimension)

height,width,channel=img_dimension
filter_size = 3
feature_map = 32

x = tf.placeholder(tf.float32,shape=[height*width*channel])
y = tf.placeholder(tf.float32,shape=21)

# generate weights [kernel size, kernel size, channels, number of filters]
W_conv1 = weight_generater([filter_size,filter_size,channel,1])

# each filter has its own specific bias
b_conv1 = bias_generater([feature_map])

""" I must reshape the picture
x_image = tf.reshape(input,[-1,256,256,3])
"""
conv1 = conv2d(input,W_conv1) +b_conv1  #<=== error

h_conv1 = tf.nn.relu(conv1)

h_pool1 = pool_max_2x2(h_conv1)

layer1_dimension = tf.shape(h_pool1).eval()

print(layer1_dimension)
Asked By: Sakhri Houssem


Answers:

The first dimension is the batch size. If you are feeding one image at a time, you can simply make the first dimension 1; it doesn't change your data at all, it just changes the indexing to 4D:

x_image = tf.reshape(input, [1, 256, 256, 3])

If you reshape it with a -1 in the first dimension, you are saying that you will feed in a 4D batch of images (shaped [batch_size, height, width, color_channels]), and you are allowing the batch size to be dynamic (which is common to do).
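A minimal sketch of both forms, assuming TensorFlow 1.x and a 256×256×3 image like the one in the question (the random NumPy array below is just a stand-in for the loaded image):

import numpy as np
import tensorflow as tf

img = np.random.rand(256, 256, 3).astype(np.float32)  # stand-in for the loaded image

# explicit batch of one image: [256, 256, 3] -> [1, 256, 256, 3]
single = tf.reshape(tf.convert_to_tensor(img), [1, 256, 256, 3])

# -1 lets TensorFlow infer the batch size, so the same reshape
# also accepts a stack of several images
batch = np.stack([img, img])                           # shape [2, 256, 256, 3]
dynamic = tf.reshape(tf.convert_to_tensor(batch), [-1, 256, 256, 3])

with tf.Session() as sess:
    print(sess.run(tf.shape(single)))   # [  1 256 256   3]
    print(sess.run(tf.shape(dynamic)))  # [  2 256 256   3]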

Answered By: David Parks

You could also use

im = tf.expand_dims(input, axis=0)

to insert a dimension of 1 into the tensor’s shape. im will be a rank 4 tensor. This way you do not have to specify the dimensions of the image.
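As a small sketch (reusing input, conv2d, W_conv1, and b_conv1 from the question, inside the same InteractiveSession so .eval() works), the added dimension is enough for the conv layer to accept the image:

im = tf.expand_dims(input, axis=0)      # [256, 256, 3] -> [1, 256, 256, 3]
print(tf.shape(im).eval())              # prints something like [  1 256 256   3]
conv1 = conv2d(im, W_conv1) + b_conv1   # rank-4 input, so Conv2D no longer complains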

Answered By: diophantus7