What type of input does ResNet need?

Question:

I am new to deep learning, and I am trying to train a ResNet50 model to classify 3 different surgical tools. The problem is that every article I read tells me that I need to use 224 X 224 images to train ResNet, but the images I have are of size 512 X 288.

So my questions are:

1. Is it possible to use 512 X 288 images to train ResNet without cropping the images? I do not want to crop the image because the tools are positioned rather randomly inside the image, and I think cropping the image will cut off part of the tools as well.
2. For the training and test set images, do I need to draw a rectangle around the object I want to classify?
3. Is it okay if multiple different objects are in one image? The data set I am using often has multiple tools appearing in one image, and I wonder if I must only use images that only have one tool appearing at a time.
4. If I were to crop the images to fit one tool, will it be okay even if the sizes of the images vary?

Asked By: Sean


Answers:

Is it possible to use 512 X 288 images to train ResNet without cropping the images? I do not want to crop the image because the tools are positioned rather randomly inside the image, and I think cropping the image will cut off part of the tools as well.

Yes, you can train ResNet without cropping your images. If your biggest issue is that ResNet "requires" 224x224 while your images are 512x288, the simplest solution is to resize them to 224x224 first. Only if that is not possible for some technical reason, make the network accept other input sizes: add a global pooling layer at the very end, so the classifier always receives a fixed-length vector regardless of the input size (depending on the architecture, you might also need to change kernel sizes or the downsampling rate). ResNet already ends with a global average pooling layer, so it handles this out of the box; the 224x224 figure comes from the resolution the ImageNet-pretrained models were trained at.
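To see why global pooling makes the input size flexible, the sketch below traces the spatial size of the feature map through ResNet-50's downsampling layers and then applies global average pooling. It assumes the standard ResNet-50 layout (a 7x7 stride-2 stem convolution, a 3x3 stride-2 max pooling, and a stride-2 convolution at the start of each of the last three stages); the helper names are hypothetical, and random numpy arrays stand in for real feature maps.

```python
import numpy as np

def conv_out(size, kernel, stride, padding):
    """Output length of a convolution or pooling layer along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

def resnet50_feature_size(height, width):
    """Spatial size of the final ResNet-50 feature map, before global pooling."""
    sizes = []
    for s in (height, width):
        s = conv_out(s, kernel=7, stride=2, padding=3)  # stem convolution
        s = conv_out(s, kernel=3, stride=2, padding=1)  # stem max pooling
        for _ in range(3):  # first block of stages 2, 3 and 4 halves the size
            s = conv_out(s, kernel=3, stride=2, padding=1)
        sizes.append(s)
    return tuple(sizes)

def global_avg_pool(features):
    """Average a (channels, height, width) feature map over its spatial axes."""
    return features.mean(axis=(1, 2))

for h, w in [(224, 224), (288, 512)]:
    fh, fw = resnet50_feature_size(h, w)
    features = np.random.rand(2048, fh, fw)  # ResNet-50's last stage has 2048 channels
    pooled = global_avg_pool(features)
    print((h, w), "->", (fh, fw), "->", pooled.shape)
# (224, 224) -> (7, 7) -> (2048,)
# (288, 512) -> (9, 16) -> (2048,)
```

Both inputs end up as a 2048-dimensional vector, so the same final fully connected layer (with 3 outputs for three tool classes) can follow either one.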

For the training and test set images, do I need to draw a rectangle around the object I want to classify?

For classification, no, you do not. Bounding boxes are only needed for detection, i.e. when you want your model to also draw a rectangle around the objects of interest.
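For classification, the whole annotation is one class label per image. A common way to store that is one folder per class; the sketch below (the layout, class names and file names are assumptions, loosely mirroring what torchvision's `ImageFolder` expects) turns such a folder tree into (path, label) pairs without any box coordinates.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}

def list_classification_samples(root):
    """Collect (image_path, label_index) pairs from a root/<class_name>/<image> layout.

    The folder an image sits in is its only annotation: no bounding boxes.
    """
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for label, name in enumerate(class_names):
        for image_path in sorted((root / name).iterdir()):
            if image_path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((image_path, label))
    return class_names, samples

# Usage on a throwaway folder tree; the class and file names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["scalpel", "forceps", "scissors"]:
        (Path(tmp) / name).mkdir()
        (Path(tmp) / name / "frame_0001.png").touch()
    class_names, samples = list_classification_samples(tmp)
    labels = [label for _, label in samples]

print(class_names, labels)  # ['forceps', 'scalpel', 'scissors'] [0, 1, 2]
```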

Is it okay if multiple different objects are in one image? The data set I am using often has multiple tools appearing in one image, and I wonder if I must only use images that only have one tool appearing at a time.

It is OK to have multiple different objects in one image, as long as they do not belong to different classes that you are training against. That is, if you are trying to classify apples vs oranges, an image should not contain both of them at the same time, because a single-label classifier has to assign each image exactly one of those classes. But if it also contains anything else (a screwdriver, a key, a person, a cucumber, etc.), that is fine.
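If each image comes with a list of the objects visible in it (an assumed annotation format; the file names and class names below are hypothetical), the rule above can be applied mechanically: keep an image for single-label training only when exactly one of the target classes appears, no matter how many other objects, or how many instances of that one class, are present.

```python
TARGET_CLASSES = {"forceps", "scalpel", "scissors"}  # hypothetical class names

def single_label_or_none(objects_in_image, target_classes=TARGET_CLASSES):
    """Return the one target class present in an image, or None if there is no unique label.

    Non-target objects (hands, gauze, ...) are ignored; the image is only usable
    for single-label classification if exactly one target class appears.
    """
    present = {obj for obj in objects_in_image if obj in target_classes}
    if len(present) == 1:
        return present.pop()
    return None

# Hypothetical per-image annotations listing every object visible in the frame.
annotations = {
    "frame_0001.png": ["scalpel", "hand"],
    "frame_0002.png": ["scalpel", "scalpel", "gauze"],  # two scalpels, one class: fine
    "frame_0003.png": ["scalpel", "forceps"],  # two target classes: ambiguous
    "frame_0004.png": ["hand"],  # no target class: nothing to label
}

usable = {}
for image, objects in annotations.items():
    label = single_label_or_none(objects)
    if label is not None:
        usable[image] = label

print(usable)  # {'frame_0001.png': 'scalpel', 'frame_0002.png': 'scalpel'}
```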

If I were to crop the images to fit one tool, will it be okay even if the sizes of the images vary?

It depends on your model. Cropping and image size are two different things: you can crop an image of any size and still resize the result to your desired dimensions. You usually want all images to have the same size, as it makes your life easier, but it is not a hard requirement. A fully convolutional network ending in global pooling can accept varying sizes, although the images within one batch typically still need to match so they can be stacked into a single tensor.
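A minimal numpy sketch of "crop to any size, then resize to a fixed size": crops of different shapes come out at the same 224x224 size, so they can be stacked into one batch. The box coordinates are hypothetical, and the hand-written nearest-neighbour resize only stands in for a proper library resize (for example Pillow's `Image.resize` or torchvision's `transforms.Resize`), which interpolates more smoothly.

```python
import numpy as np

def crop(image, top, left, height, width):
    """Cut a (height, width) window out of an (H, W, C) image array."""
    return image[top:top + height, left:left + width]

def resize_nearest(image, out_height, out_width):
    """Nearest-neighbour resize of an (H, W, C) array to (out_height, out_width, C)."""
    in_height, in_width = image.shape[:2]
    rows = np.arange(out_height) * in_height // out_height  # source row for each output row
    cols = np.arange(out_width) * in_width // out_width  # source column for each output column
    return image[rows[:, None], cols[None, :]]

# A random stand-in for one 512x288 frame (height 288, width 512, RGB).
frame = np.random.randint(0, 256, size=(288, 512, 3), dtype=np.uint8)

# Hypothetical tool boxes (top, left, height, width) of different sizes,
# all brought to one 224x224 input size.
boxes = [(10, 20, 100, 60), (50, 200, 230, 300)]
batch = np.stack([resize_nearest(crop(frame, *box), 224, 224) for box in boxes])
print(batch.shape)  # (2, 224, 224, 3)
```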

Answered By: Hossein