Extracting separate images from YOLO bounding box coordinates

Question

I have a set of images and their corresponding YOLO coordinates. Now I want to extract the objects that these YOLO coordinates denote into separate images.

But these coordinates are floating-point values, so I am not able to use slicing directly.

For one sample image, the corresponding YOLO coordinates are:

label = [0.536328, 0.5, 0.349219, 0.611111]

I read my image as follows:

image = cv2.imread('frame0.jpg')

Then I wanted to use something like image[y:y+h, x:x+w], as I had seen in a similar question. But the variables are floats, so I tried to convert them into integers using the dimensions of the image (1280 x 720) like this:

object = [int(label[0]*720), int(label[1]*720), int(label[2]*1280), int(label[3]*1280)]
x,y,w,h = object

But this does not extract the correct part of the image, as the linked extracted image shows.

This is part of my training dataset, and I had cropped these parts earlier using some tools, so there should not be any errors in my labels. All the images are incorrectly cropped this way; I have shown the output for one of them.

Thanks a lot in advance. Any suggestions would be really helpful!

Asked By: Suraj

Answer 1

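The labels need to be scaled differently. YOLO stores a box as the normalized center (x, y) followed by the normalized width and height, so x and the width have to be multiplied by the image width W, and y and the height by the image height H. The code in the question scales x and y by 720 and both sizes by 1280. Because x and y denote the center of the box rather than its top-left corner, half the box width and height also have to be subtracted before slicing. Here is one way to do it:

import cv2
import matplotlib.pyplot as plt

# x_center, y_center, width, height (all normalized to [0, 1])
label = [0.536328, 0.5, 0.349219, 0.611111]
img = cv2.imread('P6A4J.jpg')

H, W, _ = img.shape
# Scale to pixels: x and width by W, y and height by H
xc, yc, w, h = label[0]*W, label[1]*H, label[2]*W, label[3]*H
# Convert the box center to its top-left corner
x, y = int(xc - w/2), int(yc - h/2)
w, h = int(w), int(h)

plt.subplot(1,2,1)
# OpenCV loads images as BGR; matplotlib expects RGB
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.subplot(1,2,2)
plt.imshow(cv2.cvtColor(img[y:y+h, x:x+w], cv2.COLOR_BGR2RGB))
plt.show()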

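
For reference, the conversion can also be written as a small helper. This is only a minimal sketch, assuming the common YOLO label convention in which the four values are the box center x, center y, width and height, each normalized by the image width or height. The function name yolo_to_pixel_box is hypothetical and not part of OpenCV or any YOLO library.

```python
def yolo_to_pixel_box(label, img_w, img_h):
    """Convert a normalized YOLO box to pixel corners (x1, y1, x2, y2).

    Assumes label = [x_center, y_center, width, height], each in [0, 1].
    Hypothetical helper for illustration only.
    """
    xc, yc, w, h = label

    # Scale horizontal values by the image width, vertical ones by the height.
    xc, w = xc * img_w, w * img_w
    yc, h = yc * img_h, h * img_h

    # Move from the box center to its corners and round to whole pixels.
    x1, y1 = round(xc - w / 2), round(yc - h / 2)
    x2, y2 = round(xc + w / 2), round(yc + h / 2)

    # Clamp to the image so that slicing never receives negative indices.
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(img_w, x2), min(img_h, y2)
    return x1, y1, x2, y2


box = yolo_to_pixel_box([0.536328, 0.5, 0.349219, 0.611111], 1280, 720)
print(box)
```

With an image loaded by cv2.imread, the crop would then be image[y1:y2, x1:x2], and cv2.imwrite can save it as a separate file. Clamping matters because a negative start index in a NumPy slice counts from the end of the array instead of raising an error.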