Splitting an image-based dataset for YOLOv3

Question:

I have a question about splitting a dataset of 20k images along with their labels. The dataset is in YOLOv3 format: each image file has a .txt file with the same name, and that text file contains the labels.

I want to split the dataset into train/test splits. Is there a way, using Python, to randomly select images together with their label .txt files and store them in a separate folder?

I want to be able to split the dataset randomly. For instance, select 16k images along with their label files and store them in a train folder, with the remaining 4k stored in a test folder.

This could be done manually in the file explorer by selecting the first 16k files and moving them to a different folder, but the split wouldn’t be random, and I plan to do this over and over again for the same dataset.

Here is what the data looks like:
[Screenshot: images and labels]

Asked By: Muaz Shahid

Answers:

I suggest taking a look at Python's built-in modules for manipulating files and paths, such as glob, os, and shutil, which the code below uses. Here is my code, with comments, that might solve your problem. It’s very simple:

import glob
import random
import os
import shutil

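# Get all paths to your image files, sorted so the seeded shuffle is reproducible
# (glob does not guarantee any particular order).
PATH = 'path/to/dataset/'
img_paths = sorted(glob.glob(PATH+'*.jpg'))
# Derive each label path from its image path so the pairs always match.
# This assumes every image has a .txt file with the same name.
txt_paths = [os.path.splitext(p)[0] + '.txt' for p in img_paths]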

# Calculate the number of files for training; the rest go to validation
data_size = len(img_paths)
r = 0.8  # fraction of the data used for training
train_size = int(data_size * r)

# Shuffle the two lists together so each image stays paired with its label file
img_txt = list(zip(img_paths, txt_paths))
random.seed(43)
random.shuffle(img_txt)
img_paths, txt_paths = zip(*img_txt)

# Now split them
train_img_paths = img_paths[:train_size]
train_txt_paths = txt_paths[:train_size]

valid_img_paths = img_paths[train_size:]
valid_txt_paths = txt_paths[train_size:]

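# Move them to the train and valid folders
train_folder = PATH+'train/'
valid_folder = PATH+'valid/'
# exist_ok=True avoids an error if the folders are already there
os.makedirs(train_folder, exist_ok=True)
os.makedirs(valid_folder, exist_ok=True)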

def move(paths, folder):
    for p in paths:
        shutil.move(p, folder)

move(train_img_paths, train_folder)
move(train_txt_paths, train_folder)
move(valid_img_paths, valid_folder)
move(valid_txt_paths, valid_folder)
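
The same idea can also be wrapped in a function, which may be handy if the split has to be repeated. This is only a sketch under some assumptions: the function name split_dataset and its parameters are made up for illustration, images are assumed to be .jpg files, and any image without a matching .txt file is skipped rather than moved. Extra text files that do not belong to an image (a class list, for example) are left where they are.

```python
import random
import shutil
from pathlib import Path


def split_dataset(dataset_dir, train_ratio=0.8, seed=43, img_ext=".jpg"):
    """Move image/label pairs from dataset_dir into train/ and valid/ subfolders.

    Hypothetical helper: the name and parameters are illustrative only.
    Returns the number of pairs placed in each split.
    """
    root = Path(dataset_dir)
    # Pair each image with the .txt file of the same stem; skip unlabeled images.
    pairs = [
        (img, img.with_suffix(".txt"))
        for img in sorted(root.glob(f"*{img_ext}"))
        if img.with_suffix(".txt").exists()
    ]
    # A seeded shuffle gives the same split for the same seed and file set.
    random.Random(seed).shuffle(pairs)
    train_size = int(len(pairs) * train_ratio)
    splits = {"train": pairs[:train_size], "valid": pairs[train_size:]}
    for name, items in splits.items():
        out = root / name
        out.mkdir(exist_ok=True)
        for img, txt in items:
            shutil.move(str(img), str(out / img.name))
            shutil.move(str(txt), str(out / txt.name))
    return {name: len(items) for name, items in splits.items()}
```

Calling split_dataset('path/to/dataset/') would move the files in place. Passing a different seed gives a different random split, and replacing shutil.move with shutil.copy2 keeps the original folder untouched so the split can be redone later.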

Answered By: Đinh Anh Vũ

Here is my repo:
https://github.com/akashAD98/Train_val_Test_split

You will get train, val, and test .txt files that contain all the paths.
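
For readers who want to see what such path lists could look like without moving any files, here is a minimal sketch. It is not the code from the repo above; the function name write_split_lists, the default ratios, and the .jpg extension are assumptions made for illustration.

```python
import random
from pathlib import Path


def write_split_lists(dataset_dir, out_dir, ratios=(0.8, 0.1, 0.1), seed=43, img_ext=".jpg"):
    """Write train.txt, val.txt and test.txt, each with one image path per line.

    Hypothetical helper: the name, ratios and extension are illustrative only.
    The images themselves are not moved or copied.
    """
    images = sorted(Path(dataset_dir).glob(f"*{img_ext}"))
    # Seeded shuffle so the same seed reproduces the same lists.
    random.Random(seed).shuffle(images)
    n_train = int(len(images) * ratios[0])
    n_val = int(len(images) * ratios[1])
    splits = {
        "train": images[:n_train],
        "val": images[n_train:n_train + n_val],
        "test": images[n_train + n_val:],  # whatever is left over
    }
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, paths in splits.items():
        lines = [str(p.resolve()) for p in paths]
        (out / f"{name}.txt").write_text("".join(line + "\n" for line in lines))
    return {name: len(paths) for name, paths in splits.items()}
```

Because only the lists change, a new random split is just a matter of calling the function again with another seed.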

Answered By: Akash Desai