I need to append items from one list to two others, but my dataset is multiplied when I run it through the loop

Question:

train_imgs = []
train_lbs = []
test_imgs = []
test_lbs = []

i = 0
j = 0
print(len(array_resized))
for i in range(len(array_resized)):
  for j in range(24):
    train_imgs.append(array_resized[i])
    train_lbs.append(ar_label[i])
    j = j+1
  j = 0
  for j in range(6):
    test_imgs.append(array_resized[i])
    test_lbs.append(ar_label[i])
    j = j+1
  j = 0

I need to send 24 items to the train list, then the next 6 to the test list, then the next 24 to the train list, and so on. However, the lengths of the final train and test lists sum to 9000 instead of the initial 300. What can I do about it? Thank you in advance!

array_0 = [cv2.imread(file, cv2.IMREAD_GRAYSCALE) for file in glob.glob("/content/imgs_3/0*")]
ar_label0 = ['0' for file in array_0]
array_1 = [cv2.imread(file, cv2.IMREAD_GRAYSCALE) for file in glob.glob("/content/imgs_3/1*")]
ar_label1 = ['1' for file in array_1]

and so on, up to array_9

array_all = array_0+array_1+array_2+array_3+array_4+array_5+array_6+array_7+array_8+array_9

array_resized = [cv2.resize(file, (28,28), interpolation=cv2.INTER_LINEAR) for file in array_all]
Asked By: charles

||

Answers:

Here’s how I’d go about splitting up your list:

for i in range(0, len(array_resized), 30):
  # the first 24 items of each chunk of 30 go to the train lists
  for j in range(24):
    train_imgs.append(array_resized[i+j])
    train_lbs.append(ar_label[i+j])
  # the remaining 6 items of the chunk go to the test lists
  for j in range(6):
    test_imgs.append(array_resized[i+24+j])
    test_lbs.append(ar_label[i+24+j])

In your code, the outer loop visits every element of the original list, and the inner loops append that same element 24 times to the train list and 6 times to the test list. That is why you end up with 300 × 30 = 9000 items. In my code, the outer loop steps through the list in chunks of 30, each consisting of 24 train elements followed by 6 test elements.
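To check the resulting counts, here is a minimal sketch that uses a stand-in list of 300 integers in place of `array_resized`. The names `split_in_chunks`, `items`, `chunk`, and `n_train` are hypothetical and not from the question. It uses slicing instead of index arithmetic, which also avoids an IndexError if the list length is not a multiple of 30.

```python
def split_in_chunks(items, chunk=30, n_train=24):
    """Send the first n_train items of every chunk to train, the rest to test."""
    train, test = [], []
    for start in range(0, len(items), chunk):
        block = items[start:start + chunk]  # at most `chunk` items
        train.extend(block[:n_train])
        test.extend(block[n_train:])
    return train, test


items = list(range(300))  # stand-in for array_resized
train, test = split_in_chunks(items)
print(len(train), len(test))  # 240 60
```

The same function could be called once on the images and once on the labels, assuming both lists are in the same order.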

Note: In Python, a for loop takes each value directly from range(), so you don't need to initialize, increment, or reset i and j yourself. Python also borrows some ideas from [Functional Programming][1], which can make this kind of code easier to write.

W3Schools has a pretty good lesson on for loops, and all their little quirks.

link: https://www.w3schools.com/python/python_for_loops.asp
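One such quirk, shown as a minimal sketch (the list `seen` is only there for illustration): `range()` supplies the next value on every pass, so reassigning the loop variable inside the body, as `j = j+1` does in the question, does not change how many times the loop runs.

```python
seen = []
for j in range(3):
    seen.append(j)
    j = j + 1  # overwritten by the next value from range() on the next pass

print(seen)  # [0, 1, 2]
```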

[1]: https://medium.com/javascript-scene/master-the-javascript-interview-what-is-functional-programming-7f218c68b3a0#:~:text=Functional%20programming%20(often%20abbreviated%20FP,state%20flows%20through%20pure%20functions.

Answered By: Marco Kurepa

I think you’re overcomplicating the entire process.

From what I see you want to split your data into a training and test set (although you should split it into a training, validation and test set). You want 80% training images and 20% test images.

I would take array_resized, shuffle it (so you have a random split) and then choose the first 80% for training and the remaining 20% for testing:

import random

# shuffle list
random.shuffle(array_resized)

# get the split
length = len(array_resized)
train_split = int(length * 0.8)  # index where the first 80% of the data ends

train_imgs = array_resized[:train_split]  # choose the first 80% of the images
test_imgs = array_resized[train_split:]  # choose the remaining images
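
Shuffling only `array_resized` would break the pairing between images and their labels. One way to keep them aligned is to shuffle a list of indices and apply it to both lists. This is a sketch, and it assumes that `ar_label` is a list with one label per image in the same order, as the question's loop suggests. The stand-in data and the fixed seed are illustrative only.

```python
import random

# Stand-in data; in the question these would be array_resized and ar_label.
array_resized = [f"img{k}" for k in range(10)]
ar_label = [str(k) for k in range(10)]

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
indices = list(range(len(array_resized)))
rng.shuffle(indices)  # shuffle positions, not the data itself

train_split = int(len(indices) * 0.8)
train_idx, test_idx = indices[:train_split], indices[train_split:]

# Apply the same index order to images and labels so pairs stay together.
train_imgs = [array_resized[k] for k in train_idx]
train_lbs = [ar_label[k] for k in train_idx]
test_imgs = [array_resized[k] for k in test_idx]
test_lbs = [ar_label[k] for k in test_idx]
```

Unlike `random.shuffle(array_resized)`, which reorders the list in place, this leaves the original lists untouched.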
Answered By: koegl