How do I separate labels and images?

Question:

I am loading a dataset of handwritten digit images:

import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers 

train_data = np.loadtxt('train.txt')
print('train:', train_data.shape)      # train: (7291, 257)

The first value in each row is a digit from 0 to 9 (the label), and the remaining 256 values are the image. How can I separate the labels from the images? My idea is to build one new tensor from the first value of every row and another from the remaining values. Since I am a beginner, I am not sure how to do this or whether my approach is correct.

Asked By: comms


Answers:

You need to learn NumPy indexing: https://numpy.org.cn/en/user/basics/indexing.html

In your case, just do

labels = train_data[:, 0]
images = train_data[:, 1:] 
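
To see what those two slices return, here is a minimal sketch on a small made-up array. The name `demo` and its values are hypothetical and only stand in for `train_data`; the real file has 257 columns rather than 5.

```python
import numpy as np

# Hypothetical stand-in for train_data: 3 rows, 1 label column + 4 pixel columns.
demo = np.array([
    [6.0, -1.0, -0.5, 0.2, 1.0],
    [5.0, -1.0, 0.3, 0.9, -0.2],
    [4.0, 0.1, -1.0, -1.0, 0.7],
])

labels = demo[:, 0]   # every row, column 0 only -> 1-D array
images = demo[:, 1:]  # every row, columns 1 to the end -> 2-D array

print(labels.shape)  # (3,)
print(images.shape)  # (3, 4)
```

The `:` before the comma selects all rows; what follows the comma selects the columns.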
Answered By: nnzzll

The first value in each row is the label, i.e. the first column holds the labels. Picture the data like this:

col1 | col2 | .... | coln
label| ................
label| ..................

Now you want to separate the labels from the rest, so you want the first column. To do that in NumPy, you index the array. The syntax is simple:

train_data = np.loadtxt('train.txt')

y_train = train_data[:, 0]  # all rows, column 0 (the first column, since indexing starts at 0)
# y_train is commonly called the training labels

x_train = train_data[:, 1:]  # all rows, from column 1 (included) onwards
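
If the arrays are headed for a Keras model, a possible next step is sketched below. It rests on two assumptions that the question does not confirm: that integer labels are wanted (`np.loadtxt` returns floats by default), and that each 256-value row is a 16x16 image stored row by row. The random array is a hypothetical stand-in for the contents of `train.txt`.

```python
import numpy as np

# Hypothetical stand-in for np.loadtxt('train.txt'): 5 rows of 1 label + 256 pixels.
rng = np.random.default_rng(0)
fake_labels = rng.integers(0, 10, size=(5, 1)).astype(float)
fake_pixels = rng.uniform(-1.0, 1.0, size=(5, 256))
train_data = np.hstack([fake_labels, fake_pixels])

y_train = train_data[:, 0].astype(int)  # labels converted from float to int
x_train = train_data[:, 1:]             # flat pixel vectors, shape (5, 256)

# Assumption: 256 pixels = one 16x16 image, stored row by row.
x_train_2d = x_train.reshape(-1, 16, 16)

print(y_train.shape, x_train.shape, x_train_2d.shape)  # (5,) (5, 256) (5, 16, 16)
```

Keep the flat `x_train` for a dense network; the reshaped version is only useful for plotting or for convolutional layers, and only if the 16x16 assumption holds for your data.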

Hope this is clear.

Answered By: Ahmad Anis