Creating a dataset from 2d matrices

Question:

I have a series of 2d matrices like these two:

matrix_1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
matrix_2 = np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]])

Each matrix has a label, like this:

labels = np.array([0, 1])

I want to make a dataset from these matrices to train my ML model later.
First I tried to make a small .csv file for each matrix, but we cannot train an ML model on multiple .csv files.

Then, I tried this code:

matrix_1_flat = matrix_1.flatten()
matrix_2_flat = matrix_2.flatten()

dataset = np.array([matrix_1_flat, matrix_2_flat])
dataset = np.transpose(dataset)

But I feel like the spatial information will be lost. Is there any other function, apart from the ones I’m using, to create what I want?

Actually by labels, I mean y variables in machine learning terms. In this example, matrix_1 and matrix_2 (two 2d matrices) are my x_train and the label of matrix_1 is 0 (or even cat if it makes it easier to understand) and the label of matrix_2 is 1 (or dog).

I want the training data and its labels to look like this:

x_train = np.array([[[1, 2, 3],[4, 5, 6],[7, 8, 9]],[[10, 11, 12],[13, 14, 15],[16, 17, 18]]])  
y_train = np.array(["cat", "dog"])
Asked By: Totoro


Answers:

Your question is not that clear. What are your x and y variables for training? And what do you mean by labeling? If you mean that the labels are the y variables, then a simple machine learning example could be:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row of x_train is one sample with three features
x_train = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
y_train = np.array([0, 1, 0])  # Labels for x_train
x_test = np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]])

# Fit on the training rows, then predict a label for each test row
model = LogisticRegression()
model.fit(x_train, y_train)
predictions = model.predict(x_test)

In this case we use x_train and y_train to train the model, so the row [7, 8, 9] is associated with the label 0. This is not a good training set for the test set above, because the test values are not represented in the training set. But since [7, 8, 9] is the training row closest to the test rows, we get the label 0 for all of them:

print(predictions)
[0 0 0]

If this is not what you mean, then you need to be more specific in your question.
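For the case where each sample is a whole matrix, as in the x_train the question asks for, here is a rough sketch of one way to go between the stacked 3D form and the 2D form that estimators such as LogisticRegression expect. The names x_flat and x_restored are made up for illustration.

```python
import numpy as np

matrix_1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
matrix_2 = np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]])

# Stack along a new leading axis: shape (2, 3, 3), one matrix per sample
x_train = np.stack([matrix_1, matrix_2])
y_train = np.array([0, 1])

# Estimators that expect 2D input need one row per sample: shape (2, 9)
# (x_flat is an illustrative name, not part of the question)
x_flat = x_train.reshape(len(x_train), -1)

# The flattening is reversible, so the original layout can be restored
x_restored = x_flat.reshape(-1, 3, 3)

print(x_train.shape, x_flat.shape, np.array_equal(x_restored, x_train))
# (2, 3, 3) (2, 9) True
```

Flattening keeps every value at a fixed column, so nothing is discarded, but a model that treats each column as an independent feature will not make use of which cells were neighbors in the matrix.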

Answered By: Harmen Dijkstra

I guess you want to make a dataset in which each x-y pair (a matrix and a label) keeps x in its original shape, so as not to lose spatial information (treating each matrix as image-like).

With the aid of numpy, you can create a compressed file representing the dataset as follows:

import numpy as np

matrix_1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
matrix_2 = np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]])

# preparing "x" and "y" - the dataset
matrices = [matrix_1, matrix_2]
labels = np.array([0, 1])

# save into an npz object: 
#  - it's dict-like, so we use "x" and "y" as keys
#  - this will be saved as "matrix_dataset.npz"
np.savez_compressed('matrix_dataset', x=matrices, y=labels)

The npz file can be later loaded into memory:

ds = np.load('matrix_dataset.npz')

You can access the "x" and "y" fields simply by their key:

# e.g. if you want to train your model, after loading
x_train = np.array(ds['x'])
y_train = np.array(ds['y'])

# your model fitting code...

Note that the shape of x_train is now (N, 3, 3), where N (2 in this case) is the size of the batch axis, so x_train[0] retrieves the first 3×3 matrix.
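To check the round trip end to end, here is a minimal sketch that saves the two matrices to a temporary directory, loads them back, and inspects the shape. The temporary directory and the variable names path and tmp_dir are assumptions for illustration; in practice you would write to a location of your choice.

```python
import os
import tempfile

import numpy as np

x = np.stack([
    np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
    np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]]),
])
y = np.array([0, 1])

# Write to a throwaway directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "matrix_dataset.npz")
    np.savez_compressed(path, x=x, y=y)

    # Indexing the loaded archive by key reads each array into memory
    with np.load(path) as ds:
        x_train = ds["x"]
        y_train = ds["y"]

print(x_train.shape)  # (2, 3, 3)
print(x_train[0])     # the first 3x3 matrix
```

Stacking with np.stack before saving makes the (N, 3, 3) shape explicit; passing a plain list of equally shaped matrices, as in the answer above, ends up stored as the same array.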

Answered By: Luca Anzalone