How to convert each row in Pandas DF to 2D numpy array?

Question

I have a Pandas dataframe with 785 columns (784 columns and 1 column as a label).

I wanna use CNN to train a model on it, therfore I should convert each row if the dataframe to "image": (28x28x1) (28×28=784), so a new dataframe will contain only 2 columns.

How can I can it?

The idead is on the picture:

Asked By: tiber septim

||

Source

Answer 1

Following code is an example of 4 columns and a label.

import numpy as np
import pandas as pd

csv_path = './example.xlsx'

df = pd.read_excel(csv_path)
df_new = pd.DataFrame(columns=['Column', 'label'])

for i, row in df.iterrows():
    a = []
    for j, column in row.iteritems():
        if j == 'label':
            label = column
            break
        a.append(column)
    img = np.array(a).reshape(2, 2)
    label = column
    df_new.loc[i] = [img, label]

df_new.to_csv('result.csv', index=False)

Answered By: x pie

Answer 2

X_train = X_train.values.reshape(-1,28,28,1)

Answered By: tiber septim

Answer 3

If you look for something like [[1,2,3,4]], you can use the code below. If you need the array in the format of [[1,2], [3,4]], then change the reshape method as np.reshape(arr, (2, 2)). df is the original dataframe with 784 feature and label columns.

First, get all columns except the label:

columns = df.columns.to_list()
columns.remove('label')

Then, use the apply method to convert the df to the format you need:

def arrayize(df):
    arr = df.to_numpy()
    arr = arr.astype(int)
    arr = np.reshape(arr, (-1, 4))
    
    return arr

df["numpy_col"] = df[columns].apply(arrayize, axis=1)

I haven’t run the code to test it, so you may need to check it.

Answered By: hsaltan

How to convert each row in Pandas DF to 2D numpy array?

Question:

Answers: