How to convert each row in Pandas DF to 2D numpy array?
Question:
I have a Pandas dataframe with 785
columns (784
columns and 1
column as a label).
I wanna use CNN to train a model on it, therfore I should convert each row if the dataframe to "image": (28x28x1)
(28×28=784), so a new dataframe will contain only 2 columns.
How can I can it?
Answers:
Following code is an example of 4 columns and a label.
import numpy as np
import pandas as pd
csv_path = './example.xlsx'
df = pd.read_excel(csv_path)
df_new = pd.DataFrame(columns=['Column', 'label'])
for i, row in df.iterrows():
a = []
for j, column in row.iteritems():
if j == 'label':
label = column
break
a.append(column)
img = np.array(a).reshape(2, 2)
label = column
df_new.loc[i] = [img, label]
df_new.to_csv('result.csv', index=False)
X_train = X_train.values.reshape(-1,28,28,1)
If you look for something like [[1,2,3,4]]
, you can use the code below. If you need the array in the format of [[1,2], [3,4]]
, then change the reshape method as np.reshape(arr, (2, 2))
. df
is the original dataframe with 784 feature and label columns.
First, get all columns except the label:
columns = df.columns.to_list()
columns.remove('label')
Then, use the apply method to convert the df
to the format you need:
def arrayize(df):
arr = df.to_numpy()
arr = arr.astype(int)
arr = np.reshape(arr, (-1, 4))
return arr
df["numpy_col"] = df[columns].apply(arrayize, axis=1)
I haven’t run the code to test it, so you may need to check it.
I have a Pandas dataframe with 785
columns (784
columns and 1
column as a label).
I wanna use CNN to train a model on it, therfore I should convert each row if the dataframe to "image": (28x28x1)
(28×28=784), so a new dataframe will contain only 2 columns.
How can I can it?
Following code is an example of 4 columns and a label.
import numpy as np
import pandas as pd
csv_path = './example.xlsx'
df = pd.read_excel(csv_path)
df_new = pd.DataFrame(columns=['Column', 'label'])
for i, row in df.iterrows():
a = []
for j, column in row.iteritems():
if j == 'label':
label = column
break
a.append(column)
img = np.array(a).reshape(2, 2)
label = column
df_new.loc[i] = [img, label]
df_new.to_csv('result.csv', index=False)
X_train = X_train.values.reshape(-1,28,28,1)
If you look for something like [[1,2,3,4]]
, you can use the code below. If you need the array in the format of [[1,2], [3,4]]
, then change the reshape method as np.reshape(arr, (2, 2))
. df
is the original dataframe with 784 feature and label columns.
First, get all columns except the label:
columns = df.columns.to_list()
columns.remove('label')
Then, use the apply method to convert the df
to the format you need:
def arrayize(df):
arr = df.to_numpy()
arr = arr.astype(int)
arr = np.reshape(arr, (-1, 4))
return arr
df["numpy_col"] = df[columns].apply(arrayize, axis=1)
I haven’t run the code to test it, so you may need to check it.