Applying fit_transform() from sklearn.compose.ColumnTransformer to an array raises a "tuple index out of range" error

Question:

I am a beginner in ML/AI and am trying to preprocess a dataset of digits that I made myself. I want to apply one-hot encoding to my categorical variable (it is the dependent variable; I don't know whether that matters), but I get a "tuple index out of range" error. I searched the internet and the only solution I found was to use the reshape() function, but it didn't help, or maybe I am not using it correctly.

Here is my dataset: the first 28 columns are the digits encoded as 1s and 0s, and the last column contains the digit itself.

Here is my code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf


#Data Preprocessing
dataset = pd.read_csv('dataset_cisla_polia2.csv',header = None,sep = ';')
X = dataset.iloc[:, 0:28].values
y = dataset.iloc[:, 29].values
print(X)
print(y)
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder',OneHotEncoder(),[29])],remainder = 'passthrough')
y = np.array(ct.fit_transform(y))

I expect the variable y to look like this:
digit 1 is encoded as [1 0 0 0 0 0 0 0 0 0],
digit 2 is encoded as [0 1 0 0 0 0 0 0 0 0],
and so on.

Asked By: kurkurindd


Answers:

This happens because ct = ColumnTransformer(transformers=[('encoder',OneHotEncoder(),[29])],remainder = 'passthrough') tells the transformer to one-hot encode the column at index 29.

You are calling fit_transform on y, which holds only one column, so there is no column 29 in it. Change the 29 to 0:

ct = ColumnTransformer(transformers=[('encoder',OneHotEncoder(),[0])],remainder = 'passthrough')

Edit

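You also need to change the iloc call so that y stays a 2D column array. Selecting with a list ([29]) gives an array of shape (n_samples, 1), whereas a plain 29 gives a 1D array, which ColumnTransformer cannot index by column.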

y = dataset.iloc[:, [29]].values
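
To make both changes concrete, here is a minimal sketch on a small made-up DataFrame. It is not the asker's CSV: the four rows, the three feature columns, and the label column at index 3 are assumptions chosen for illustration. It also passes sparse_threshold=0, which is not in the original code, so that ColumnTransformer returns a dense NumPy array. With ten digit classes the default setting would probably return a sparse matrix, and wrapping that in np.array() would not give a plain 2D array.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the CSV: 3 feature columns, label in column 3.
dataset = pd.DataFrame([
    [1, 0, 1, 1],
    [0, 1, 1, 2],
    [1, 1, 0, 3],
    [0, 0, 1, 1],
])

# A plain integer index returns a 1D array; a list index keeps a 2D column.
y_1d = dataset.iloc[:, 3].values      # shape (4,)
y_2d = dataset.iloc[:, [3]].values    # shape (4, 1)

# reshape(-1, 1) on the 1D array produces the same column shape.
y_reshaped = y_1d.reshape(-1, 1)      # shape (4, 1)

# y_2d has a single column, so the encoder must target index 0.
# sparse_threshold=0 (an addition for this sketch) forces dense output.
ct = ColumnTransformer(
    transformers=[('encoder', OneHotEncoder(), [0])],
    remainder='passthrough',
    sparse_threshold=0,
)
y_encoded = ct.fit_transform(y_2d)

print(y_1d.shape, y_2d.shape, y_encoded.shape)
print(y_encoded)
```

Each row of y_encoded has one column per distinct label found in the data, ordered by sorted label value, so with all ten digits present in the real dataset there would be ten columns.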
Answered By: wavetitan