Separate last column from the actual dataset using numpy

Question:

I have a dataset in CSV format (without headers) that I want to split into two parts: (1) the actual dataset without the last column, and (2) the last column (the class label). My dataset has 100K rows and 65 columns, where the last column, column 65, is the class label that I want to separate. I wrote the following:

dataset_path = 'dataset.csv'

dataset = np.genfromtxt(dataset_path, delimiter=',')
class_label = dataset[:-1]
dataset.drop(class_label, axis=1, inplace=True)

print dataset.shape
print class_label

This is wrong, and I am not able to get the split I want. Any help is appreciated.
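To see why the attempt above fails, here is a small sketch on a toy in-memory array (the 4x3 array is a hypothetical stand-in for the real 100K x 65 file). On a 2-D array a single slice applies to the first axis, so `dataset[:-1]` drops the last row, not the last column; and `drop(..., axis=1, inplace=True)` is a pandas `DataFrame` method that NumPy arrays do not have.

```python
import numpy as np

# Hypothetical stand-in for the real dataset: 4 rows, 3 columns,
# where the last column plays the role of the class label.
dataset = np.array([[1.0, 2.0, 0.0],
                    [3.0, 4.0, 1.0],
                    [5.0, 6.0, 0.0],
                    [7.0, 8.0, 1.0]])

# A single slice indexes the first axis (rows), so this keeps
# every row except the last one; all 3 columns are still there.
rows_but_last = dataset[:-1]
print(rows_but_last.shape)  # (3, 3)

# ndarray has no .drop() method; drop(..., axis=1) is pandas API.
print(hasattr(dataset, "drop"))  # False
```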

Asked By: Medo


Answers:

If you want to work with NumPy arrays, you can read the data in your CSV file into a NumPy array:

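 from numpy import genfromtxt
 # no skip_header: the CSV in the question has no header row,
 # so skip_header=1 would silently drop the first data row
 my_data = genfromtxt('E:Book1.csv', delimiter=',', dtype='str', unpack=True)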

With unpack=True the result is transposed, so each row of my_data holds one column of your CSV file.
Now you can remove the last column with:

 my_data_without_last_column = my_data[:-1].copy()
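A minimal sketch of the unpack=True approach, run on a hypothetical headerless CSV held in an `io.StringIO` instead of a file on disk. It assumes numeric data and uses genfromtxt's default float dtype rather than dtype='str', so the labels stay numeric; the `features` name is illustrative. Because the array is transposed, rows are columns of the file, and `.T` brings it back to the usual (rows, features) layout.

```python
import io
import numpy as np

# Hypothetical headerless CSV: 4 rows, 3 columns, last column = class label.
csv_text = io.StringIO("1,2,0\n3,4,1\n5,6,0\n7,8,1\n")

# unpack=True transposes the result, so my_data[i] is column i of the file.
my_data = np.genfromtxt(csv_text, delimiter=",", unpack=True)
print(my_data.shape)  # (3, 4): 3 columns, each holding 4 values

my_data_without_last_column = my_data[:-1].copy()  # columns 0 and 1
class_label = my_data[-1]                          # the last column
print(class_label)  # [0. 1. 0. 1.]

# Transpose back to the usual (rows, features) layout if needed.
features = my_data_without_last_column.T
print(features.shape)  # (4, 2)
```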
Answered By: Behzad Jamali

Assuming your dataset has no header row:

class_label = dataset[:, -1] # for last column
dataset = dataset[:, :-1] # for all but last column
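An end-to-end sketch of this answer, from reading a headerless CSV to the split. The CSV content is hypothetical and held in an `io.StringIO` so the example runs without a file; with the real data you would pass the path ('dataset.csv') instead. The features go into a separate `features` name here rather than overwriting `dataset`, which is only a naming choice.

```python
import io
import numpy as np

# Hypothetical stand-in for dataset.csv: no header row,
# last column is the class label (the real file has 65 columns).
csv_text = io.StringIO(
    "0.1,0.2,0.3,1\n"
    "0.4,0.5,0.6,0\n"
    "0.7,0.8,0.9,1\n"
)

# genfromtxt returns a 2-D float array of shape (rows, columns).
dataset = np.genfromtxt(csv_text, delimiter=",")

class_label = dataset[:, -1]  # every row, last column -> shape (rows,)
features = dataset[:, :-1]    # every row, all but the last column

print(features.shape)  # (3, 3)
print(class_label)     # [1. 0. 1.]
```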
Answered By: ahed87

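If you index with [-1] inside a list, class_label stays two-dimensional (a single column of shape (n, 1)) instead of becoming a 1-D array of shape (n,):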

class_label = dataset[:, [-1]]
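A short sketch of the shape difference on a toy 4x3 array (hypothetical data). An integer index removes the column axis, while a list index (advanced indexing) keeps it. A slice such as `-1:` also keeps the axis; the difference is that basic slicing returns a view of `dataset`, whereas the list index returns a copy.

```python
import numpy as np

dataset = np.arange(12, dtype=float).reshape(4, 3)  # 4 rows, 3 columns

label_1d = dataset[:, -1]    # integer index drops the axis -> shape (4,)
label_2d = dataset[:, [-1]]  # list index keeps the axis -> shape (4, 1)

print(label_1d.shape)  # (4,)
print(label_2d.shape)  # (4, 1)

# A slice also keeps the column axis, but as a view of dataset;
# the list (advanced) index above produces an independent copy.
label_view = dataset[:, -1:]
print(label_view.shape)                       # (4, 1)
print(np.shares_memory(label_view, dataset))  # True
print(np.shares_memory(label_2d, dataset))    # False
```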
Answered By: Phạm Đức Tài