How to randomly split data in Python

Question:

I need to create test and train sets from one dataset. I have to split my dataset to fit a linear regression. How can I do it randomly?

My Target variable: SalePrice
train = pd.read_csv(r'C:\Users\pkoni\Desktop\train.csv')
target = train['SalePrice']
X, y = train.data, train.target
train_X, test_X, train_y, test_y = train_test_split(X, y, 
                                                    train_size=0.5,
                                                    test_size=0.5,
                                                    random_state=123)

I don't know what I should assign to X and y.


Asked By: Przemek Dabek


Answers:

Not sure I understand fully. If you are just trying to randomly split then this should work:

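from sklearn.model_selection import train_test_split

# The target is the SalePrice column; the features are every other column.
y = train['SalePrice']
X = train.drop('SalePrice', axis=1)
# Randomly assign half of the rows to each subset.
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.5,
                                                    random_state=0)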
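
As a minimal, self-contained sketch of the same idea, the following builds a small synthetic table, splits it in half at random, and fits a linear regression on the training half. The `LotArea` column and all values are invented for illustration and are not taken from the asker's file; it also assumes pandas, NumPy and scikit-learn are installed.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for train.csv; column names and values are invented.
rng = np.random.default_rng(0)
lot_area = rng.uniform(1_000, 10_000, size=100)
data = pd.DataFrame({
    "LotArea": lot_area,
    "SalePrice": 50_000 + 20 * lot_area + rng.normal(0, 5_000, size=100),
})

y = data["SalePrice"]               # target column
X = data.drop("SalePrice", axis=1)  # every other column is a feature

# Shuffle the rows and put half of them in each subset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Fit on the training half only, then score on the held-out half.
model = LinearRegression().fit(X_train, y_train)
r2_test = model.score(X_test, y_test)  # R^2 on the test half
print(X_train.shape, X_test.shape)
```

With real data, `LinearRegression` needs numeric features without missing values, so non-numeric columns would have to be encoded or dropped before fitting.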

If you instead want all rows from a certain year onward (e.g. 2010) in the test set and all earlier rows in the training set, a different approach is needed:

# Select the test rows first, because the next line overwrites train.
test = train[train['Yr.Sold'] >= 2010]
train = train[train['Yr.Sold'] < 2010]

After splitting into test and train, separate the labels and the features in each subset, in the same way as X and y in the first code segment.
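
The date-based variant can be sketched end to end as follows. The rows and the `LotArea` feature are invented for illustration; only the column names `Yr.Sold` and `SalePrice` and the 2010 cutoff come from the answer.

```python
import pandas as pd

# Invented rows; only Yr.Sold and SalePrice follow the answer above.
data = pd.DataFrame({
    "Yr.Sold":   [2007, 2008, 2009, 2010, 2011],
    "LotArea":   [8450, 9600, 11250, 9550, 14260],  # hypothetical feature
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})

cutoff = 2010  # first year that goes to the test set

# Build one boolean mask so both subsets come from the same original table
# and no row is dropped or counted twice.
is_test = data["Yr.Sold"] >= cutoff
train_df = data[~is_test]
test_df = data[is_test]

# Separate features and labels in each subset, as in the random split.
X_train = train_df.drop("SalePrice", axis=1)
y_train = train_df["SalePrice"]
X_test = test_df.drop("SalePrice", axis=1)
y_test = test_df["SalePrice"]

print(len(train_df), len(test_df))  # prints: 3 2
```

Using a single mask and its negation avoids the ordering problem of overwriting `train` before `test` has been selected.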

Answered By: mrw