How to randomly split data in Python

Question

I need to create training and test sets from a single dataset so that I can fit a linear regression. How can I split the dataset randomly?

My target variable is SalePrice:
train = pd.read_csv(r'C:\Users\pkoni\Desktop\train.csv')
target = train['SalePrice']
X, y = train.data, train.target
train_X, test_X, train_y, test_y = train_test_split(X, y, 
                                                    train_size=0.5,
                                                    test_size=0.5,
                                                    random_state=123)

I don't know what I should assign to X and y.

Asked By: Przemek Dabek

Answer 1

I'm not sure I fully understand the question. If you just want a random split, this should work:

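from sklearn.model_selection import train_test_split

# Target: the column to predict
y = train['SalePrice']
# Features: every other column
X = train.drop('SalePrice', axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.5,
                                                    random_state=0)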
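
For anyone who wants to try the random split end to end, here is a minimal sketch. It assumes a small made-up DataFrame in place of train.csv: the LotArea and YearBuilt columns and all the values are hypothetical, and only SalePrice comes from the question. If the real file has non-numeric columns, they would need to be encoded or dropped before fitting.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for train.csv: two made-up numeric features plus SalePrice.
rng = np.random.default_rng(0)
train = pd.DataFrame({
    "LotArea": rng.integers(2000, 15000, size=100),
    "YearBuilt": rng.integers(1950, 2010, size=100),
})
train["SalePrice"] = (50 * train["LotArea"]
                      + 300 * train["YearBuilt"]
                      + rng.normal(0, 1000, size=100))

# Separate the target from the features.
y = train["SalePrice"]
X = train.drop("SalePrice", axis=1)

# Random 50/50 split; random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Fit on the training half, evaluate R^2 on the held-out half.
model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)
```

Passing only test_size is enough here; train_size then defaults to the complement, so there is no need to set both as in the question.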

If you instead want to split by date, sending all rows from a certain year onward (e.g. 2010) to the test set and all earlier rows to the training set, a different approach is needed:

test = train[train['Yr.Sold'] >= 2010]
train = train[train['Yr.Sold'] < 2010]

After splitting into test and train, assign the labels and features for each set (see X and y in the first code segment).
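
A rough sketch of the date-based variant, following the description above (earlier rows to train, later rows to test). The data, the LotArea column, the cutoff variable, and the train_part/test_part names are hypothetical; 'Yr.Sold' is the column name used in this answer and may be spelled differently in your file.

```python
import pandas as pd

# Hypothetical data; replace with your own DataFrame.
data = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260, 14115],
    "Yr.Sold": [2007, 2008, 2009, 2010, 2010, 2011],
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 143000],
})

# Assumed cutoff: rows sold in this year or later go to the test set.
cutoff = 2010
train_part = data[data["Yr.Sold"] < cutoff]
test_part = data[data["Yr.Sold"] >= cutoff]

# Assign features and labels for each set, as in the random-split snippet.
X_train = train_part.drop("SalePrice", axis=1)
y_train = train_part["SalePrice"]
X_test = test_part.drop("SalePrice", axis=1)
y_test = test_part["SalePrice"]
```

Using `<` for one side and `>=` for the other keeps every row in exactly one of the two sets.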

Answered By: mrw
