How to pass different sets of data for training and testing without splitting a single dataframe (Python)?

Question:

I have gone through multiple questions that explain how to split a dataframe into train and test sets, with scikit-learn, without it, etc.

But in my case I have 2 different CSVs (2 dataframes from different years). I want to use one for training and the other for testing.

How do I do this for LinearRegression, or any other model?

Asked By: Viv


Answers:

  • Load the datasets individually.
  • Make sure both have the same feature columns, with the same names and in the same order (the number of rows may differ).
  • Use the train set to fit the model.
  • Use the test set to predict the output after training.
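import pandas as pd
from sklearn.linear_model import LinearRegression

# Load the data (one CSV per year)
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

# Split features and value
# when trying to predict column "target";
# columns= drops the column (the default axis would drop a row)
X_train, y_train = train.drop(columns="target"), train["target"]
X_test, y_test = test.drop(columns="target"), test["target"]

# Fit (i.e. train) model
reg = LinearRegression()
reg.fit(X_train, y_train)

# Predict
pred = reg.predict(X_test)

# Score: for a regressor, score() returns R^2, not accuracy
r2 = reg.score(X_test, y_test)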
Answered By: skillsmuggler
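The steps above can be tried without any files. This is a minimal sketch, assuming two small in-memory DataFrames stand in for the two yearly CSVs; the column names `x1`, `x2` and `target` and all the values are made up for illustration. The test frame deliberately lists its columns in a different order to show the "same format" step: selecting the test features with `X_train.columns` puts them in training order (recent scikit-learn versions check feature names and raise an error when they differ from those seen in `fit`).

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-ins for the two yearly CSVs (normally pd.read_csv(...)).
# The made-up target is exactly target = 2*x1 + 3*x2 + 1, so the fit is perfect.
train = pd.DataFrame({
    "x1": [0.0, 1.0, 2.0, 3.0, 4.0],
    "x2": [1.0, 0.0, 2.0, 1.0, 3.0],
})
train["target"] = 2 * train["x1"] + 3 * train["x2"] + 1

# The "other year" lists its columns in a different order on purpose.
test = pd.DataFrame({
    "x2": [2.0, 4.0, 0.5],
    "x1": [5.0, 6.0, 7.0],
})
test["target"] = 2 * test["x1"] + 3 * test["x2"] + 1

# Split features and target by column name.
X_train, y_train = train.drop(columns="target"), train["target"]
X_test, y_test = test.drop(columns="target"), test["target"]

# Align the test features to the training column order.
X_test = X_test[X_train.columns]

reg = LinearRegression()
reg.fit(X_train, y_train)

pred = reg.predict(X_test)
r2 = reg.score(X_test, y_test)  # coefficient of determination (R^2)
print(pred)
print(r2)
```

Because the invented target is an exact linear function of the features, R² comes out at essentially 1.0 here; real data from two different years will usually score lower.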

@skillsmuggler, how do I define X_train and X_test? When I try, I get NameError: name 'X_train' is not defined.

Answered By: mohammed lafatih
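A `NameError` like this means the lines that create `X_train` and `X_test` were never run in the current session; they are ordinary variables produced by splitting each DataFrame. A minimal sketch, assuming a hypothetical helper `split_xy` (not part of pandas or scikit-learn) and tiny made-up frames in place of the CSVs, applies the same split to both frames so all four names are defined together:

```python
import pandas as pd


def split_xy(df, target):
    """Hypothetical helper: return (features, target) for one DataFrame."""
    if target not in df.columns:
        raise KeyError(f"column {target!r} not found; columns are {list(df.columns)}")
    return df.drop(columns=target), df[target]


# Small made-up stand-ins for the two yearly files.
train = pd.DataFrame({"x1": [1, 2, 3], "target": [10, 20, 30]})
test = pd.DataFrame({"x1": [4, 5], "target": [40, 50]})

# These assignments are what define X_train, y_train, X_test and y_test.
X_train, y_train = split_xy(train, "target")
X_test, y_test = split_xy(test, "target")

# Both feature frames should expose the same columns before fitting.
assert set(X_train.columns) == set(X_test.columns)
print(X_train.shape, X_test.shape)
```

Running the split lines before `reg.fit(X_train, y_train)` is all that is needed; the helper only keeps the two splits consistent and gives a clearer error if the target column is missing.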

I couldn't edit the first answer, which is almost there, but some code is missing:

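import pandas as pd
from sklearn.linear_model import LinearRegression

# Load the data
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

# Positional split: assumes the target is the first column
# and every remaining column is a feature
y_train = train.iloc[:, 0]
X_train = train.iloc[:, 1:]
y_test = test.iloc[:, 0]
X_test = test.iloc[:, 1:]

# Fit (train) model
reg = LinearRegression()
reg.fit(X_train, y_train)

# Predict
pred = reg.predict(X_test)

# Score (R^2 for a regressor)
r2 = reg.score(X_test, y_test)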
Answered By: MattiH
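The positional split can be checked on in-memory data. This sketch assumes, as the answer does, that the target is the first column; the frames and values are invented for illustration, with `target = 2 * x + 1` so the expected predictions are easy to verify. Plain DataFrame indexing such as `train[:, :1]` raises an error, which is why the split goes through the positional indexer `.iloc`.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical frames laid out with the target first and features after it.
train = pd.DataFrame({"target": [3.0, 5.0, 7.0, 9.0], "x": [1.0, 2.0, 3.0, 4.0]})
test = pd.DataFrame({"target": [11.0, 13.0], "x": [5.0, 6.0]})

# Column 0 is the target (a 1-D Series); columns 1: are the features.
y_train, X_train = train.iloc[:, 0], train.iloc[:, 1:]
y_test, X_test = test.iloc[:, 0], test.iloc[:, 1:]

reg = LinearRegression().fit(X_train, y_train)
print(reg.predict(X_test))        # roughly 11 and 13
print(reg.score(X_test, y_test))  # roughly 1.0
```

Selecting by name (dropping a `target` column) is usually more robust than slicing by position, because it keeps working if the two years' files list their columns in a different order.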