Sklearn Linear Regression fit input order? Does exogenous variable go first?

Question:

The reference page says:

Parameters: 
X : array-like or sparse matrix, shape (n_samples, n_features)
Training data

y : array_like, shape (n_samples, n_targets)
Target values. Will be cast to X’s dtype if necessary

Is X the exogenous variable? I would assume so, but with statsmodels OLS the endogenous variable comes first, so I want to confirm, because the two yield different coefficients.

Asked By: SpartanDawg

Answers:

Yes, you are correct: sklearn takes the exogenous variable first and the endogenous variable second (this holds for other sklearn models as well), which is the reverse of the order used by statsmodels OLS.

Let X be the exogenous variable and Y the endogenous variable.

In sklearn you would do something like this:

clf.fit(X,Y)

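whereas in statsmodels the data are passed to the model constructor, endogenous variable first, and fit() is called without data:

results = sm.OLS(Y, X).fit()

Here clf is the sklearn estimator you are trying to build (for example LinearRegression()) and sm is statsmodels.api.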