What is "random_state" in the sklearn.model_selection.train_test_split example?

Question:

Can someone explain to me what random_state means in the example below?

import numpy as np
from sklearn.model_selection import train_test_split
X, y = np.arange(10).reshape((5, 2)), range(5)


X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42) 

Why is it hard coded to 42?

Asked By: Saurabh


Answers:

Isn’t that obvious? 42 is the Answer to the Ultimate Question of Life, the Universe, and Everything.

On a serious note, random_state simply sets the seed of the random number generator, so that your train-test split is deterministic: the same seed produces the same split on every run. If you don’t set a seed, the split is different each time.
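
As a quick check of the point about determinism, here is a minimal sketch that reuses the data from the question and calls train_test_split twice with the same seed. The variable names (split_a, split_b, same_split) are only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same toy data as in the question: 5 samples with 2 features each.
X, y = np.arange(10).reshape((5, 2)), list(range(5))

# Two independent calls that share the same integer seed.
split_a = train_test_split(X, y, test_size=0.33, random_state=42)
split_b = train_test_split(X, y, test_size=0.33, random_state=42)

# Compare X_train, X_test, y_train, y_test pairwise between the two calls.
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print("same seed gives same split:", same_split)
```

The value 42 has no special meaning here; any fixed integer gives a repeatable split.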

Relevant documentation:

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
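
The three accepted forms described in the documentation can be tried side by side. This is only a sketch: the array sizes and the seed 0 are arbitrary choices, and the np.random.seed call is added here so that the None case is repeatable for the comparison.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(20).reshape((10, 2)), list(range(10))

# int: the value is used as the seed of a new generator.
y_test_int = train_test_split(X, y, test_size=0.3, random_state=0)[3]

# RandomState instance: the object itself is used as the generator.
# Its internal state advances with use, so reusing the same instance
# for a second call would generally give a different split.
rng = np.random.RandomState(0)
y_test_rng = train_test_split(X, y, test_size=0.3, random_state=rng)[3]

# None: NumPy's global RandomState is used; np.random.seed controls it.
np.random.seed(0)
y_test_global = train_test_split(X, y, test_size=0.3, random_state=None)[3]

print(y_test_int, y_test_rng, y_test_global)
```

Without the np.random.seed call, the None case depends on whatever state the global generator happens to be in, which is why unseeded runs differ.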

Answered By: cs95

random_state ensures that the splits you generate are reproducible. Scikit-learn uses random permutations to generate the splits, and the random state you provide is used as the seed for the random number generator. With the same seed, the generator produces the same sequence of random numbers, and therefore the same permutation.
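
To illustrate the idea of a seeded permutation, here is a simplified sketch written with NumPy only. It is not scikit-learn's actual implementation, and permutation_split is a hypothetical helper made up for this example.

```python
import numpy as np

def permutation_split(n_samples, n_test, seed):
    """Hypothetical helper: shuffle indices with a seeded generator, then cut."""
    rng = np.random.RandomState(seed)      # the seed fixes the random sequence
    shuffled = rng.permutation(n_samples)  # random ordering of 0..n_samples-1
    return shuffled[n_test:], shuffled[:n_test]  # (train indices, test indices)

train_a, test_a = permutation_split(5, 2, seed=42)
train_b, test_b = permutation_split(5, 2, seed=42)

# Same seed, same permutation, hence the same train/test indices.
print(np.array_equal(train_a, train_b), np.array_equal(test_a, test_b))
```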

Answered By: vumaasha

If you don’t specify random_state in the code, then every time you run (execute) your code a new random seed is used, and the train and test datasets will have different values each time.

However, if a fixed value is assigned, like random_state = 0 or 1 or 42 or any other integer, then no matter how many times you execute your code the result will be the same, i.e., the same values in the train and test datasets.

Answered By: Farzana Khan

When random_state is not defined in the code, the train data will change on every run, and the accuracy might change with it.
When random_state is set to a constant integer, the train data will be the same on every run, which makes debugging easier.
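
A rough way to see the effect on accuracy is to score the same model on splits made with different seeds. The dataset (iris), the model (LogisticRegression) and the helper holdout_accuracy are assumptions chosen for this sketch; none of them come from the question.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

def holdout_accuracy(seed):
    """Hypothetical helper: accuracy on the test part of a seeded split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=seed)
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)

# Different seeds give different splits, so the scores may differ.
scores = {seed: holdout_accuracy(seed) for seed in (0, 1, 42)}
print(scores)

# Repeating a seed repeats the split, so the score is reproduced.
repeat = holdout_accuracy(42)
print(repeat == scores[42])
```

Fixing the seed makes one run comparable to the next, but the score still reflects only that particular split.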

Answered By: kishore naidu

The random state is like a lot number for a set generated randomly in any operation. We can specify the same lot number again whenever we want the same set back.

Answered By: OmkarKhilari