Why does adding duplicated features improve Logistic Regression accuracy?
Question:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)

    for i in range(5):
        # Append the first i columns of X again as redundant (duplicated) features
        X_redundant = np.c_[X, X[:, :i]]
        print(X_redundant.shape)
        clf = LogisticRegression(random_state=0, max_iter=1000).fit(X_redundant, y)
        # Accuracy on the training data itself (no held-out set)
        print(clf.score(X_redundant, y))

Output:

    (150, 4)
    0.9733333333333334
    (150, 5)
    0.98
    (150, 6)
    0.98
    (150, 7)
    0.9866666666666667
    (150, 8)
    …
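One possible explanation, offered as a hypothesis rather than something the post establishes: scikit-learn's `LogisticRegression` applies an L2 penalty by default (`penalty='l2'`, `C=1.0`). With a duplicated column, the model can split one weight across the copies and get the same predictions at a lower penalty, which would act like weaker regularization on that feature. The sketch below only illustrates that arithmetic; the toy values `x` and `w` are made up and are not taken from the iris data.

```python
import numpy as np

# Toy data: one feature column and a single weight (both made up for illustration).
x = np.array([[0.5], [-1.2], [2.0]])
w = np.array([3.0])

# Duplicate the column, then split the original weight evenly over the two copies.
x_dup = np.c_[x, x[:, :1]]
w_dup = np.array([w[0] / 2, w[0] / 2])

# The logits (and therefore the predictions) are identical...
logits_same = np.allclose(x @ w, x_dup @ w_dup)

# ...but the squared-L2 penalty on the weights is halved.
penalty = float(w @ w)              # 3.0**2 = 9.0
penalty_dup = float(w_dup @ w_dup)  # 1.5**2 + 1.5**2 = 4.5

print(logits_same, penalty, penalty_dup)
```

If this is what is happening, refitting the loop with a much larger `C` (weaker regularization) should shrink the differences between the runs. That check has not been run here. Note also that the scores are training accuracy, so they say nothing about generalization.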