Does the preprocessing of one algorithm change the conditions of the experiment?


As an example, we have two algorithms that use the same dataset and the same train/test split:

1 – uses k-NN and returns the accuracy;

2 – applies preprocessing before k-NN, adds a few more steps, and then returns the accuracy.

Although the preprocessing "is a part of" algorithm number 2, I’ve been told that we cannot compare these two methods because the experiment’s conditions have changed as a result of the preprocessing.
Since the preprocessing belongs exclusively to algorithm 2, I believe the conditions have not changed.

Which statement is the correct one?

Asked By: melson



It depends on what you are comparing.

  • if you compare the two methods "with preprocessing allowed", then you don’t include the preprocessing in the measurement; in principle you should also test both methods on several (identical) queries;

  • if you compare "with no preprocessing allowed", then include everything in the measurement.
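To make the setup concrete, here is a minimal sketch in plain Python of how such a comparison could be arranged so that only the method differs. Everything in it is an assumption for illustration: the toy data is invented, min-max scaling stands in for the unspecified preprocessing of algorithm 2, k is set to 1, and the helper names (knn_predict, accuracy, fit_min_max, apply_min_max) are hypothetical. Both methods see the same training set and the same test queries, and the scaling parameters are computed from the training data only.

```python
import math

def knn_predict(train_X, train_y, x, k=1):
    """Majority vote among the k training points closest to x (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

def accuracy(train_X, train_y, test_X, test_y, k=1):
    """Fraction of test queries whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)

def fit_min_max(train_X):
    """Learn per-feature minimum and maximum from the training data only."""
    columns = list(zip(*train_X))
    return [min(c) for c in columns], [max(c) for c in columns]

def apply_min_max(X, lo, hi):
    """Rescale each feature to roughly [0, 1] using the fitted parameters."""
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in X]

# Invented toy data: feature 0 carries the class, feature 1 is large-scale noise.
train_X = [[0.10, 100], [0.20, 900], [0.15, 500],
           [0.80, 120], [0.90, 880], [0.85, 520]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[0.12, 125], [0.88, 885], [0.18, 515], [0.82, 495]]
test_y = [0, 1, 0, 1]

# Method 1: plain k-NN on the raw features.
acc_plain = accuracy(train_X, train_y, test_X, test_y)

# Method 2: preprocessing (assumed here to be min-max scaling) followed by k-NN.
# The same train/test split is used; only the method changes.
lo, hi = fit_min_max(train_X)
acc_prep = accuracy(apply_min_max(train_X, lo, hi), train_y,
                    apply_min_max(test_X, lo, hi), test_y)

print("plain k-NN accuracy:        ", acc_plain)
print("preprocess + k-NN accuracy: ", acc_prep)
```

The numbers printed for this toy data say nothing about real datasets. The point is only that the split and the queries are held fixed while the preprocessing is treated as a step inside method 2.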

Answered By: Yves Daoust