If I split a data set into two parts, build a predictor on one part, and validate it with the other, I'm confirming that some property is independent of the kind of split I'm performing. For example, if I train on 2009 data and verify with 2010 data, it supports the inductive hypothesis that what I found is independent of time, and therefore useful for forecasting. So far so good. But if I instead use a random number generator to split the data, all I'm confirming is that the property is independent of the random number generator, which we would expect to be the case without doing the experiment.
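To make the two kinds of split concrete, here is a minimal Python sketch. The records, the `split_by_year` and `split_randomly` helpers, and the 50% test fraction are all hypothetical, made up purely for illustration; only the 2009/2010 years come from the example above.

```python
import random

# Hypothetical records: (year, feature, target). Values are invented for illustration.
records = [(2009, x, 2 * x) for x in range(5)] + [(2010, x, 2 * x) for x in range(5)]

def split_by_year(rows, train_year, test_year):
    """Temporal split: train on one year, validate on another."""
    train = [r for r in rows if r[0] == train_year]
    test = [r for r in rows if r[0] == test_year]
    return train, test

def split_randomly(rows, test_fraction, seed):
    """Random split: which set a row lands in depends only on the RNG."""
    rng = random.Random(seed)
    shuffled = rows[:]  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

time_train, time_test = split_by_year(records, 2009, 2010)
rand_train, rand_test = split_randomly(records, 0.5, seed=0)

# Temporal split: the years never mix across the two sets.
print({r[0] for r in time_train}, {r[0] for r in time_test})
# Random split: each set will usually contain rows from both years.
print(sorted({r[0] for r in rand_train}), sorted({r[0] for r in rand_test}))
```

In the temporal split, year is the one variable that separates the training set from the validation set, so a predictor that validates well has been tested against a change in time. In the random split, nothing systematic separates the two sets.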