Hi,
I have the following question. I have ~100,000 labeled examples, split into 70% training and 30% test data.
Am I allowed to create n (say 50) bootstrap training and test sets, and then use the mean of the estimates on the 50 test sets as my estimate and the variance between them as an error estimate? Assume I've taken care that none of the models trained on the 50 bootstrap samples is over-trained.
Will the mean of the 50 be too optimistic?
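To make the procedure concrete, here is a minimal sketch of what I have in mind. Everything in it is an assumption for illustration only: the synthetic 1-D data (2,000 points instead of ~100,000, to keep it fast), the toy threshold classifier standing in for the real model, and the helper names `bootstrap_sample`, `fit_threshold`, `accuracy` and `bootstrap_estimate`. It also assumes the 70/30 split stays fixed and each side is resampled with replacement separately, so no test example ever ends up in a training set.

```python
import random
import statistics


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def fit_threshold(train):
    """Toy stand-in model: midpoint between the two class means."""
    mean0 = statistics.fmean(x for x, y in train if y == 0)
    mean1 = statistics.fmean(x for x, y in train if y == 1)
    return (mean0 + mean1) / 2


def accuracy(threshold, test):
    """Fraction of test points whose predicted label matches the true one."""
    return sum((x > threshold) == (y == 1) for x, y in test) / len(test)


def bootstrap_estimate(train, test, n_rounds=50, seed=0):
    """Resample train and test separately, refit, and score each round."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        boot_train = bootstrap_sample(train, rng)
        boot_test = bootstrap_sample(test, rng)
        model = fit_threshold(boot_train)
        scores.append(accuracy(model, boot_test))
    # Mean of the per-round scores and the sample variance between them.
    return statistics.fmean(scores), statistics.variance(scores)


# Synthetic labeled data (hypothetical): class 0 ~ N(0, 1), class 1 ~ N(1, 1).
data_rng = random.Random(42)
data = []
for _ in range(2000):
    label = data_rng.randint(0, 1)
    data.append((data_rng.gauss(label, 1.0), label))

# Fixed 70/30 split, as in the question.
split = int(0.7 * len(data))
train, test = data[:split], data[split:]

mean_acc, var_acc = bootstrap_estimate(train, test)
print(f"mean accuracy over 50 rounds: {mean_acc:.3f}")
print(f"variance between rounds:      {var_acc:.6f}")
```

The two printed numbers are the quantities I am asking about: the mean of the 50 test-set estimates and the variance between them.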