I'm searching hyperparameters using TPE (Tree-structured Parzen Estimator) methods on a small dataset. The loss function for the hyperparameter search is the mean of my accuracy metric over the K-folded training set.
The standard deviation of the accuracy metric for a given parameter combination is high, no matter how I cross-validate or how much I vary the fraction of the training data used to fit the algorithm.
I'm okay with the noise: for some hyperparameter combinations the accuracy is bootstrap-reproducible, and the standard deviation is relatively stable as well.
My natural inclination is to define the loss as

loss = mean - norm.ppf(0.99) * std
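A minimal sketch of that lower-confidence-bound loss over the K-fold scores, using the standard library's `NormalDist().inv_cdf` as the equivalent of scipy's `norm.ppf` (the function name `lcb_loss` is mine, not from any library):

```python
from statistics import NormalDist, mean, stdev

def lcb_loss(scores, confidence=0.99):
    # Penalize fold-to-fold variance: mean accuracy minus a
    # z-scaled standard deviation (a lower confidence bound).
    z = NormalDist().inv_cdf(confidence)  # stdlib equivalent of scipy's norm.ppf(confidence)
    return mean(scores) - z * stdev(scores)
```

Note that if the optimizer minimizes loss while accuracy should be maximized, you would negate the result before returning it.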
But then I'm reminded that the current generation of contextual bandits works in a very similar setting by chasing the upper confidence bound of some model parameterized by "experts" or function approximation.
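For contrast, the classic UCB1 rule from the bandit literature adds the uncertainty term rather than subtracting it, favoring arms that are either good or under-explored. A generic sketch (not tied to any particular bandit library):

```python
import math

def ucb1_pick(counts, means, t):
    # UCB1: choose the arm maximizing empirical mean + exploration bonus.
    # counts[i] = pulls of arm i so far, means[i] = its empirical mean reward,
    # t = total pulls so far.
    def score(i):
        if counts[i] == 0:
            return float("inf")  # try every arm at least once
        return means[i] + math.sqrt(2 * math.log(t) / counts[i])
    return max(range(len(counts)), key=score)
```

The optimism here is what drives exploration; a pessimistic lower bound like the one above instead rewards configurations that are reliably good, which is a different objective.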
Should I ignore the standard deviation altogether?