My minimization routine takes a very long time. Specifically, if I allow it to run until it finds a local minimum to within machine precision, it can require thousands of iterations. (Not surprisingly, the marginal improvement on each iteration falls steadily.) Since I have limited CPU-hours to work with, how close to a local minimum should I get?
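A common compromise is to stop on relative improvement rather than running to machine precision. Here is a minimal sketch of that idea in Octave; the tolerance values and the step function `one_minimization_step` are hypothetical placeholders, not something from the original setup:

    % Stop once the cost drops by less than tol per iteration,
    % instead of iterating until machine precision.
    tol = 1e-4;            % relative tolerance; tune against validation error
    max_iters = 500;       % hard cap on iterations
    J_prev = Inf;
    for iter = 1:max_iters
      [J, theta] = one_minimization_step(theta);   % hypothetical step function
      if abs(J_prev - J) < tol * max(1, abs(J_prev))
        break;             % marginal improvement has become negligible
      end
      J_prev = J;
    end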
This question becomes especially relevant when I repeatedly retrain my neural network while iterating over various parameters (regularization, number of features, network topology, size of the dataset). I'd like to determine good choices for each of these parameters. Will halting the minimization early interfere with that?
(Related: I'm using Octave with the fmincg.m provided by the folks running the Stanford ML Class. Suggestions for better minimization algorithms, particularly ones with more documentation or an adjustable learning rate, are welcome. An example of how I currently cap the iteration budget is below.)
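For reference, this is roughly how fmincg.m is typically called with a fixed iteration budget, alongside the analogous call to Octave's built-in fminunc. The cost function signature and variable names here are assumptions for illustration; `costFunction` stands in for any function returning the cost J and its gradient for a given theta:

    % Cap fmincg at a fixed number of iterations via optimset.
    options = optimset('MaxIter', 50);
    [theta, J_history] = fmincg(@(t) costFunction(t, X, y, lambda), initial_theta, options);

    % Rough equivalent with Octave's built-in fminunc, which also uses
    % the supplied analytic gradient when 'GradObj' is 'on'.
    options = optimset('GradObj', 'on', 'MaxIter', 50);
    [theta, J] = fminunc(@(t) costFunction(t, X, y, lambda), initial_theta, options);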