Resilient prop for SGD?

Hi all. I have been working with large neural networks lately and have been using batch stochastic gradient descent to train them online on live data with a fixed learning rate.

I am seeing fairly slow convergence, which is consistent with my past experience with fixed learning rates. Heuristic methods like iRPROP have worked much better for me, since they adapt the learning rate on a parameter-by-parameter basis, and I have found they converge significantly faster.
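
To make the per-parameter idea concrete, here is a rough sketch of one iRPROP- update (just a sketch assuming numpy arrays and the usual RPROP constants, nothing tuned to my setup):

    import numpy as np

    def irprop_minus_step(w, grad, prev_grad, step,
                          eta_plus=1.2, eta_minus=0.5,
                          step_min=1e-6, step_max=1.0):
        # One iRPROP- update: adapt a per-parameter step size from the
        # sign of the gradient, ignoring its magnitude.
        sign_change = grad * prev_grad
        # grow the step where the gradient kept its sign, shrink it where it flipped
        step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
        step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
        # iRPROP-: where the sign flipped, zero the gradient so no step is taken there
        grad = np.where(sign_change < 0, 0.0, grad)
        # move each weight by its own step size, opposite the gradient sign
        w = w - np.sign(grad) * step
        return w, grad, step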

I am wondering if anyone has heard of resilient propagation being used in conjunction with batch SGD?

I would imagine running iRPROP for a while on the batch, then incorporating the result like:

g1 = (1 - lr) * g0 + lr * g_irprop

where g1 is the updated gradient, g0 is the original gradient, and g_irprop is the gradient obtained after some amount of iRPROP.

lr is the "learning rate", which gets implicitly scaled by the per-variable step sizes learned using resilient propagation... So this seems "smarter" than pure SGD, but it might run into issues because iRPROP is being run on fairly small batches.
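
To be explicit about how I imagine this plugging into an SGD step, a rough sketch (hypothetical names: compute_grad stands for whatever returns the minibatch gradient, irprop_minus_step is the helper sketched above, and I'm reading g_irprop as the gradient re-evaluated at the weights reached after k iRPROP- iterations on the current batch):

    def sgd_with_irprop_blend(w, batch, compute_grad, state,
                              lr=0.1, sgd_step=0.01, k=5):
        # plain minibatch gradient at the current weights
        g0 = compute_grad(w, batch)

        # run a few iRPROP- iterations on this same batch
        w_tmp = w.copy()
        prev_g, step = state['prev_g'], state['step']
        for _ in range(k):
            g = compute_grad(w_tmp, batch)
            w_tmp, prev_g, step = irprop_minus_step(w_tmp, g, prev_g, step)

        # gradient "after some amount of irprop", then the blend from above
        g_irprop = compute_grad(w_tmp, batch)
        g1 = (1.0 - lr) * g0 + lr * g_irprop

        # carry the per-parameter iRPROP state across batches
        state['prev_g'], state['step'] = prev_g, step
        return w - sgd_step * g1, state

state would start out as something like {'prev_g': np.zeros_like(w), 'step': np.full_like(w, 0.0125)} and would persist from batch to batch.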

Thoughts?

submitted by GratefulTony