What is a sufficiently generic convergence criterion for Stochastic Gradient Descent?

I am implementing a generic module for Stochastic Gradient Descent. It takes as arguments: a training dataset, loss(x,y), and dw(x,y) - the per-sample loss and the per-sample gradient.
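For concreteness, the skeleton is roughly the sketch below (the names sgd, w0, lr, and max_epochs are just mine, and I've assumed the per-sample callables also take the current weights w):

```python
import random
import numpy as np

def sgd(dataset, loss, dw, w0, lr=0.01, max_epochs=100):
    """Generic SGD loop: dataset is a list of (x, y) pairs; loss(x, y, w)
    and dw(x, y, w) are the per-sample loss and per-sample gradient."""
    w = np.array(w0, dtype=float)
    for epoch in range(max_epochs):
        random.shuffle(dataset)        # visit samples in random order
        for x, y in dataset:
            w -= lr * dw(x, y, w)      # one stochastic gradient step
        # <-- the convergence check in question would go here,
        #     using loss, w, and whatever history it needs
    return w
```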

Now, for the convergence criterion, I have thought of:

a) Checking the loss function after every 10% of the dataset size, averaged over some window.

b) Checking the norm of the difference between successive weight vectors, after every 10-20% of the dataset size (a sketch of (a) and (b) follows this list).

c) Stabilization of error on the training set.

d) A change in the sign of the gradient (again, checked at fixed intervals).
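To make (a) and (b) concrete, this is roughly what I have in mind; the window size and the two tolerances are exactly the knobs that seem to vary from problem to problem:

```python
import numpy as np

def should_stop(loss_history, w_prev, w_curr,
                window=5, loss_tol=1e-4, weight_tol=1e-5):
    """Checks (a) and (b): change in the windowed-average loss, and the
    norm of the difference between successive weight snapshots."""
    # (a) compare the means of the last two windows of recorded losses
    if len(loss_history) >= 2 * window:
        recent = np.mean(loss_history[-window:])
        older = np.mean(loss_history[-2 * window:-window])
        if abs(older - recent) < loss_tol * max(abs(older), 1.0):
            return True
    # (b) how far the weights moved since the last check
    return np.linalg.norm(np.asarray(w_curr) - np.asarray(w_prev)) < weight_tol
```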

I have noticed that these checks (the precision of the check, etc.) also depend on other factors, such as the step size / learning rate, and that the effect can vary from one training problem to another.

I can't seem to make up my mind on what a generic stopping criterion should be, regardless of the training set, f(x), and df/dw thrown at the SGD module. What do you guys do?

Also, for (d), what would "change in sign" mean for an n-dimensional vector?
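The only readings of (d) I can come up with are the two below (my own guesses, not taken from any reference), so I'd like to know which, if either, people actually use:

```python
import numpy as np

# Reading 1: the overall descent direction flips, i.e. successive
# (stochastic) gradients point in roughly opposite directions.
def direction_flipped(g_prev, g_curr):
    return np.dot(g_prev, g_curr) < 0

# Reading 2: count per-coordinate sign flips; near a minimum many
# coordinates of the gradient tend to oscillate in sign.
def coordinate_sign_flips(g_prev, g_curr):
    return int(np.sum(np.sign(g_prev) != np.sign(g_curr)))
```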

Edit: formatting

submitted by akshayxyz
