I'm trying to implement numerical gradient checking as explained here to verify my implementation of the cost function in another exercise. I haven't changed much of the code describing the cost/gradient function of the neural network from what I used in the linked exercise (which passed the test comfortably).
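For reference, the check I'm running is essentially a central-difference approximation along these lines (Python sketch; `cost_fn` and `theta` are placeholders for my cost function and the unrolled parameter vector):

```python
import numpy as np

def numerical_gradient(cost_fn, theta, eps=1e-4):
    """Approximate the gradient of cost_fn at theta via central differences."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        delta = np.zeros_like(theta)
        delta[i] = eps
        # perturb only the i-th parameter in each direction
        grad[i] = (cost_fn(theta + delta) - cost_fn(theta - delta)) / (2 * eps)
    return grad
```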
I'm now experimenting with a three-layer vanilla network: ~50 inputs, ~25 hidden nodes, and 1 output node, all using a tanh activation function. I only had to modify the previous code very slightly to go from a sparse autoencoder to a vanilla neural network. I'm almost certain that the implementation is correct, except that it doesn't seem to pass the numerical gradient test very well.
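To judge whether the test "passes", I compare the analytic and numerical gradients with a scale-invariant relative error, roughly like this (again a sketch, with hypothetical names):

```python
def relative_error(analytic_grad, numeric_grad):
    """Relative difference between the two gradient estimates."""
    num = np.linalg.norm(analytic_grad - numeric_grad)
    den = np.linalg.norm(analytic_grad) + np.linalg.norm(numeric_grad)
    return num / den

# With double precision, values around 1e-7 or smaller are what I'd expect
# from a correct gradient; mine come out noticeably larger than that.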
So my question is: are there limits to this method of numerically checking a cost/gradient implementation? I've spent more than 15 hours trying to debug it, but haven't succeeded so far.