Hey guys,
I am programming a neural network from scratch in Python, and when I run a numerical check on the backpropagation gradient, I get results like the following example:
Gradient: -0.0375543629722 Finite difference: 0.0187042723576
The analytical gradient is around twice as big as the finite difference (ignoring the sign).
I am using the following formula for the numerical gradient:
(f(x + eps) - f(x - eps)) / (2 * eps)
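In code, the check looks roughly like this sketch (numerical_grad, loss, weights and idx are placeholder names for illustration, not my actual code):

    import numpy as np

    def numerical_grad(loss, weights, idx, eps=1e-4):
        # central difference: (f(x + eps) - f(x - eps)) / (2 * eps)
        w_plus = weights.copy()
        w_minus = weights.copy()
        w_plus[idx] += eps
        w_minus[idx] -= eps
        return (loss(w_plus) - loss(w_minus)) / (2 * eps)

    # example with a dummy quadratic "loss": d/dw0 of sum(w**2) at w0 = 0.5 is ~1.0
    w = np.array([0.5, -1.0])
    print(numerical_grad(lambda v: np.sum(v ** 2), w, 0))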
For the single-unit sigmoid output layer I am using the following analytical gradient: (target - prediction) * activation_of_hidden_unit
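For context, the backprop step for that hidden-to-output weight is essentially this (the names target, prediction and hidden_activation are just illustrative; it only mirrors the formula above, it's not a proposed fix):

    def output_weight_grad(target, prediction, hidden_activation):
        # the quantity I use as the gradient of the cross-entropy loss
        # w.r.t. the hidden-to-output weight: (t - y) * h
        return (target - prediction) * hidden_activation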
The loss is cross-entropy, and I'm training with stochastic gradient descent (maybe that has something to do with it?).
The network trains and generalizes fine, but I am curious and a little worried about this gradient discrepancy.
If you need more details, please let me know.
Thank you for your help!