Suppose you have a neural net with only one training example. Will the network end up in the same final state if you run backprop for many iterations with a small learning rate as it would if you ran it for a single iteration with a learning rate of 1? My hypothesis is that it would not, because the loss surface is nonlinear and you are following the gradient of a sigmoid, which changes as the weights move. But my friend thinks it would differ simply because of what a learning rate is. What do you guys think?
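Here is a minimal sketch of the experiment in question (not from the original post; the network, example, and all values like `x`, `y`, and `w0` are made up for illustration): a single sigmoid unit trained on one example, comparing 100 steps at learning rate 0.01 against one step at learning rate 1.0, so the total step "budget" is the same in both cases.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad(w, x, y):
    # Gradient of squared error 0.5 * (sigmoid(w.x) - y)^2 w.r.t. w
    a = sigmoid(w @ x)
    return (a - y) * a * (1 - a) * x

x = np.array([1.0, -2.0])   # the single training example (input)
y = 0.9                     # its target
w0 = np.array([0.5, 0.5])   # shared starting weights

# Case 1: many small steps (100 iterations, learning rate 0.01)
w_small = w0.copy()
for _ in range(100):
    w_small -= 0.01 * grad(w_small, x, y)

# Case 2: one big step (1 iteration, learning rate 1.0)
w_big = w0 - 1.0 * grad(w0, x, y)

print("many small steps:", w_small)
print("one big step:    ", w_big)
```

Running this, the two final weight vectors differ: the small-step run re-evaluates the gradient at each new point and so follows the curvature of the sigmoid's loss surface, while the single big step extrapolates the initial gradient linearly. They would only coincide if the gradient were constant, i.e. if the loss were linear in the weights.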