Why don't sigmoid and tanh neural nets behave equivalently?

A sigmoid net can emulate a tanh net of the same architecture, and vice versa, since tanh(x) = 2σ(2x) - 1: double each unit's incoming weights and bias, scale its outgoing weights by 2, and fold the -1 offset into the downstream biases. I calculated the gradient for a tanh net, then used the chain rule to find the corresponding gradient for the sigmoid net emulating it, and got exactly the same gradient as computing it directly in the sigmoid net. What am I missing?
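
Here is a minimal numerical sketch of that emulation (a one-hidden-layer net in numpy; the layer sizes and variable names are mine, for illustration):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    x = rng.normal(size=5)                              # input vector
    W, b = rng.normal(size=(3, 5)), rng.normal(size=3)  # tanh hidden layer
    V, c = rng.normal(size=(1, 3)), rng.normal(size=1)  # linear output layer

    # tanh net: y = V tanh(W x + b) + c
    y_tanh = V @ np.tanh(W @ x + b) + c

    # emulating sigmoid net, using tanh(z) = 2*sigmoid(2z) - 1:
    # double the incoming weights/biases, scale the outgoing weights by 2,
    # and fold the -1 offset into the output bias
    W2, b2 = 2 * W, 2 * b
    V2, c2 = 2 * V, c - V.sum(axis=1)
    y_sig = V2 @ sigmoid(W2 @ x + b2) + c2

    print(np.allclose(y_tanh, y_sig))  # True: identical input-output behavior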

Edit: It turns out that if learning follows the gradient in the tanh net, and one observes what happens to the parameters of the corresponding sigmoid net, the sigmoid net's own gradient is not what gets followed. I guess I could calculate the tanh gradient and transform it into updates for a sigmoid net, so as to simulate a tanh net with a sigmoid net. I couldn't find any literature on this, so I'm still suspicious that I'm overlooking something.
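
The reason appears to be that plain gradient descent is not invariant under reparameterization: mapping v = 2w rescales steps by 2, while the chain rule rescales gradients by 1/2, so the two steps disagree by a factor of 4. Here is a scalar sketch (my own toy example, with hand-derived gradients) showing that one gradient step on the tanh weight w, mapped into sigmoid coordinates via v = 2w, is not the step that gradient descent on the sigmoid weight v actually takes:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x, t, lr = 1.5, 0.2, 0.1  # input, target, learning rate
    w = 0.7                   # tanh-net weight
    v = 2 * w                 # emulating sigmoid weight: tanh(w x) = 2*sigmoid(v x) - 1

    # tanh net:    L = (tanh(w x) - t)^2
    y = np.tanh(w * x)
    dL_dw = 2 * (y - t) * (1 - y**2) * x

    # sigmoid net: L = (2*sigmoid(v x) - 1 - t)^2
    s = sigmoid(v * x)
    dL_dv = 2 * (2 * s - 1 - t) * 2 * s * (1 - s) * x  # = (1/2) * dL_dw, by the chain rule

    # one gradient step in each parameterization
    w_new = w - lr * dL_dw
    v_new = v - lr * dL_dv

    print(2 * w_new)  # where the emulating sigmoid weight ends up (tanh step, mapped)
    print(v_new)      # where sigmoid gradient descent actually goes
    # the mapped step is 4x the direct step: dv = 2 dw, but dL/dv = (1/2) dL/dw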

Edit: By sigmoid function, I am referring to σ(x) = 1/(1 + exp(-x)).

submitted by justonium
