Hi,
One of the most often-cited issues in RNN training is the vanishing gradient problem (Y. Bengio et al., S. Hochreiter, S. Hochreiter et al., R. Pascanu et al.).
However, I came across several papers by Anton Maximilian Schaefer, Steffen Udluft and Hans-Georg Zimmermann (e.g., here) in which it is claimed that the problem doesn't exist even in a simple RNN if shared weights are used.
So which is it: does the vanishing gradient problem exist in a simple RNN with shared weights, or not?
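
For concreteness, here's a minimal numerical sketch of what I mean (not from any of the cited papers; the network sizes, weight scale, and toy loss are arbitrary choices of mine). It backpropagates through a vanilla tanh RNN with a single recurrent weight matrix shared across time steps and prints how the gradient with respect to earlier inputs behaves:

```python
# Minimal sketch, not from the cited papers: BPTT through a vanilla tanh RNN
# with one shared recurrent weight matrix W_rec.  All sizes, scales, and the
# toy loss (sum of the last hidden state) are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
hidden, seq_len = 8, 30
W_rec = rng.normal(scale=0.5 / np.sqrt(hidden), size=(hidden, hidden))  # shared at every step
W_in = rng.normal(scale=0.5, size=(hidden, hidden))
xs = rng.normal(size=(seq_len, hidden))

# Forward pass, keeping the pre-activations z_t for the backward pass.
h = np.zeros(hidden)
zs = []
for x in xs:
    z = W_rec @ h + W_in @ x
    h = np.tanh(z)
    zs.append(z)

# Backward pass: dL/dh_T = 1 for the toy loss, then the gradient flows back
# through the same shared W_rec at every step.
grad_h = np.ones(hidden)
for t in reversed(range(seq_len)):
    grad_z = grad_h * (1.0 - np.tanh(zs[t]) ** 2)   # through the tanh
    grad_x = W_in.T @ grad_z                        # dL/dx_t
    print(f"t = {t:2d}  ({seq_len - 1 - t:2d} steps back)  |dL/dx_t| = {np.linalg.norm(grad_x):.3e}")
    grad_h = W_rec.T @ grad_z                       # dL/dh_(t-1)
```

With the small recurrent weights chosen here, the printed norms shrink rapidly as you move further back in time, which is the behaviour the first set of papers describes; I don't see how weight sharing by itself would change that, hence my question.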
Thx,
D