So here's the idea. You have a vanilla recurrent neural net (RNN) that performs one recurrent feedforward pass at each time step of the sequence it is reading. You then feed its hidden states as input to a second RNN that does a feedforward pass only every other time step, and train this second RNN to help correct the errors of the first. If you keep stacking slower and slower RNNs like this, information can flow across N time steps in only about log N feedforward passes, rather than the N passes a vanilla RNN needs. I implemented this and found that it works, but I can't find any literature on it.
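To make the scheduling concrete, here's a minimal sketch (not my actual implementation) of the update pattern in NumPy: layer k steps only every 2**k time steps, so a stack of about log2(N) layers spans N steps in O(log N) updates. The tanh cell, the parameter shapes, and the doubling stride are my illustrative assumptions, and the error-correction training is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_cell(params, h, x):
    """One vanilla tanh RNN step: h' = tanh(Wh @ h + Wx @ x + b)."""
    Wh, Wx, b = params
    return np.tanh(Wh @ h + Wx @ x + b)

def make_params(hidden, inp):
    """Random weights for a cell with `inp`-dim input and `hidden`-dim state."""
    s = 1.0 / np.sqrt(hidden)
    return (rng.normal(0, s, (hidden, hidden)),
            rng.normal(0, s, (hidden, inp)),
            np.zeros(hidden))

hidden, inp, n_layers, T = 8, 4, 3, 16
# Layer 0 reads the raw input; each higher layer reads the layer below's state.
layers = [make_params(hidden, inp if k == 0 else hidden) for k in range(n_layers)]
h = [np.zeros(hidden) for _ in range(n_layers)]

xs = rng.normal(size=(T, inp))
for t in range(T):
    h[0] = rnn_cell(layers[0], h[0], xs[t])        # layer 0: every step
    for k in range(1, n_layers):
        if t % (2 ** k) == 0:                      # layer k: every 2**k steps
            h[k] = rnn_cell(layers[k], h[k], h[k - 1])
```

With this schedule the top layer of a 3-layer stack only fires every 4 steps, yet its state summarizes everything below it, which is where the ~log N path length comes from.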