Typo in the title: long short-term memory*
So I recently learned to represent my network as a matrix in order to speed it up, presumably because matrix operations can benefit from parallelization. With an RNN, though, I feel like you need to iterate over the columns of the matrix one time step at a time, since each step depends on the previous one through the recurrent connections. Additionally, I am using LSTM cells, whose final activation depends on the activations of their gates. I feel like I've already lost a lot of the speedup because of this. Can anyone recommend how best to implement this type of network?
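To make the question concrete, here is a minimal NumPy sketch of one common way to lay this out. It is not a definitive implementation. The function name `lstm_forward`, the weight layout (the four gates stacked side by side in `W`, `U`, and `b`), and the gate order (input, forget, output, candidate) are all assumptions made for illustration. The loop over time steps stays sequential, but the input projections for every time step are computed in one matrix multiply, all four gates share a single multiply per step, and the sequences in a batch are processed in parallel.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_forward(X, W, U, b):
    """Forward pass of a single LSTM layer (illustrative layout, see lead-in).

    X: (T, B, D) inputs for T time steps, B sequences, D features
    W: (D, 4H) input weights, U: (H, 4H) recurrent weights, b: (4H,) bias
    Returns the hidden states with shape (T, B, H).
    """
    T, B, D = X.shape
    H = U.shape[0]

    # The input projections do not depend on the recurrence, so they can be
    # computed for every time step and every sequence in one multiply.
    XW = (X.reshape(T * B, D) @ W + b).reshape(T, B, 4 * H)

    h = np.zeros((B, H))
    c = np.zeros((B, H))
    out = np.empty((T, B, H))

    # Only this loop is inherently sequential: step t needs h from step t-1.
    for t in range(T):
        z = XW[t] + h @ U            # all four gates in one (B, 4H) multiply
        i = sigmoid(z[:, 0 * H:1 * H])   # input gate
        f = sigmoid(z[:, 1 * H:2 * H])   # forget gate
        o = sigmoid(z[:, 2 * H:3 * H])   # output gate
        g = np.tanh(z[:, 3 * H:4 * H])   # candidate cell values
        c = f * c + i * g
        h = o * np.tanh(c)
        out[t] = h
    return out
```

In this layout the gate dependency costs only a few elementwise operations per step, so the remaining sequential cost is one `(B, H) @ (H, 4H)` multiply per time step.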