Hello!
I have been independently studying CNNs and have been a little frustrated: I'm trying to implement my own CNN, but a lot of the literature is vague about the details of the architecture. This sent me back to the source, some of LeCun's papers from 1989 and 1990.
When he presents his Net-5 architecture, he describes the idea of weight sharing and downsampling. It isn't until a later paper in 1990 that he revisits the idea and states that it is equivalent to a convolution with a small kernel, and the paper gives no justification for this.
Maybe this is obvious to some people, but I know very little about mathematical convolutions and don't understand the intuition behind them. So I don't understand the relationship between them and weight sharing + downsampling.
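To show where I'm at, here is a small NumPy sketch of what I *think* the equivalence is supposed to mean (the 1-D input, the 3-tap weights, and the stride of 2 are all my own made-up example, not from the papers), so please correct me if this is off:

    import numpy as np

    x = np.arange(10, dtype=float)    # 1-D input "feature map"
    w = np.array([0.5, -1.0, 2.0])    # one shared 3-tap weight vector
    stride = 2                        # the downsampling step

    # Weight sharing + downsampling: every output unit applies the SAME
    # weights to its own local window, and windows are spaced `stride` apart.
    shared = np.array([w @ x[i:i + len(w)]
                       for i in range(0, len(x) - len(w) + 1, stride)])

    # "Convolution with a small kernel" (cross-correlation here),
    # then keeping every `stride`-th output.
    conv = np.correlate(x, w, mode='valid')[::stride]

    print(np.allclose(shared, conv))  # prints True for me

Is this really all the 1990 paper means, i.e. that sliding one shared weight vector across the input and skipping positions is the same thing as convolving with a small kernel and subsampling the result?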
Can anyone help explain these concepts to me? Or point me to literature that does make this connection?
Thanks so much!