I've read a few papers on deep learning, RBMs, etc., and the concepts, math, and implementation all make sense to me. However, when I read papers about any example application or implementation, they describe some network structure (how many hidden layers, hidden layer sizes, distributions, connectivity, pooling, etc.), and I rarely come away with a clear idea of why that structure was chosen. Can anyone give me a basic rundown of the thought process that goes into choosing a network structure?
Why do different models end up with a particular number of layers and a particular node count per layer?
In convolutional + pooling setups, how do people decide on block sizes/activation regions?
Is there some idea going in of what general function each layer should perform?
Are these choices primarily driven by theory, or more by tinkering and seeing what tends to work?
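For concreteness, here's a minimal sketch (plain NumPy, not from any particular paper) of the kind of structural choices in question: the number of hidden layers and their widths are free hyperparameters that the code itself gives no reason for. The specific values (784, [256, 64], 10) are placeholders, not recommendations.

```python
import numpy as np

# Structural hyperparameters -- exactly the choices the question is about.
# Nothing here explains *why* two hidden layers or widths of 256 and 64.
input_dim = 784           # e.g. 28x28 images, flattened
hidden_sizes = [256, 64]  # number of hidden layers and their widths
output_dim = 10           # e.g. number of classes

def relu(x):
    return np.maximum(0.0, x)

# One weight matrix and bias vector per layer, sized by the choices above.
rng = np.random.default_rng(0)
dims = [input_dim] + hidden_sizes + [output_dim]
params = [
    (rng.normal(scale=0.01, size=(d_in, d_out)), np.zeros(d_out))
    for d_in, d_out in zip(dims[:-1], dims[1:])
]

def forward(x):
    """Feed-forward pass; the architecture is fixed by `dims` above."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:  # no nonlinearity on the output layer
            x = relu(x)
    return x

batch = rng.normal(size=(32, input_dim))
print(forward(batch).shape)  # (32, 10)
```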