So there have been kernels engineered for specific tasks that rival deep nets (e.g., http://arxiv.org/abs/1406.3332), which makes sense, since they build in what is already known to work well.
But then this paper just came out today on arxiv which scales up kernel methods:
http://arxiv.org/abs/1407.5599
The authors use RBF kernels for their experiments and seem to achieve performance comparable to deep nets with minimal preprocessing (some normalization or PCA) on multiple standard datasets (e.g., CIFAR-10, MNIST, ImageNet).
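For anyone curious how kernel methods get scaled up at all: the usual trick (which this line of work builds on, as far as I can tell) is random Fourier features, where you replace the exact RBF kernel with an inner product of finite random feature maps. A minimal numpy sketch (the `sigma` and `D` values here are just illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(x, y, sigma=1.0):
    # Exact Gaussian/RBF kernel value between two vectors.
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def random_fourier_features(X, D=2000, sigma=1.0, rng=rng):
    # Map each row of X to D random cosine features whose inner
    # product approximates the RBF kernel (Rahimi & Recht, 2007).
    d = X.shape[1]
    W = rng.normal(0.0, 1.0 / sigma, size=(d, D))
    b = rng.uniform(0.0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

x = rng.normal(size=(1, 5))
y = rng.normal(size=(1, 5))
Z = random_fourier_features(np.vstack([x, y]))
approx = Z[0] @ Z[1]          # approximate kernel value
exact = rbf_kernel(x[0], y[0])  # exact kernel value
```

Once you're in this explicit feature space, you can train a linear model with SGD, which is what makes the kernel approach scale to large datasets.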
Makes me wonder if RBF kernels were good all along... Do we really need to learn deep feature representations?