I'm new to this field, so please excuse my poor terminology & understanding.
For classification, why would a deep learning approach built on autoencoders, RBMs, etc. be advantageous over something simple like k-means clustering? I know k-means requires choosing the value of k a priori, but that seems simpler than tuning the vast array of hyperparameters my deep approaches need (weight decay, sparsity penalty, number of hidden units, number of hidden layers, etc.). I understand the advantages of deep architectures over shallow ones, but what advantage do deep architectures have over k-means clustering?
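To make the "simple" side concrete, here is a minimal sketch of what I have in mind, in plain Python on made-up toy data. The function names (`kmeans`, `nearest`, `label_clusters`, `predict`) and the majority-vote step that turns clusters into class labels are my own assumptions for illustration, not something taken from the article. The point is only that k is the one structural choice I have to make (`iters` and `seed` are just run settings).

```python
import random
from collections import Counter

def nearest(p, centroids):
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct training points.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    # Final assignment so it matches the returned centroids.
    assign = [nearest(p, centroids) for p in points]
    return centroids, assign

def label_clusters(assign, labels, k):
    """Assumed classification step: each cluster takes its majority label."""
    votes = [Counter() for _ in range(k)]
    for a, y in zip(assign, labels):
        votes[a][y] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

def predict(p, centroids, cluster_labels):
    """Classify p by the label of its nearest centroid."""
    return cluster_labels[nearest(p, centroids)]

# Toy data (made up): two well-separated 2-D blobs with labels "a" and "b".
rng = random.Random(1)
train = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)] + \
        [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(50)]
labels = ["a"] * 50 + ["b"] * 50

centroids, assign = kmeans(train, k=2)
cluster_labels = label_clusters(assign, labels, k=2)
```

Calling `predict((0.0, 0.0), centroids, cluster_labels)` then classifies a new point by its nearest centroid. Compare that single choice of k with the list of settings above that I would have to tune for a stacked autoencoder or RBM.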
This article shows that k-means can achieve state-of-the-art results more efficiently than the other approaches it considers (single-layer autoencoders, single-layer RBMs, Gaussian mixtures). Can anyone help me justify using a deep architecture when a simpler (and apparently successful) method exists?