For fun, I decided to tackle the MNIST digit dataset. My first ideas involved KMeans clustering for feature extraction and an SVM with an RBF kernel for classification. Given the nature of the dataset - almost binary images of digits (very few shades of gray) - I didn't bother with normalization, not knowing at the time that this would be a huge problem. After several failures trying to improve classification with hyperparameter optimization, I figured something was wrong. The SVM was going nowhere - it degenerated to a constant, single-class function - while simple linear classifiers were doing much better. I decided to remove KMeans for the moment and focus on the SVM. Only after normalizing the data (subtracting the mean and dividing by the standard deviation of each feature) did I get meaningful results.
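For reference, here is a minimal sketch of the normalization step that fixed things, using sklearn's StandardScaler and SVC. The data here is random stand-in "pixel" data, not actual MNIST (which you could load via sklearn's dataset utilities):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: random "pixel" features in [0, 255], 10 fake classes.
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(200, 64)).astype(float)
y = rng.integers(0, 10, size=200)

# Per-feature standardization: subtract the mean, divide by the std.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# RBF-kernel SVM trained on the standardized features.
clf = SVC(kernel="rbf")
clf.fit(X_scaled, y)
```

The key point is that the scaler is fit per feature, so every pixel position ends up with zero mean and unit variance regardless of how often it is "on" in the training set.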
This was a bit puzzling to me. As I understand it, an SVM with an RBF kernel should, even in the worst case of generalization, act like a convoluted k-nearest-neighbours algorithm. The data also had a very well defined range, mostly just pixel on - pixel off. How can normalization have such a drastic effect?
TLDR; why does the MNIST digit data have to be normalized to get meaningful classification with an SVM?