
SVM classification of MNIST digit dataset


For fun, I decided to tackle the MNIST digit dataset. My first idea was to use k-means clustering for feature extraction and an SVM with an RBF kernel for classification. Given the nature of the dataset (almost binary images of digits, with very few shades of gray), I didn't bother with normalization, not knowing at the time that this would be a huge problem. After several failed attempts to improve classification with hyperparameter optimization, I figured something was wrong. The SVM was going nowhere: it degenerated into a constant single-class function, while simple linear classifiers were doing much better. I decided to drop k-means for the moment and focus on the SVM. Only after normalizing the data (subtracting the mean and dividing by the standard deviation of each feature) did I get meaningful results.
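
Roughly, the setup that finally worked looks like this (a minimal sketch assuming scikit-learn; the hyperparameters, subset sizes and fetch_openml loader are placeholders, not the exact code):

```python
# Sketch of the normalize-then-SVM pipeline described above, assuming scikit-learn.
# C, gamma, and the train/test sizes are placeholders.
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=10_000, test_size=2_000, random_state=0)

# Standardize each pixel (subtract its mean, divide by its std), then fit an RBF-kernel SVM.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=5.0, gamma="scale"))
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```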

This was a bit puzzling to me. As I understand it, an SVM with an RBF kernel should, at its worst in terms of generalization, act like a convoluted k-nearest-neighbours algorithm. The data also has a very well-defined range, mostly just pixel on / pixel off. How can normalization have such a drastic effect?
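
To make the question concrete, here is a toy computation of the RBF kernel K(x, x') = exp(-gamma * ||x - x'||^2) on raw versus standardized pixel vectors (assumed random numbers, not my real data):

```python
# Toy computation (assumed numbers): RBF kernel values on raw vs. standardized pixels.
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 256, 784).astype(float)   # one fake 28x28 "digit"
b = rng.integers(0, 256, 784).astype(float)   # another
gamma = 1.0 / 784                             # e.g. gamma = 1 / n_features

d2_raw = np.sum((a - b) ** 2)
print(d2_raw, np.exp(-gamma * d2_raw))        # ~8.5e6 -> kernel underflows to 0.0

# Standardize to zero mean / unit variance (per vector here, per feature in practice):
a_s, b_s = (a - a.mean()) / a.std(), (b - b.mean()) / b.std()
d2_std = np.sum((a_s - b_s) ** 2)
print(d2_std, np.exp(-gamma * d2_std))        # ~1.6e3 -> kernel ~0.13, informative
```

With raw 0-255 values and a fixed gamma, every pair of images ends up at a kernel value of essentially zero, which at least looks consistent with the collapse to a single class.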

TL;DR: why does the MNIST digit data have to be normalized to get meaningful classification with an SVM?

submitted by ComplexColor
