Question on linear SVMs and curse of dimensionality

Hello, for a project I'm trying to analyze a binary SVM classifier on a set of images. Each image is represented by a vector of 330 values in [0,1] that sum to 1.

(I won't explain what those features represent: it would be useless, since they lack a clear meaning. In fact, my goal is to explain what the classifier learns from the set.)

My training set comprises 1500 samples (each with a binary label), and the test set has over 5000 samples.

As you can see, the dimensionality of the problem is rather high: the data matrix has 1500 rows and 330 columns. Still, I am able to train a linear SVM on 4/5 of the training set and achieve over 95% accuracy on the remaining 1/5.

I am using LIBLINEAR, with L2 regularization and L2 loss function.
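
For concreteness, my setup is roughly equivalent to the sketch below (scikit-learn's LinearSVC wraps LIBLINEAR; the random data and the C value are placeholders, not my actual features or tuned parameter):

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    # Placeholder data with the same shape as described: 1500 samples,
    # 330 non-negative features per sample, each row summing to 1.
    rng = np.random.default_rng(0)
    X = rng.random((1500, 330))
    X /= X.sum(axis=1, keepdims=True)
    y = rng.integers(0, 2, size=1500)

    # Hold out 1/5 of the training set for evaluation, as described above.
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

    # LIBLINEAR-style linear SVM: L2 regularization, L2 (squared hinge) loss.
    clf = LinearSVC(penalty="l2", loss="squared_hinge", C=1.0)
    clf.fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_val, y_val))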

I also made sure not to overfit in any way: inside the 4/5 split I perform feature transformation (z-score normalization) and SVM calibration with an additional 5-fold CV, as sketched below.
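
The normalization and the calibration both happen inside the cross-validation, so no statistics leak from the held-out fold. A minimal sketch, assuming "calibration" means tuning C (the grid is a placeholder; X_tr and y_tr are the 4/5 split from above):

    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import LinearSVC

    # The z-scoring step lives inside the pipeline, so each CV fold is
    # scaled with statistics computed on its own training part only.
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("svm", LinearSVC(penalty="l2", loss="squared_hinge")),
    ])

    # Placeholder C grid for the 5-fold calibration step.
    search = GridSearchCV(pipe, {"svm__C": [0.01, 0.1, 1, 10, 100]}, cv=5)
    search.fit(X_tr, y_tr)
    print(search.best_params_, search.best_score_)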

What's more interesting is that the same SVM still achieves around 88% accuracy over the entire test set, which is almost four times as big as the full training set. The results are very robust to the choice of cross-validation split.

Why does it perform so well despite the curse of dimensionality? Is this a general (unexpected?) characteristic of SVMs? Also, do I need some sort of dimensionality reduction? (Bear in mind that it is impossible to impose any meaningful probabilistic structure on the features, hence no LDA, QDA, etc.)

My ansatz is that the classes are genuinely well separated, in both the training and the test set. Still, the marginal feature distributions almost completely overlap between classes (although I know this can happen easily, even in R^2), and no feature is particularly dominant over the others, judging by the SVM hyperplane coefficients (see the sketch below). However, I would like to rule out a dimensionality effect, to be sure that I have obtained a good classifier.
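
This is roughly how I inspected the hyperplane coefficients (a sketch continuing from the fitted search above; what counts as "dominant" is informal):

    import numpy as np

    # Inspect the learned weight vector: if the |w_i| mass is spread over
    # many features rather than concentrated on a few, no single feature
    # dominates the decision boundary.
    w = np.abs(search.best_estimator_.named_steps["svm"].coef_.ravel())
    top = np.sort(w)[::-1]
    print("top-5 |w|:", top[:5])
    print("share of |w| mass in top 5:", top[:5].sum() / w.sum())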

Thank you!

submitted by Er4zor
