I posted this on the Kernel Machines forum, but I thought I would try my luck here, too:
In short, I am wondering how much confidence to have in the results of nested cross-validation after noticing some weird results.
I have a small amount of data (~115 examples), so I am using nested, leave-one-out cross-validation to come up with an estimate of accuracy and to do parameter selection for the RBF kernel.
Now, I have developed a set of 128 different ways to generate feature vectors, and I'd like to pick the one that will give me the best performance. To do so, I test each of the 128 different methods using the nested cross-validation approach.
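Roughly, my setup looks like the sketch below (assuming scikit-learn's `SVC`; the grid values and the `nested_loo_accuracy` helper are placeholders, not my exact code). The outer leave-one-out loop estimates accuracy; the inner grid search picks the RBF parameters using only that fold's training data:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, LeaveOneOut

def nested_loo_accuracy(X, y):
    """Outer LOO accuracy estimate with inner-CV selection of (C, gamma)."""
    # Placeholder grid -- my real search range may differ
    param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
    correct = 0
    for train_idx, test_idx in LeaveOneOut().split(X):
        # Inner CV sees only the training fold, so the held-out
        # example never influences parameter selection
        search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
        search.fit(X[train_idx], y[train_idx])
        correct += int(search.predict(X[test_idx])[0] == y[test_idx][0])
    return correct / len(y)
```

I run this once per feature generator and compare the 128 resulting accuracy estimates.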
The results are a bit puzzling. The majority of the feature generators produce accuracy that is around chance, or reasonably close to it. My best results have accuracy of around 70%, which is pretty good for the data being classified. A typical confusion matrix for such cases looks like:
( 36  18 )
( 14  47 )
where the (i,j) entry is the number of examples that are actually class i but were classified as class j.
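(Concretely, with that convention, accuracy is just the diagonal of the matrix over the total:)

```python
import numpy as np

# Rows = true class, columns = predicted class (the (i,j) convention above)
cm = np.array([[36, 18],
               [14, 47]])
accuracy = np.trace(cm) / cm.sum()  # (36 + 47) / 115, about 72%
```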
This is all well and good. My problem is that some choices of features produce accuracy near 20%. For example, here is a confusion matrix from one of these cases:
(  1  53 )
( 42  19 )
This poses a problem, because I could just use this choice of features and flip the classification and get ~83% accuracy!
So this worries me. Is it just that the classifier is performing at chance, and that I have so few examples that performing at 20% is just bad luck? This would also mean that my "good" results of ~70% accuracy may also be due to luck. Or is there another explanation, perhaps, for obtaining a confusion matrix that is almost entirely off-diagonal?
Also: is there a better measure than accuracy for judging the performance of the classifier? I currently select the parameters that maximize cross-validated accuracy on the training folds, but it occurs to me that there may be other, better measures (ROC curves, etc.).
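For instance, one threshold-free alternative I've considered is AUC computed from the SVM decision values collected over the leave-one-out folds (a sketch assuming scikit-learn; `loo_auc` is a hypothetical helper, and I've left out the inner parameter search for brevity):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut

def loo_auc(X, y):
    """AUC over leave-one-out decision values instead of 0/1 accuracy."""
    scores = np.empty(len(y))
    for train_idx, test_idx in LeaveOneOut().split(X):
        clf = SVC(kernel="rbf").fit(X[train_idx], y[train_idx])
        # Signed distance to the separating surface, not a hard label
        scores[test_idx] = clf.decision_function(X[test_idx])
    return roc_auc_score(y, scores)
```

An appealing property here is that a consistently *inverted* classifier gives an AUC near 0 rather than looking like chance, so it would flag cases like my 20%-accuracy matrix.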
Thanks!