
AskML: How much confidence can one have in nested cross-validation results?


I posted this on the Kernel Machines forum, but I thought I would try my luck here, too:

In short, I am wondering how much confidence to have in the results of nested cross-validation after noticing some weird results.

I have a small amount of data (~115 examples), so I am using nested leave-one-out cross-validation both to estimate accuracy and to select the RBF kernel parameters.

Now, I have developed 128 different ways to generate feature vectors, and I'd like to pick the one that gives the best performance. To do so, I test each of the 128 methods using the nested cross-validation approach.
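Concretely, the procedure looks roughly like the sketch below. This is only an illustration: the scikit-learn calls stand in for my actual code, and raw_data, labels, and feature_generators are placeholders for my data and my 128 feature-building methods.

    # Rough sketch only: scikit-learn is used for illustration, and raw_data,
    # labels and feature_generators are placeholders for my actual data and
    # my 128 feature-construction methods.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV, LeaveOneOut, cross_val_score

    rng = np.random.default_rng(0)
    raw_data = rng.normal(size=(115, 10))      # placeholder for ~115 examples
    labels = rng.integers(0, 2, size=115)      # placeholder binary labels
    feature_generators = {                     # placeholders for the 128 methods
        "raw": lambda d: d,
        "squared": lambda d: d ** 2,
    }

    param_grid = {"C": [1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]}
    results = {}

    for name, make_features in feature_generators.items():
        X = make_features(raw_data)

        # Inner LOO loop: choose C and gamma for the RBF-kernel SVM by accuracy.
        inner = GridSearchCV(SVC(kernel="rbf"), param_grid,
                             cv=LeaveOneOut(), scoring="accuracy")

        # Outer LOO loop: estimate the accuracy of the whole fit-and-select
        # procedure; the mean of the 115 per-example scores is the LOO accuracy.
        # (Slow: the grid search is refit for every outer fold.)
        scores = cross_val_score(inner, X, labels,
                                 cv=LeaveOneOut(), scoring="accuracy")
        results[name] = scores.mean()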

The results are a bit puzzling. The majority of the feature generators produce accuracy at or near chance. My best results have accuracy of around 70%, which is pretty good for the data being classified. A typical confusion matrix for such cases looks like:

    ( 36 18 )
    ( 14 47 )

where the (i,j) entry is the number of examples that are actually class i but were classified as class j.
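For reference, the accuracy I quote is just the diagonal of this matrix over the total number of examples:

    import numpy as np

    cm = np.array([[36, 18],
                   [14, 47]])            # rows = true class, columns = predicted class
    accuracy = np.trace(cm) / cm.sum()   # (36 + 47) / 115 ~= 0.72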

This is all well and good. My problem is that there are some choices of features which produce accuracy near 20%. For example, this is a confusion matrix from one of these cases:

    ( 1 53 )
    ( 42 19 )

This poses a problem, because I could just use this choice of features and flip the classification and get ~83% accuracy!

So this worries me. Is it just that the classifier is performing at chance, and that I have so few examples that performing at 20% is just bad luck? This would also mean that my "good" results of ~70% accuracy may also be due to luck. Or is there another explanation, perhaps, for obtaining a confusion matrix that is almost entirely off-diagonal?
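As a rough sanity check on the bad-luck explanation, I treated the 115 leave-one-out predictions as independent 50/50 guesses (which they aren't exactly, and I am also looking at the worst of 128 runs, so this is only a ballpark):

    from scipy.stats import binom

    # Probability of 20 or fewer correct out of 115 if every prediction were
    # an independent 50/50 guess. The LOO predictions are correlated and this
    # is the worst of 128 feature sets, so treat this as a rough bound only.
    n_correct = 1 + 19          # diagonal of the "bad" confusion matrix
    n_total = 115
    print(binom.cdf(n_correct, n_total, 0.5))   # extremely small under this model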

Also: is there a better measure than accuracy for judging the classifier's performance? I am currently selecting the parameters that give the best cross-validation accuracy on the training set, but it occurs to me that there may be other, better measures (ROC curves, etc.).
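For example, something along these lines (again scikit-learn just for illustration; the labels below are placeholders that reproduce the "good" confusion matrix above, whereas in practice they would be collected from the outer loop):

    import numpy as np
    from sklearn.metrics import balanced_accuracy_score, matthews_corrcoef

    # Placeholder labels/predictions that reproduce the "good" confusion matrix
    # above; in practice these would come from the outer LOO loop.
    y_true = np.array([0] * 54 + [1] * 61)
    y_pred = np.array([0] * 36 + [1] * 18 + [0] * 14 + [1] * 47)

    print(balanced_accuracy_score(y_true, y_pred))  # mean of per-class recalls
    print(matthews_corrcoef(y_true, y_pred))        # -1..1, with 0 at chance level

    # An ROC curve / AUC would need continuous scores (e.g. the SVM's
    # decision_function values from the outer loop) rather than hard labels.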

Thanks!

submitted by hbweb500
