
AskML: How much confidence can one have in nested cross-validation results?


I posted this on the Kernel Machines forum, but I thought I would try my luck here, too:

In short, I am wondering how much confidence to have in the results of nested cross-validation after noticing some weird results.

I have a small amount of data (~115 examples), so I am using nested leave-one-out cross-validation both to estimate accuracy and to select the RBF kernel parameters.

Now, I have developed 128 different ways to generate feature vectors, and I'd like to pick the one that gives the best performance. To do so, I test each of the 128 methods using the nested cross-validation approach.
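Concretely, the procedure looks roughly like the sketch below. This is only an illustration: the scikit-learn calls stand in for my actual code, and raw_data, labels, and feature_generators are placeholders for my data and my 128 feature-building methods.

    # Rough sketch only: scikit-learn is used for illustration, and raw_data,
    # labels and feature_generators are placeholders for my actual data and
    # my 128 feature-construction methods.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV, LeaveOneOut, cross_val_score

    rng = np.random.default_rng(0)
    raw_data = rng.normal(size=(115, 10))      # placeholder for ~115 examples
    labels = rng.integers(0, 2, size=115)      # placeholder binary labels
    feature_generators = {                     # placeholders for the 128 methods
        "raw": lambda d: d,
        "squared": lambda d: d ** 2,
    }

    param_grid = {"C": [1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]}
    results = {}

    for name, make_features in feature_generators.items():
        X = make_features(raw_data)

        # Inner LOO loop: choose C and gamma for the RBF-kernel SVM by accuracy.
        inner = GridSearchCV(SVC(kernel="rbf"), param_grid,
                             cv=LeaveOneOut(), scoring="accuracy")

        # Outer LOO loop: estimate the accuracy of the whole fit-and-select
        # procedure; the mean of the 115 per-example scores is the LOO accuracy.
        # (Slow: the grid search is refit for every outer fold.)
        scores = cross_val_score(inner, X, labels,
                                 cv=LeaveOneOut(), scoring="accuracy")
        results[name] = scores.mean()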

The results are a bit puzzling. The majority of the feature generators produce accuracy at or near chance. My best results have accuracy of around 70%, which is pretty good for the data being classified. A typical confusion matrix for such cases looks like:

    ( 36 18 )
    ( 14 47 )

where the (i,j) entry is the number of examples that are actually class i but were classified as class j.
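For reference, the accuracy I quote is just the diagonal of this matrix over the total number of examples:

    import numpy as np

    cm = np.array([[36, 18],
                   [14, 47]])            # rows = true class, columns = predicted class
    accuracy = np.trace(cm) / cm.sum()   # (36 + 47) / 115 ~= 0.72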

This is all well and good. My problem is that there are some choices of features which produce accuracy near 20%. For example, this is a confusion matrix from one of these cases:

    ( 1 53 )
    ( 42 19 )

This poses a problem, because I could just use this choice of features and flip the classification and get ~83% accuracy!

So this worries me. Is it just that the classifier is performing at chance, and that I have so few examples that performing at 20% is just bad luck? This would also mean that my "good" results of ~70% accuracy may also be due to luck. Or is there another explanation, perhaps, for obtaining a confusion matrix that is almost entirely off-diagonal?
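As a rough sanity check on the bad-luck explanation, I treated the 115 leave-one-out predictions as independent 50/50 guesses (which they aren't exactly, and I am also looking at the worst of 128 runs, so this is only a ballpark):

    from scipy.stats import binom

    # Probability of 20 or fewer correct out of 115 if every prediction were
    # an independent 50/50 guess. The LOO predictions are correlated and this
    # is the worst of 128 feature sets, so treat this as a rough bound only.
    n_correct = 1 + 19          # diagonal of the "bad" confusion matrix
    n_total = 115
    print(binom.cdf(n_correct, n_total, 0.5))   # extremely small under this model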

Also: is there a better measure than accuracy for judging the classifier's performance? I am currently selecting the parameters that give the best cross-validation accuracy on the training set, but it occurs to me that there may be other, better measures (ROC curves, etc.).
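For example, something along these lines (again scikit-learn just for illustration; the labels below are placeholders that reproduce the "good" confusion matrix above, whereas in practice they would be collected from the outer loop):

    import numpy as np
    from sklearn.metrics import balanced_accuracy_score, matthews_corrcoef

    # Placeholder labels/predictions that reproduce the "good" confusion matrix
    # above; in practice these would come from the outer LOO loop.
    y_true = np.array([0] * 54 + [1] * 61)
    y_pred = np.array([0] * 36 + [1] * 18 + [0] * 14 + [1] * 47)

    print(balanced_accuracy_score(y_true, y_pred))  # mean of per-class recalls
    print(matthews_corrcoef(y_true, y_pred))        # -1..1, with 0 at chance level

    # An ROC curve / AUC would need continuous scores (e.g. the SVM's
    # decision_function values from the outer loop) rather than hard labels.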

Thanks!

submitted by hbweb500
