[Help] Dealing with high-variable, (relatively) low-observation data

Apologies if this is inappropriate, but I'm fairly new to ML and I'm having a bit of trouble finding resources for this particular problem.

I have 30 observations in 2 classes (15 in each). Each observation has several thousand variables (this could be reduced in a somewhat hand-wavy way, but I'd rather not). All variables are continuous; some are normally distributed and some aren't; some are most likely redundant; some are highly informative and others aren't, or are actively misleading. I'm using an SVM with the RBF kernel (libSVM in Matlab) to build classifiers, with leave-one-out cross-validation (or leave-pair-out, removing one observation from each class, for tests with fewer iterations) to evaluate the feature selection algorithm, but I'm having real trouble finding a feature selection algorithm that is at all stable across the different iterations of the LOOCV.
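
For concreteness, the outer loop looks roughly like this (libSVM's Matlab interface, labels coded as +1/-1; the -c and -g values are placeholders rather than tuned choices):

    % Outer leave-one-out CV around an RBF-kernel SVM (libSVM's Matlab interface).
    % X is the 30-by-p feature matrix, y the +1/-1 label column vector.
    n = size(X, 1);
    pred = zeros(n, 1);
    for i = 1:n
        train = setdiff(1:n, i);

        % Scale with training-fold statistics only, so the held-out sample
        % never leaks into the standardisation
        mu = mean(X(train, :));
        sigma = std(X(train, :));
        sigma(sigma == 0) = 1;
        Xtr = (X(train, :) - repmat(mu, numel(train), 1)) ./ repmat(sigma, numel(train), 1);
        Xte = (X(i, :) - mu) ./ sigma;

        % -t 2 selects the RBF kernel; -c and -g would ideally be tuned
        % inside the training fold as well
        model = svmtrain(y(train), Xtr, '-s 0 -t 2 -c 1 -g 0.01 -q');
        pred(i) = svmpredict(y(i), Xte, model);
    end
    acc = mean(pred == y);

Any feature selection happens on X(train, :) only, inside the loop, which is where the instability shows up.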

At first I tried ranking features by their contrast-to-noise ratio (CNR) and building a classifier from the top one, then the top two, then the top n, and picking the best classifier out of those, but that meant a lot of redundant information was included (possibly weighting the classifier in an unhelpful manner), the results were poor, and the choice of features was very unstable, I think because small variations in CNR cause quite large changes in CNR rank. Then I tried greedy forward selection, which was better (80% sensitivity, 87% specificity), but each classifier was still picking up different features (although some were picked more frequently than others). The greedy algorithm used LOOCV within the remaining 29 samples to choose which feature should be added, so it was a sort of nested LOO (LOO²). It would be interesting to use the probability of a feature being selected to weight the final classifier, but to test that I'd need to go to a third level of LOO, which is getting absurd.
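
If it helps to see the shape of it, the nested search can be written with the Statistics Toolbox's sequentialfs, run inside each outer fold on the 29 training samples; this is a sketch of the idea rather than my exact implementation, and the SVM parameters are again placeholders:

    % Criterion for sequentialfs: number of misclassified held-out samples
    % for an RBF SVM trained on the candidate feature subset
    crit = @(Xtr, ytr, Xte, yte) ...
        sum(yte ~= svmpredict(yte, Xte, svmtrain(ytr, Xtr, '-s 0 -t 2 -c 1 -g 0.01 -q')));

    % Inner leave-one-out partition over the 29 outer-fold training samples
    innercv = cvpartition(numel(train), 'LeaveOut');

    % Greedy forward selection: keep adding the feature that most improves
    % the inner-LOO criterion, stopping when no feature helps
    selected = sequentialfs(crit, X(train, :), y(train), ...
        'cv', innercv, 'direction', 'forward');

With several thousand candidate features and an inner LOO, this trains thousands of SVMs per added feature, which is part of why a third level of LOO on top feels absurd.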

At the moment I'm trying to reduce the number of features by combining covarying variables using PCA. However, it's my understanding that PCA doesn't really work very well with such rectangular data, so I'd need to heuristically reduce the number of features first for it to be effective. In particular, I found that the contrast-to-noise ratio of each principal-component score had no correlation with that component's latent (its variance), even when the latent was 0. This means that PCA doesn't actually reduce the search space at all. Edit: Also, none of the features actually seem to covary very strongly.
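
For reference, the check I described looks roughly like this (taking contrast-to-noise to mean the absolute difference of class means divided by the pooled within-class standard deviation, which is my working definition):

    % PCA on the raw feature matrix; with 30 samples there are at most 29
    % non-degenerate components, which is the "rectangularity" problem.
    % (pca is the newer name; princomp in older Matlab versions.)
    [coeff, score, latent] = pca(X);
    k = size(score, 2);

    % CNR of each principal-component score, assuming +1/-1 labels in y
    cnr = zeros(k, 1);
    for j = 1:k
        a = score(y == 1, j);
        b = score(y == -1, j);
        pooled = sqrt((var(a) + var(b)) / 2);
        cnr(j) = abs(mean(a) - mean(b)) / pooled;
    end

    % Correlation between how discriminative a component is (CNR) and how
    % much variance it explains (latent) - in my data this is basically zero
    rho = corr(cnr, latent(1:k));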

Have I missed some handy redundancy-reduction, dimensionality-reduction or other feature selection algorithm that is useful for this sort of data? Or is it crazy to even be looking at a data set this rectangular, and should I be trying to massively cut down the number of variables that I'm feeding into any algorithm?

EDIT: Thanks for the help, guys! In the unlikely event that I can squeeze a publication out of this in the next couple of months I'll do my best to big up /r/machinelearning.

submitted by blackrat47
