So I've got 5 classes, and 4 of them separate well into fairly tight clusters when I plot the samples with PCA (only the first 2 components are significant). The 5th, however, sits in the middle of the others and is quite a bit more spread out: it basically spans the space between the other 4 clusters, with a small amount of overlap with each. When I classify with random forests (trained on the original features, not the PCA components), samples from this 5th class usually get assigned to one of the other classes. Does anyone have any tips on how to improve performance?
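
To make the failure concrete, it may help to look at per-class recall and at which classes the 5th-class samples get assigned to. Here is a minimal sketch in plain Python. The labels are made up purely for illustration (classes 0-3 stand in for the tight clusters, class 4 for the spread-out one), and the helper names `confusion_counts` and `per_class_recall` are hypothetical, not from any library.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count how often each (true_label, predicted_label) pair occurs."""
    return Counter(zip(y_true, y_pred))

def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    pairs = confusion_counts(y_true, y_pred)
    return {label: pairs[(label, label)] / totals[label] for label in totals}

# Made-up labels for illustration only: classes 0-3 are the tight clusters,
# class 4 is the spread-out class in the middle.
y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4]
y_pred = [0, 0, 1, 1, 2, 2, 3, 3, 0, 1, 4, 3]

recall = per_class_recall(y_true, y_pred)
for label in sorted(recall):
    print(f"class {label}: recall = {recall[label]:.2f}")

# Which classes do the true class-4 samples get assigned to?
missed = {
    pred: n
    for (true, pred), n in confusion_counts(y_true, y_pred).items()
    if true == 4 and pred != 4
}
print("class 4 misclassified as:", missed)
```

With real data, `y_pred` would be the random forest's held-out (or out-of-bag) predictions. If the misses are spread evenly over the other 4 classes, that would be consistent with the overlap seen in the PCA plot.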