I'm doing some experiments in classification of genomic sequences using Random Forests.
I was wondering whether it makes sense to remove trees that have a high error rate on their out-of-bag samples from the set of generated trees. The reason is that I want to save computation in the classification step (there is a very large number of classifications to perform).
Does anyone have experience with this? Do you think it's a good idea, or will I lose too much predictive power? Any hints for a sensible error threshold above which a tree should be deleted?
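
To make the idea concrete, here is a minimal sketch of the pruning step I have in mind. It is plain Python and not tied to any particular Random Forest library: trees are assumed to be simple callables that map a feature vector to a class label, and the names `oob_error`, `prune_forest`, `predict` and the threshold `max_error` are made up for illustration. The toy data and the value 0.25 are arbitrary, not a recommendation.

```python
from collections import Counter

def oob_error(tree, X, y, oob_idx):
    """Fraction of a tree's out-of-bag samples that it misclassifies."""
    if not oob_idx:
        return 1.0  # assumption: a tree with no OOB samples is treated as worst case
    wrong = sum(1 for i in oob_idx if tree(X[i]) != y[i])
    return wrong / len(oob_idx)

def prune_forest(trees, errors, max_error):
    """Keep only the trees whose OOB error does not exceed max_error."""
    return [t for t, e in zip(trees, errors) if e <= max_error]

def predict(trees, x):
    """Majority vote over the given trees."""
    votes = Counter(t(x) for t in trees)
    return votes.most_common(1)[0][0]

# Toy data: one feature, two classes.
X = [[0.1], [0.4], [0.6], [0.9]]
y = [0, 0, 1, 1]

# Hypothetical "trees": two reasonable stumps and one deliberately bad one.
trees = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[0] < 0.5),
]

# Indices of the samples each tree did NOT see during training (assumed known).
oob = [[0, 2], [1, 3], [0, 1, 2, 3]]

errors = [oob_error(t, X, y, idx) for t, idx in zip(trees, oob)]
kept = prune_forest(trees, errors, max_error=0.25)
label = predict(kept, [0.8])
```

Classification then only evaluates the trees in `kept`, so the cost per sample drops in proportion to the number of trees removed. What I can't judge is how much accuracy that costs on real data.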