
Questions about ensembles and nearest neighbor.

I have been reading up on ensembles and how randomized forests, extremely randomized trees, etc. can be used for classification, regression, and manifold estimation (via this Microsoft technical report).

My basic understanding of high-accuracy ensemble classification is that you randomly select multiple subsets of features, construct a decision tree on each subset, and then apply a voting scheme across the trees to select the class label. I understand that choosing the subsets, constructing the trees, etc. involves a lot of details I am glossing over, but I am just trying to get to the point.
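To make that understanding concrete, here is a minimal sketch of the subset-then-vote idea. It assumes depth-1 trees (decision stumps) stand in for full decision trees to keep the example short, and the names `fit_stump`, `fit_ensemble`, and `predict` are hypothetical, not taken from any library or from the technical report.

```python
import random
from collections import Counter

def fit_stump(X, y, features):
    """Fit a depth-1 tree: the best single (feature, threshold) split among `features`."""
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= threshold]
            right = [label for row, label in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue  # a split must put at least one point on each side
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(l != left_label for l in left) + sum(r != right_label for r in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:
        # No valid split exists: fall back to always predicting the majority class.
        majority = Counter(y).most_common(1)[0][0]
        return (features[0], float("inf"), majority, majority)
    return best[1:]

def fit_ensemble(X, y, n_trees=15, subset_size=2, seed=0):
    """Train one stump per randomly chosen feature subset (assumed sizes, for illustration)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    return [fit_stump(X, y, rng.sample(range(n_features), subset_size))
            for _ in range(n_trees)]

def predict(ensemble, x):
    """Each stump votes for a label; the most common label wins."""
    votes = Counter(
        left if x[f] <= threshold else right
        for f, threshold, left, right in ensemble
    )
    return votes.most_common(1)[0][0]
```

With a toy training set such as `X = [(0, 0, 0), (1, 1, 1), (9, 9, 9), (10, 10, 10)]` and `y = ["a", "a", "b", "b"]`, `predict(fit_ensemble(X, y), x)` returns the majority label among the stumps for a new point `x`.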

My question is, what about doing a nearest neighbor search? The voting scheme does not seem to make sense in the nearest neighbor context. Do you simply select the randomized subsets of features and then rank each data point by its averaged position across all subsets?

For example, consider a simple nearest neighbor search problem. I have a database D of data points. Then a query Q comes in, and I want to find the data points in D that are closest to Q. How might I go about framing this problem using ensembles?
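One way to frame the rank-averaging idea for this problem is sketched below. This is only an illustration of the scheme asked about, not an established method: the function name `ensemble_nearest_neighbors`, the parameter defaults, and the choice of squared Euclidean distance are all assumptions.

```python
import random

def ensemble_nearest_neighbors(database, query, n_subsets=25, subset_size=2, k=3, seed=0):
    """Return indices of the k database points with the lowest average rank
    across random feature subsets (rank 0 = closest within a subset)."""
    rng = random.Random(seed)
    n_features = len(query)
    rank_sums = [0] * len(database)
    for _ in range(n_subsets):
        # Pick a random subset of feature indices for this ensemble member.
        features = rng.sample(range(n_features), subset_size)
        # Squared Euclidean distance restricted to the chosen features.
        dists = [sum((point[f] - query[f]) ** 2 for f in features)
                 for point in database]
        # Convert distances to ranks within this subspace.
        order = sorted(range(len(database)), key=lambda i: dists[i])
        for rank, idx in enumerate(order):
            rank_sums[idx] += rank
    # A lower average rank means the point is closer to the query on average.
    avg_rank = [s / n_subsets for s in rank_sums]
    return sorted(range(len(database)), key=lambda i: avg_rank[i])[:k]
```

Here the "vote" of each ensemble member is a full ranking of the database rather than a class label, and the rankings are combined by averaging instead of by majority.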

submitted by zionsrogue
