Channel: Machine Learning

Aliasing in number of trees in gradient boosting classifier?

Using sklearn in Python, I've been experimenting with a data set (the target is a single binary variable) using a gradient boosting classifier (GBC). I only have about 1500 data points, so to conserve data for training, I randomly hold out 1% of the data, train the model on the other 99%, and test on that 1%. I repeat this a few hundred times to get a reasonably precise estimate of test accuracy, and then run the whole procedure over a range of GBC parameter values to optimize them.
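For concreteness, the evaluation loop described above might look roughly like this. This is a minimal sketch, not the poster's actual code: the synthetic dataset from `make_classification` stands in for the real ~1500-point data set, and the trial count and `n_estimators` grid are illustrative assumptions.

```python
# Sketch of the repeated 99%/1% split evaluation over n_estimators.
# Synthetic data is a stand-in for the poster's real data set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# ~1500 points with a single binary target, as in the post.
X, y = make_classification(n_samples=1500, n_features=20, random_state=0)

def mean_accuracy(n_trees, n_trials=10, seed=0):
    """Average test accuracy over repeated random 99%/1% splits."""
    rng = np.random.RandomState(seed)
    scores = []
    for _ in range(n_trials):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.01, random_state=rng.randint(2**31 - 1))
        clf = GradientBoostingClassifier(n_estimators=n_trees, random_state=0)
        clf.fit(X_tr, y_tr)
        scores.append(clf.score(X_te, y_te))
    return float(np.mean(scores))

# Sweep one GBC parameter (here n_estimators) and compare average accuracy.
results = {n: mean_accuracy(n) for n in [50, 100, 150, 200]}
```

In practice the poster uses a few hundred trials rather than 10; with only ~15 points in each 1% test set, single-trial accuracy is very coarse (increments of 1/15), which is why heavy averaging is needed.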

I've noticed that when I sweep over the number of trees (`n_estimators`), the top scores (average accuracy across trials) tend to occur at values that are multiples of each other. For example, on a given run, the best numbers of trees might be [50, 100, 150] or [60, 120, 180]. Sometimes only 2 of the top 3 are multiples (e.g. [50, 80, 100]).

I have no problem with this; I'm just curious. Is my stupid brain just seeing patterns in noise, or is there any justification in the GBC math for why multiplying the optimal number of trees by an integer results in models with similar performance?

submitted by dire_faol
