I just don't understand why the normalized margin is maximized in AdaBoost.
It is clear that the objective in AdaBoost is to minimize the exponential loss, and therefore to maximize the unnormalized margin. However, Schapire (in his book or one of his videos) mentions that the normalized margin (which lies in [-1, 1]) is also maximized. I tested AdaBoost on some UCI datasets (twonorm, etc.), and in my plot of expected normalized margin vs. the number of weak learners, the curve is decreasing; the final histogram of normalized margins is centered around 0.2-0.3, roughly like a normal distribution, rather than all the margins going to 1.
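For concreteness, the two quantities I mean (with $y_i, h_t(x) \in \{-1,+1\}$ and $\alpha_t \ge 0$) are the unnormalized margin $y_i \sum_t \alpha_t h_t(x_i)$ and the normalized margin $y_i \sum_t \alpha_t h_t(x_i) / \sum_t \alpha_t \in [-1, 1]$.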
Intuitively, I try to understand it this way: in the first iteration, the normalized margin is either 1 or -1, so it is maximized/minimized for correctly/incorrectly labeled points and the histogram has two spikes at the extremes. As you keep training, more weak learners come in, each giving some points correct labels and others wrong labels, nudging the normalized margin up or down a little, so the histogram of margins gets smoothed out. In the extreme, if you run the algorithm long enough, the weak learners' error rates get very close to 0.5; for a particular data point, perhaps 55% of the weak learners give it the correct label and 45% the incorrect one, so when you normalize the alphas by their L1 norm, the margin ends up only a little bigger than zero. The alphas may be larger for the early weak learners, but once there are many weak learners, their relative weight becomes small and their effect is negligible. Is there anything wrong with this argument? (A small sketch of the check I'm doing is below.)
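To make the check concrete, here is a minimal hand-rolled discrete AdaBoost sketch (synthetic data and decision stumps as placeholders, not my actual UCI experiment) that tracks the normalized margin after each boosting round:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# synthetic stand-in for the UCI data (placeholder dataset and parameters)
X, y01 = make_classification(n_samples=1000, n_features=20, random_state=0)
y = 2 * y01 - 1                          # labels in {-1, +1}

n, T = len(y), 300
w = np.full(n, 1.0 / n)                  # example weights D_t
alphas, votes = [], []                   # votes[t][i] = y_i * h_t(x_i)

for t in range(T):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    h = stump.predict(X)
    err = np.sum(w[h != y])              # weighted training error of h_t
    if err <= 0 or err >= 0.5:           # stop if the weak-learning condition fails
        break
    alpha = 0.5 * np.log((1 - err) / err)
    w = w * np.exp(-alpha * y * h)       # up-weight the points h_t got wrong
    w /= w.sum()
    alphas.append(alpha)
    votes.append(y * h)

alphas = np.array(alphas)
votes = np.array(votes)                              # shape (rounds, n)
cum_vote = np.cumsum(alphas[:, None] * votes, axis=0)
norm_margin = cum_vote / np.cumsum(alphas)[:, None]  # normalized margins per round, in [-1, 1]

print("rounds run:", len(alphas))
print("mean normalized margin, first vs last round: %.3f -> %.3f"
      % (norm_margin[0].mean(), norm_margin[-1].mean()))
print("min  normalized margin, first vs last round: %.3f -> %.3f"
      % (norm_margin[0].min(), norm_margin[-1].min()))
```

Plotting the histogram of `norm_margin[-1]` (and the mean per round) should reproduce the curve and histogram I described above.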
However, the unnormalized margin did increase; that much is certain.