I was wondering if someone could refer me to some literature on robustness to wrongly labelled training data. I currently have a problem where I have a limited number of examples, and for each one some estimate of how reliable its label is. Which algorithms are best suited for this? I know AdaBoost is particularly bad here because of its exponential loss function, and that a Huber loss would be more robust.

But what about classification? There the loss function tends to be the deviance or cross-entropy; would one simply up-weight the training examples whose labels are more reliable (see the sketch below)? Is there a robust loss function you can use for classification? And is there any consensus on whether NNs or RFs are more robust to mislabeled data?
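For reference, here is roughly what I mean by up-weighting, as a minimal sketch in scikit-learn. The array `label_confidence` is just a made-up placeholder for whatever per-example reliability estimate you have; most estimators accept a `sample_weight` argument in `fit`, which scales each example's contribution to the loss:

    # Minimal sketch: down-weight examples whose labels are less trusted
    # by passing per-example confidences as sample_weight.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=500, random_state=0)

    # Hypothetical per-example label reliability in (0, 1]; in practice this
    # would come from your own labelling process, not random numbers.
    rng = np.random.default_rng(0)
    label_confidence = rng.uniform(0.5, 1.0, size=len(y))

    clf = LogisticRegression()
    # Each example's log-loss term is multiplied by its weight, so unreliable
    # labels pull less on the fitted model.
    clf.fit(X, y, sample_weight=label_confidence)

But I'm not sure whether this weighting approach is actually the standard way to handle label noise, or whether a genuinely robust loss is preferred.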
Many thanks for your help.