How can I use spottily labeled training data when the features are correlated with whether an example is labeled?

January 27, 2012, 4:25 am

I have a dataset with both labeled and unlabeled examples. From my knowledge of the domain, I know that some of the examples' features strongly affect whether an example is labeled or unlabeled, which makes the labeled data heavily biased and causes grave prediction errors. How can I use this known correlation between the features and the probability of an example having a label to reduce bias and prediction errors?
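To make the setup concrete, here is a minimal sketch of one way the problem could be framed: inverse propensity weighting, where each labeled example is weighted by 1 / P(labeled | features). This is an illustration under stated assumptions, not a known solution for my data. The toy dataset, the single binary feature `x`, and all function names are hypothetical, and it assumes labeling depends only on the features (not on the label itself) and that every feature value has at least some labeled examples.

```python
from collections import Counter


def build_examples():
    """Hypothetical toy data: (x, y, labeled) triples with one binary feature x.

    The true y of unlabeled rows is kept only so the sketch can compare
    against ground truth; in a real dataset it would be unknown.
    """
    spec = [
        # (x, y, n_labeled, n_unlabeled)
        (1, 1, 720, 80),   # x=1 examples are labeled 90% of the time
        (1, 0, 180, 20),
        (0, 1, 20, 180),   # x=0 examples are labeled 10% of the time
        (0, 0, 80, 720),
    ]
    examples = []
    for x, y, n_lab, n_unlab in spec:
        examples += [(x, y, True)] * n_lab + [(x, y, False)] * n_unlab
    return examples


def estimate_propensity(examples):
    """Estimate P(labeled | x) from counts over all examples."""
    total = Counter(x for x, _, _ in examples)
    labeled = Counter(x for x, _, lab in examples if lab)
    return {x: labeled[x] / total[x] for x in total}


def naive_positive_rate(examples):
    """Fraction of positives among labeled examples, ignoring the bias."""
    ys = [y for _, y, lab in examples if lab]
    return sum(ys) / len(ys)


def weighted_positive_rate(examples, propensity):
    """Same estimate, with each labeled example weighted by 1 / P(labeled | x)."""
    num = den = 0.0
    for x, y, lab in examples:
        if lab:
            w = 1.0 / propensity[x]
            num += w * y
            den += w
    return num / den


examples = build_examples()
propensity = estimate_propensity(examples)
true_rate = sum(y for _, y, _ in examples) / len(examples)
naive = naive_positive_rate(examples)
weighted = weighted_positive_rate(examples, propensity)

print(f"true={true_rate:.2f} naive={naive:.2f} weighted={weighted:.2f}")
```

In this toy case the naive estimate of the positive rate from the labeled rows is 0.74, while the weighted estimate recovers the true rate of 0.50. The same per-example weights could in principle be passed to any learner that accepts sample weights, but whether that is enough for my real data is exactly what I am asking.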

submitted by solen-skiner