I'm putting together a supervised classifier where some of the features may be absent for a subset of the data. By absent, I mean that those features do not exist for those instances, as opposed to existing but having missing values.
I was wondering if anyone has worked on a classifier with the same issue: what approaches you took, what was and was not successful, and any caveats to watch out for.
As a side note, I was doing some research today and ran across this paper (http://jmlr.csail.mit.edu/papers/volume9/chechik08a/chechik08a.pdf), published 6 years ago by Gal Chechik and others. I had been thinking along the same lines, classifying each instance in the subspace of the features it actually has, and was glad to see that idea validated by the paper.
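To make the subspace idea concrete, here is a minimal sketch of what I have in mind. It is not the method from the paper (which, as I understand it, is a max-margin formulation); it is just a nearest-centroid classifier that measures distance only over the features a given instance has. The names `ABSENT`, `fit_centroids`, and `predict`, and the toy data, are all made up for illustration.

```python
from collections import defaultdict
from math import inf

ABSENT = None  # hypothetical marker: the feature does not exist for this row


def fit_centroids(rows, labels):
    """Per-class, per-feature means, using only rows where the feature exists."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            if value is not ABSENT:
                sums[label][j] += value
                counts[label][j] += 1
    return {
        label: {j: sums[label][j] / counts[label][j] for j in sums[label]}
        for label in sums
    }


def predict(centroids, row):
    """Return the class whose centroid is closest in the row's own subspace."""
    best_label, best_dist = None, inf
    for label, centroid in centroids.items():
        # Only compare on features that exist for this row and for this class.
        shared = [j for j, v in enumerate(row) if v is not ABSENT and j in centroid]
        if not shared:
            continue
        # Mean squared distance, so rows with more features are not penalized.
        dist = sum((row[j] - centroid[j]) ** 2 for j in shared) / len(shared)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Toy data: feature 1 or feature 2 is structurally absent for each row.
rows = [[1.0, 2.0, ABSENT], [1.2, ABSENT, 0.5], [5.0, 6.0, ABSENT], [5.5, ABSENT, 4.0]]
labels = ["a", "a", "b", "b"]
centroids = fit_centroids(rows, labels)
print(predict(centroids, [1.1, ABSENT, ABSENT]))
print(predict(centroids, [ABSENT, 5.8, 4.2]))
```

Dividing by the number of shared features is my own choice to keep distances comparable across instances with different numbers of existing features; other normalizations may well work better.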