I know that NP is optimal, but as I understand it that optimality is with respect to the underlying density functions, not some sample drawn from them. If there is no model, then doing NP apparently corresponds to something like a brute-force search that tries all combinations of the detector parameters (which implies imposing some discretization on the continuous ones) and then finds the best pD for each pFA (I've put a rough code sketch of what I mean below the two questions). This, in my mind, raises two questions:
1) Given some finite data set, it may very well be that every empirical pFA value is unique, even though many are very close together. I imagine this is handled by binning (e.g. all pFAs rounded to the nearest 0.05, like a histogram). Right?
2) Can NP overfit? I know it's optimal for the exact underlying densities, but how does sampling affect that? If what I wrote above is correct, then the number of parameter combinations can be ungodly (10 levels of each of 10 parameters ==> 10^10 combinations). It's not doing a fit in the standard machine-learning sense, but I would think that as the parameter discretization gets finer, each bin of the mesh sees fewer samples and so has a higher fractional error (like shot noise). In any event, I have no idea how to compare this to a standard classifier (be it boosted trees, SVM, etc.).
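To make the brute-force procedure I'm picturing concrete, here's a rough sketch. The two-parameter linear detector, the grid sizes, and the 0.05 bin width are all things I made up for illustration, not any standard NP recipe:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy labeled sample: 2-D features under H0 (noise only) and H1 (signal present).
X0 = rng.normal(0.0, 1.0, size=(2000, 2))   # H0 samples
X1 = rng.normal(1.0, 1.0, size=(2000, 2))   # H1 samples

# Hypothetical detector family: declare H1 when w1*x[0] + w2*x[1] > t.
w_grid = np.linspace(-1.0, 1.0, 10)         # 10 levels per weight (made-up grid)
t_grid = np.linspace(-3.0, 3.0, 10)         # 10 threshold levels

bin_width = 0.05                            # pFA binning from question 1
best_pd = {}                                # pFA bin -> best pD seen so far

for w1, w2, t in itertools.product(w_grid, w_grid, t_grid):
    score0 = w1 * X0[:, 0] + w2 * X0[:, 1]
    score1 = w1 * X1[:, 0] + w2 * X1[:, 1]
    pfa = np.mean(score0 > t)               # empirical false-alarm probability
    pd = np.mean(score1 > t)                # empirical detection probability
    key = round(round(pfa / bin_width) * bin_width, 2)   # round pFA to nearest 0.05
    if pd > best_pd.get(key, 0.0):
        best_pd[key] = pd                   # keep the best pD found in this pFA bin

for pfa_bin in sorted(best_pd):
    print(f"pFA ~ {pfa_bin:.2f}  ->  best pD = {best_pd[pfa_bin]:.3f}")
```

With 10 levels on each of 10 parameters, that loop would run 10^10 times instead of 10^3, which is exactly the blow-up (and the per-bin noise) I'm asking about in question 2.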