In ordinary (single-label) classification, a softmax output layer can be used and the highest-scoring class chosen. In multilabel classification, how do we know how many labels to choose, given that all the output nodes will have non-zero values? Is there a standard layer design, analogous to softmax for single-label classification, that can be used off the shelf?
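To make the contrast concrete, here is a minimal sketch of the two decision rules. The single-label rule is the softmax plus argmax described above. The multilabel rule is not taken from this post: it assumes one commonly used setup, an independent sigmoid per output node with a fixed cutoff, so that the number of labels chosen falls out of how many nodes clear the cutoff. The function names, the example scores, and the `threshold=0.5` default are all illustrative assumptions; in practice the cutoff is usually tuned on validation data.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(score):
    """Squash one raw score into (0, 1), independently of the other scores."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)  # avoids overflow for large negative scores
    return e / (1.0 + e)


def predict_single_label(scores):
    """Single-label rule: pick the one class with the highest softmax probability."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)


def predict_multilabel(scores, threshold=0.5):
    """Assumed multilabel rule: keep every label whose sigmoid output
    reaches the threshold. The threshold value is a hypothetical default."""
    return [i for i, s in enumerate(scores) if sigmoid(s) >= threshold]


# Made-up raw scores for four output nodes.
logits = [2.0, -1.0, 0.5, -3.0]
print(predict_single_label(logits))  # 0
print(predict_multilabel(logits))    # [0, 2]
```

With this rule, zero, one, or several labels can be returned for the same input, because each node is compared against the cutoff on its own rather than competing with the others as in softmax.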