I've always thought a Softmax layer would work fine both for mutually exclusive classes (e.g. colors: red, green, blue) and for non-exclusive ones (e.g. the instruments on a song: vocals, guitar, bass, drums).
However, Stanford's Deep Learning tutorial disagrees and says you should use K Binary Classifiers for the second case: http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression#Softmax_Regression_vs._k_Binary_Classifiers
Why can't you use Softmax for this? My idea was the following: if you have two non-exclusive classes, you could treat every combination of labels as its own Softmax class and use the four outputs [0, 0], [0, 1], [1, 0] and [1, 1] for training and testing. This generalizes to more classes (2^K combinations for K labels).
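To make the idea concrete, here is a minimal sketch of the two target encodings I have in mind (plain NumPy; the helper names are mine, not from the tutorial): K independent binary targets versus a one-hot target over all 2^K label combinations.

```python
import itertools
import numpy as np

# The four instrument labels from my example.
LABELS = ["vocals", "guitar", "bass", "drums"]

def k_binary_target(present):
    """Multi-hot target for K independent binary classifiers."""
    return np.array([1.0 if label in present else 0.0 for label in LABELS])

def powerset_softmax_target(present):
    """One-hot target over all 2^K label combinations (my proposed Softmax encoding)."""
    combos = [frozenset(c)
              for r in range(len(LABELS) + 1)
              for c in itertools.combinations(LABELS, r)]
    target = np.zeros(len(combos))  # 2^4 = 16 Softmax classes
    target[combos.index(frozenset(present))] = 1.0
    return target

# A song with guitar and drums but no vocals or bass:
print(k_binary_target({"guitar", "drums"}))          # [0. 1. 0. 1.]
print(powerset_softmax_target({"guitar", "drums"}))  # one-hot over 16 classes
```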
What is wrong with my logic?