I've run a bunch of text classification experiments, varying the number of classes involved. I've grouped them by class count: 2, 4, and 8 classes.
I'm wondering what the best metric is to determine how well my approach scales to more classes. Does it even make sense to try to compare the experiments?
In a sense I'm looking for something similar to Big O analysis for algorithm performance as a function of input size.
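To make it concrete, here's a rough sketch of the comparison I have in mind (Python with scikit-learn; the random labels are just placeholders for my real experiment outputs). Chance-corrected metrics like Cohen's kappa, or class-balanced ones like macro-F1, are the sort of thing I've been considering, since raw accuracy's chance baseline shifts as classes are added:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, f1_score

rng = np.random.default_rng(0)

for n_classes in (2, 4, 8):
    # Placeholder labels standing in for one experiment's ground truth
    # and predictions; I'd substitute my real arrays here.
    y_true = rng.integers(0, n_classes, size=1000)
    y_pred = rng.integers(0, n_classes, size=1000)

    # Cohen's kappa corrects for chance agreement, so kappa = 0 means
    # "no better than guessing" regardless of how many classes there are.
    kappa = cohen_kappa_score(y_true, y_pred)

    # Macro-F1 averages per-class F1 with equal weight, so it isn't
    # dominated by whichever classes happen to be largest.
    macro_f1 = f1_score(y_true, y_pred, average="macro")

    print(f"{n_classes} classes: kappa={kappa:.3f}, macro-F1={macro_f1:.3f}")
```

The appeal of something like kappa is that its zero point stays at chance no matter the class count, so the trend across 2 → 4 → 8 classes might actually be meaningful. But I'm not sure that's the right way to think about it.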
Thanks.