So I am seeing the following in my experiment:
As I increase the number of samples in the training set, the performance reported on the training data after training decreases a lot.
But on the test set, we see an equivalent INCREASE.
I'm not so sure why this is. Here are my ideas:

- As we increase the size of the training set (from 1000 to 2000, 2000 to 3000, etc.), the classifier has more data from which to find the optimal line of separation.
- So whilst more errors are made during training, when we generalize to the independent test set, we see an equivalent increase in classification performance.
I think that's right, but how the heck do I say these things using technical terminology?
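If it helps, here is a minimal sketch of the kind of learning-curve experiment I mean. It's not my actual setup: it assumes scikit-learn's learning_curve utility, a logistic regression classifier, and a synthetic dataset as stand-ins for my real classifier and data.

    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import learning_curve

    # Hypothetical data; my real X, y would go here instead.
    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

    # Evaluate the classifier at increasing training set sizes (1000, 2000, ...),
    # scoring both on the training folds and on held-out cross-validation folds.
    train_sizes, train_scores, test_scores = learning_curve(
        LogisticRegression(max_iter=1000), X, y,
        train_sizes=[1000, 2000, 3000, 4000],
        cv=5, scoring="accuracy",
    )

    # Plot mean training accuracy vs. mean held-out accuracy at each size.
    plt.plot(train_sizes, train_scores.mean(axis=1), label="training accuracy")
    plt.plot(train_sizes, test_scores.mean(axis=1), label="held-out accuracy")
    plt.xlabel("training set size")
    plt.ylabel("accuracy")
    plt.legend()
    plt.show()

The training curve drifting down while the held-out curve climbs toward it is exactly the pattern I'm seeing.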