I got an assignment to implement a classifier that will categorize documents in quite large corpus. I decided to go with SVMs with linear kernel, achieved good results, moved on to theoretical questions and got stuck on this one...
I admit that my statistical knowledge has gotten a little bit rusty, so... halp! I'm not looking for complete solutions, any suggestions would be much appreciated.
[link] [5 comments]