So, I've been doing some research on SVMs, and I have to admit the idea behind them is fascinating. The fact that you can compute the similarity of two objects in some infinite-dimensional feature space without ever actually going there is really mind-blowing. However, not just the choice of kernel function but the entire rationale behind using kernel functions eludes me -- they feel like a magical component in SVMs.
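To make the question concrete, here's the one toy case I do follow (a quick Python sketch I wrote myself, so the names are my own): a degree-2 polynomial kernel where the feature map can actually be written out, and evaluating the kernel in the original space gives the same number as taking the dot product in the expanded feature space.

    import numpy as np

    def phi(x):
        """Explicit degree-2 feature map for a 2-D vector:
        phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
        return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

    def poly_kernel(x, z):
        """Degree-2 polynomial kernel, evaluated entirely in the original 2-D space."""
        return np.dot(x, z) ** 2

    x = np.array([1.0, 2.0])
    z = np.array([3.0, 0.5])

    print(np.dot(phi(x), phi(z)))  # dot product in the 3-D feature space -> 16.0
    print(poly_kernel(x, z))       # same value, computed without leaving 2-D -> 16.0

I get that for something like the RBF kernel the corresponding feature space is infinite-dimensional, yet the kernel value is still just exp(-gamma * ||x - z||^2). What I don't get is everything beyond this example.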
I don't understand how kernel functions are discovered and why some are preferred over others. And of all the kernel functions that could exist, how does one rigorously show that a particular kernel is the best one to use for a specific data set?
Anyone care to explain or direct me to relevant literature? TIA