I am using Python with scikit-learn and want to classify categorical data that looks like this:
feature 1, feature 2, feature 3, feature 4 = animal type
Right now I am thinking about using support vector machines. There are two problems bothering me:
First, what do you do with a data point whose true label is not in the training set, such as:
training set = [cat, dog, fish, frog]
data point = [lion]
How would I deal with these points? My guess is that you can either filter such points out, although that would also filter out false negatives, or add a catch-all category such as 'not identified' to the dataset for everything that does not fit.
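To make the 'not identified' idea concrete, here is a rough sketch of what I have in mind. It is not a tested solution: the 2-D toy data, the `OneClassSVM` novelty filter, its `gamma` and `nu` values, and the helper name `predict_with_reject` are all my own assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC, OneClassSVM

# Toy numeric data (assumed): one tight 2-D cluster per known animal.
rng = np.random.default_rng(0)
centers = {"cat": (0, 0), "dog": (5, 0), "fish": (0, 5), "frog": (5, 5)}
X = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in centers.values()])
y = np.repeat(list(centers), 30)

# Ordinary multi-class SVM for the known classes.
clf = SVC().fit(X, y)

# Separate novelty detector fitted on all training points, ignoring labels.
# gamma and nu are guesses and would need tuning on real data.
novelty = OneClassSVM(gamma=0.5, nu=0.05).fit(X)

def predict_with_reject(points):
    """Hypothetical helper: classify, then overwrite flagged points."""
    points = np.asarray(points, dtype=float)
    labels = clf.predict(points).astype(object)
    # OneClassSVM.predict returns -1 for points it considers novel.
    labels[novelty.predict(points) == -1] = "not identified"
    return labels

# One point near the 'cat' cluster, one far from every cluster (a "lion").
result = predict_with_reject([[0.1, -0.2], [40.0, 40.0]])
print(result)
```

The downside is the one I mentioned: anything the filter rejects is lost, including points that really were a cat or a dog but looked unusual.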
My second problem is how to read the probabilities.
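As far as I understand it, `SVC(probability=True)` exposes `predict_proba`, and the columns of its output follow the order of `clf.classes_`. A small sketch of how I am reading them, on made-up toy data (the cluster centers and the query point are assumptions):

```python
import numpy as np
from sklearn.svm import SVC

# Toy numeric data (assumed): one 2-D cluster per animal, 20 points each.
rng = np.random.default_rng(0)
centers = {"cat": (0, 0), "dog": (5, 0), "fish": (0, 5), "frog": (5, 5)}
X = np.vstack([rng.normal(c, 0.5, size=(20, 2)) for c in centers.values()])
y = np.repeat(list(centers), 20)

# probability=True enables predict_proba (fitted via internal cross-validation).
clf = SVC(probability=True, random_state=0).fit(X, y)

# One row per query point, one column per class, in the order of clf.classes_.
proba = clf.predict_proba([[0.2, 0.1]])

# Pair each class label with its probability for the first query point.
for label, p in zip(clf.classes_, proba[0]):
    print(f"{label}: {p:.3f}")
```

Each row sums to 1 over the known classes only, so a lion would still get its probability mass spread across cat, dog, fish and frog. The scikit-learn docs also note that for `SVC` the class with the highest probability may occasionally differ from what `predict` returns.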
Is there a better alternative method that can handle data not in the training set?