Classifying categorical data that is not in the training set.

January 28, 2014, 9:12 am

I am using Python's scikit-learn and want to classify categorical data that looks like this:

feature 1, feature 2, feature 3, feature 4 = animal type

Right now I am thinking about using support vector machines, but two problems are bothering me:

First, suppose a data point has a label that does not appear in the training set, such as:

training set = [cat, dog, fish, frog]

data point = [lion]

How would I deal with these points? My guess is that you can either filter such points out, although you would then also be filtering out false negatives, or add a catch-all category such as 'not identified' to the dataset for everything that does not fit.
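A minimal sketch of the 'not identified' idea, under stated assumptions: instead of retraining with an extra class, it rejects any prediction whose highest class probability falls below a cutoff. The function name `label_or_unknown`, the 0.6 threshold, and the probability rows are all hypothetical and made up for illustration. In scikit-learn, rows like these could come from `predict_proba` on an `SVC(probability=True)`, with the column order given by `classes_`.

```python
def label_or_unknown(probs, classes, threshold=0.6, unknown="not identified"):
    """Return the most probable class, or `unknown` if confidence is too low."""
    # Index of the largest probability in the row.
    best = max(range(len(probs)), key=lambda i: probs[i])
    # Accept the prediction only when it clears the (assumed) threshold.
    return classes[best] if probs[best] >= threshold else unknown


classes = ["cat", "dog", "fish", "frog"]

# Hypothetical probability rows, one value per class, invented for this sketch.
confident_row = [0.05, 0.85, 0.05, 0.05]
uncertain_row = [0.30, 0.28, 0.22, 0.20]

print(label_or_unknown(confident_row, classes))  # dog
print(label_or_unknown(uncertain_row, classes))  # not identified
```

This is only a heuristic: a classifier can still assign a high probability to a point from an unseen class, so the threshold would need to be tuned on held-out data.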

My second problem is how to read the probabilities.

Is there a better alternative method that can handle data not in the training set?

submitted by chchan