Hey /r/machinelearning--
I don't see too many [homework] posts here, so I hope I'm not in the wrong sub. If so, please point me to a better option.
So I'm working on a project that takes in anime names and genres, each labeled as relevant or irrelevant. I'm trying to train a NaiveBayesClassifier on that data so that I can pass in a set of genres and have it tell me whether they are relevant or irrelevant. I currently have the following:

    from nltk.classify import NaiveBayesClassifier

    # Each training example is (featureset of genres, label)
    trainingdata = [
        ({'drama': True, 'mystery': True, 'horror': True, 'psychological': True}, 'relevant'),
        ({'drama': True, 'fantasy': True, 'romance': True, 'adventure': True,
          'science fiction': True}, 'unrelevant'),
    ]
    classifier = NaiveBayesClassifier.train(trainingdata)

    # Genres of the anime to classify
    anime = {'Fantasy': True, 'Comedy': True, 'Supernatural': True}
    print(classifier.classify(anime))

    # Probability of each label for this anime
    prob_dist = classifier.prob_classify(anime)
    print("relevant " + str(prob_dist.prob("relevant")))
    print("unrelevant " + str(prob_dist.prob("unrelevant")))

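The training featuresets above use lowercase keys ('drama') while the query uses capitalized ones ('Fantasy'). Featureset keys are ordinary, case-sensitive dict keys, so 'Fantasy' and 'fantasy' are two different features. Below is a minimal sketch of building both training and query featuresets with one hypothetical helper, `genres_to_features`, assuming the genres arrive as a list of strings per anime:

```python
def genres_to_features(genres):
    """Turn a list of genre strings into a featureset like {'fantasy': True}.

    Names are stripped and lowercased so the same genre always maps to the
    same feature name, whether it appears in training data or in a query.
    Each genre string stays one feature, so 'Super Power' is not split.
    """
    return {g.strip().lower(): True for g in genres if g.strip()}


# Hypothetical inputs: genres as lists of strings
print(genres_to_features(["Drama", "Mystery", "Horror", "Psychological"]))
print(genres_to_features(["Mystery", "Comedy", "Super Power"]))
# {'mystery': True, 'comedy': True, 'super power': True}
```

Running every training example and every query through the same helper keeps feature names consistent. Whether lowercasing is the right normalization depends on how the genre data is collected.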
My training set currently has 110 examples: 57 labeled relevant and 53 labeled unrelevant.
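With these counts, the class prior can be computed by hand. This is a minimal sketch assuming NLTK's `NaiveBayesClassifier.train` uses its default estimator, `ELEProbDist` (expected likelihood estimate, which adds 0.5 to each count), to build the label distribution:

```python
# Label counts from the 110-example training set described above
counts = {"relevant": 57, "unrelevant": 53}
total = sum(counts.values())  # 110
gamma = 0.5                   # ELE smoothing adds 0.5 to each count (assumed NLTK default)
bins = len(counts)            # number of labels: 2

# Smoothed prior: P(label) = (count + gamma) / (total + gamma * bins)
prior = {label: (c + gamma) / (total + gamma * bins) for label, c in counts.items()}

print("relevant " + str(round(prior["relevant"], 12)))      # 57.5 / 111 -> 0.518018018018
print("unrelevant " + str(round(prior["unrelevant"], 12)))  # 53.5 / 111 -> 0.481981981982
```

These values are identical to the probabilities in the results below, which would suggest that the classifier is returning only the prior and that none of the genre features contribute to those classifications.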
Some results I get:

    relevant Tantei Opera Milky Holmes TD
    {'Mystery': True, 'Comedy': True, 'Super': True, 'Power': True}
    relevant 0.518018018018
    unrelevant 0.481981981982

    relevant Juuou Mujin no Fafnir
    {'Romance': True, 'Fantasy': True, 'School': True}
    relevant 0.518018018018
    unrelevant 0.481981981982

Does that make sense? I get the same probabilities (relevant 0.518018018018, unrelevant 0.481981981982) for every anime I classify, regardless of its genres. From my understanding of Naive Bayes, the probabilities should change with the input features.
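One way the output can stay fixed: as far as I can tell from its source, NLTK's `NaiveBayesClassifier.prob_classify` discards any feature name it never saw during training, and if every feature in a query is discarded, only the class prior is left. The sketch below is a simplified from-scratch Naive Bayes, not NLTK's code (`train_nb` and `prob_classify_nb` are hypothetical names, and add-0.5 smoothing is an assumption), that mimics that behavior on the two training examples from the snippet above:

```python
import math
from collections import Counter, defaultdict


def train_nb(labeled_featuresets):
    """Count labels and, per label, how many examples contain each feature."""
    label_counts = Counter(label for _, label in labeled_featuresets)
    feature_counts = defaultdict(Counter)  # feature_counts[label][fname]
    vocab = set()
    for feats, label in labeled_featuresets:
        for fname, present in feats.items():
            if present:
                feature_counts[label][fname] += 1
                vocab.add(fname)
    return label_counts, feature_counts, vocab


def prob_classify_nb(model, featureset, gamma=0.5):
    """Return {label: posterior} using add-gamma smoothing."""
    label_counts, feature_counts, vocab = model
    total = sum(label_counts.values())
    # Like NLTK, silently drop feature names never seen during training.
    known = [f for f, present in featureset.items() if present and f in vocab]
    log_scores = {}
    for label, count in label_counts.items():
        # Smoothed prior P(label)
        score = math.log((count + gamma) / (total + gamma * len(label_counts)))
        for fname in known:
            # Smoothed P(feature present | label); two outcomes: present/absent
            score += math.log((feature_counts[label][fname] + gamma) / (count + 2 * gamma))
        log_scores[label] = score
    # Normalize in log space so the posteriors sum to 1
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(weights.values())
    return {label: w / z for label, w in weights.items()}


trainingdata = [
    ({'drama': True, 'mystery': True, 'horror': True, 'psychological': True}, 'relevant'),
    ({'drama': True, 'fantasy': True, 'romance': True, 'adventure': True,
      'science fiction': True}, 'unrelevant'),
]
model = train_nb(trainingdata)

# Capitalized names never match the lowercase training features: prior only.
print(prob_classify_nb(model, {'Fantasy': True, 'Comedy': True, 'Supernatural': True}))
print(prob_classify_nb(model, {'Romance': True, 'School': True}))
# A matching lowercase feature moves the posterior (roughly 0.25 / 0.75 here).
print(prob_classify_nb(model, {'fantasy': True, 'comedy': True}))
```

The first two queries have different genres but produce the same posterior, matching the pattern in the results above, because none of their feature names exist in the training vocabulary.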
I'm using Python and NLTK. Thanks!